Re: I want "john smi" to find "john smith" in my custom "fullname_s" field

2017-06-06 Thread Amrit Sarkar
Nick,

"string" is a primitive data-type and the entire value of a field is
indexed as single token. The regex matching happens against the tokens for
text fields and against the full content for string fields. So once a piece
of text is tokenized, there is no way to perform a regex query across word
boundaries.

fullname_s:john smi* is working for me.

{
  "responseHeader":{
"zkConnected":true,
"status":0,
"QTime":16,
"params":{
  "q":"fullname_s:john smi*",
  "indent":"on",
  "wt":"json"}},
  "response":{"numFound":1,"start":0,"maxScore":1.0,"docs":[
  {
"id":"1",
"fullname_s":"john smith",
"_version_":1569446064473243648}]
  }}

I am on Solr 6.5.0. Which version are you on?


Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Tue, Jun 6, 2017 at 1:30 PM, Nick Way 
wrote:

> Hi - I have a Solr collection with a custom field "fullname_s" (a string).
>
> I want "john smi" to find "john smith" (I lower-cased the names upon
> indexing them)
>
> I have tried
>
> fullname_s:"john smi*"
> fullname_s:john smi*
> fullname_s:"john smi?"
> fullname_s:john smi?
>
>
> but nothing gives the expected result - am I missing something? I spent
> hours on this one point yesterday so if anyone can please point me in the
> right direction I'd be really grateful.
>
> I'm using Solr with Adobe Coldfusion by the way but I think the principles
> are the same.
>
> Thank you!
>
> Nick
>


Re: I want "john smi" to find "john smith" in my custom "fullname_s" field

2017-06-06 Thread Amrit Sarkar
Erik,

Thank you for correcting. Things I miss on a daily basis: _text_ :)

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Tue, Jun 6, 2017 at 5:12 PM, Nick Way 
wrote:

> Fantastic thank you so much; I now have 'fullname_s:#string.
> spacesescaped#*
> or email_s:#string.spacesescaped#*' which is working like a dream - thank
> you so much - really appreciate your help.
>
> Thank you also Amrit.
>
> Nick
>
> On 6 June 2017 at 10:40, Erik Hatcher  wrote:
>
> > Nick - try escaping the space, so that your query is q=fullname_s:john\
> > smi*
> >
> > However, whitespace and escaping is problematic.  There is a handy prefix
> > query parser, so this would work on a string field with spaces:
> >
> > q={!prefix f=fullname_s}john smi
> >
> > note no trailing asterisk on that one.   Even better, IMO, is to separate
> > the query string from the query parser:
> >
> > q={!prefix f=fullname_s v=$qq}&qq=john smi
> >
> > Erik
> >
> > 
> >
> > Amrit - the issue with your example below is that q=fullname_s:john smi*
> > parses “john” against fullname_s and “smi” as a prefix query against the
> > default field, not likely fullname_s.   Check your parsed query to see
> > exactly how it parsed.It works for you because… magic!   (copyField *
> > => _text_)
> >
> >
> >
> >
> > > On Jun 6, 2017, at 5:14 AM, Amrit Sarkar 
> wrote:
> > >
> > > Nick,
> > >
> > > "string" is a primitive data-type and the entire value of a field is
> > > indexed as single token. The regex matching happens against the tokens
> > for
> > > text fields and against the full content for string fields. So once a
> > piece
> > > of text is tokenized, there is no way to perform a regex query across
> > word
> > > boundaries.
> > >
> > > fullname_s:john smi* is working for me.
> > >
> > > {
> > >  "responseHeader":{
> > >"zkConnected":true,
> > >"status":0,
> > >"QTime":16,
> > >"params":{
> > >  "q":"fullname_s:john smi*",
> > >  "indent":"on",
> > >  "wt":"json"}},
> > >  "response":{"numFound":1,"start":0,"maxScore":1.0,"docs":[
> > >  {
> > >"id":"1",
> > >"fullname_s":"john smith",
> > >"_version_":1569446064473243648}]
> > >  }}
> > >
> > > I am on Solr 6.5.0. What version you are on?
> > >
> > >
> > > Amrit Sarkar
> > > Search Engineer
> > > Lucidworks, Inc.
> > > 415-589-9269
> > > www.lucidworks.com
> > > Twitter http://twitter.com/lucidworks
> > > LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> > >
> > > On Tue, Jun 6, 2017 at 1:30 PM, Nick Way  >
> > > wrote:
> > >
> > >> Hi - I have a Solr collection with a custom field "fullname_s" (a
> > string).
> > >>
> > >> I want "john smi" to find "john smith" (I lower-cased the names upon
> > >> indexing them)
> > >>
> > >> I have tried
> > >>
> > >> fullname_s:"john smi*"
> > >> fullname_s:john smi*
> > >> fullname_s:"john smi?"
> > >> fullname_s:john smi?
> > >>
> > >>
> > >> but nothing gives the expected result - am I missing something? I
> spent
> > >> hours on this one point yesterday so if anyone can please point me in
> > the
> > >> right direction I'd be really grateful.
> > >>
> > >> I'm using Solr with Adobe Coldfusion by the way but I think the
> > principles
> > >> are the same.
> > >>
> > >> Thank you!
> > >>
> > >> Nick
> > >>
> >
> >
>


Re: com.ibm.icu dependency errors when building solr source code

2017-06-22 Thread Amrit Sarkar
Running "ant eclipse" or "ant test" in verbose mode will provide you the
exact lib in ivy2 cache which is corrupt. Delete that particular lib and
run "ant" again. Also don't try to get out / exit  "ant" commands via
Ctrl+C or Ctrl+V while it is downloading the libraries to ivy2 folder.


Re: async backup

2017-06-27 Thread Amrit Sarkar
Damien,

then I poll with REQUESTSTATUS


REQUESTSTATUS is an API which gives you the status of any async API call
(including other heavy-duty APIs like SPLITSHARD or CREATECOLLECTION)
associated with the async_id at that moment. Does that give
you "state"="completed"?

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Tue, Jun 27, 2017 at 5:25 AM, Damien Kamerman  wrote:

> A regular backup creates the files in this order:
> drwxr-xr-x   2 root root  63 Jun 27 09:46 snapshot.shard7
> drwxr-xr-x   2 root root 159 Jun 27 09:46 snapshot.shard8
> drwxr-xr-x   2 root root 135 Jun 27 09:46 snapshot.shard1
> drwxr-xr-x   2 root root 178 Jun 27 09:46 snapshot.shard3
> drwxr-xr-x   2 root root 210 Jun 27 09:46 snapshot.shard11
> drwxr-xr-x   2 root root 218 Jun 27 09:46 snapshot.shard9
> drwxr-xr-x   2 root root 180 Jun 27 09:46 snapshot.shard2
> drwxr-xr-x   2 root root 164 Jun 27 09:47 snapshot.shard5
> drwxr-xr-x   2 root root 252 Jun 27 09:47 snapshot.shard6
> drwxr-xr-x   2 root root 103 Jun 27 09:47 snapshot.shard12
> drwxr-xr-x   2 root root 135 Jun 27 09:47 snapshot.shard4
> drwxr-xr-x   2 root root 119 Jun 27 09:47 snapshot.shard10
> drwxr-xr-x   3 root root   4 Jun 27 09:47 zk_backup
> -rw-r--r--   1 root root 185 Jun 27 09:47 backup.properties
>
> While an async backup creates files in this order:
> drwxr-xr-x   2 root root  15 Jun 27 09:49 snapshot.shard3
> drwxr-xr-x   2 root root  15 Jun 27 09:49 snapshot.shard9
> drwxr-xr-x   2 root root  62 Jun 27 09:49 snapshot.shard6
> drwxr-xr-x   2 root root  37 Jun 27 09:49 snapshot.shard2
> drwxr-xr-x   2 root root  67 Jun 27 09:49 snapshot.shard7
> drwxr-xr-x   2 root root  75 Jun 27 09:49 snapshot.shard5
> drwxr-xr-x   2 root root  70 Jun 27 09:49 snapshot.shard8
> drwxr-xr-x   2 root root  15 Jun 27 09:49 snapshot.shard4
> drwxr-xr-x   2 root root  15 Jun 27 09:50 snapshot.shard11
> drwxr-xr-x   2 root root 127 Jun 27 09:50 snapshot.shard1
> drwxr-xr-x   2 root root 116 Jun 27 09:50 snapshot.shard12
> drwxr-xr-x   3 root root   4 Jun 27 09:50 zk_backup
> -rw-r--r--   1 root root 185 Jun 27 09:50 backup.properties
> drwxr-xr-x   2 root root  25 Jun 27 09:51 snapshot.shard10
>
>
> shard10 is much larger than the other shards.
>
> From the logs:
> INFO  - 2017-06-27 09:50:33.832; [   ] org.apache.solr.cloud.BackupCmd;
> Completed backing up ZK data for backupName=collection1
> INFO  - 2017-06-27 09:50:33.800; [   ]
> org.apache.solr.handler.admin.CoreAdminOperation; Checking request status
> for : backup1103459705035055
> INFO  - 2017-06-27 09:50:33.800; [   ]
> org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null
> path=/admin/cores
> params={qt=/admin/cores&requestid=backup1103459705035055&action=
> REQUESTSTATUS&wt=javabin&version=2}
> status=0 QTime=0
> INFO  - 2017-06-27 09:51:33.405; [   ] org.apache.solr.handler.
> SnapShooter;
> Done creating backup snapshot: shard10 at file:///online/backup/
> collection1
>
> Has anyone seen this bug, or knows a workaround?
>
>
> On 27 June 2017 at 09:47, Damien Kamerman  wrote:
>
> > Yes, the async command returns, and then I poll with REQUESTSTATUS.
> >
> > On 27 June 2017 at 01:24, Varun Thacker  wrote:
> >
> >> Hi Damien,
> >>
> >> A backup command with async is supposed to return early. It is start the
> >> backup process and return.
> >>
> >> Are you using the REQUESTSTATUS (
> >> http://lucene.apache.org/solr/guide/6_6/collections-api.html
> >> #collections-api
> >> ) API to validate if the backup is complete?
> >>
> >> On Sun, Jun 25, 2017 at 10:28 PM, Damien Kamerman 
> >> wrote:
> >>
> >> > I've noticed an issue with the Solr 6.5.1 Collections API BACKUP async
> >> > command returning early. The state is finished well before one shard
> is
> >> > finished.
> >> >
> >> > The collection I'm backing up has 12 shards across 6 nodes and I
> suspect
> >> > the issue is that it is not waiting for all backups on the node to
> >> finish.
> >> >
> >> > Alternatively, I if I change the request to not be async it works OK
> but
> >> > sometimes I get the exception "backup the collection time out:180s".
> >> >
> >> > Has anyone seen this, or knows a workaround?
> >> >
> >> > Cheers,
> >> > Damien.
> >> >
> >>
> >
> >
>


Re: dynamic datasource password in db_data_config file

2017-07-17 Thread Amrit Sarkar
Javed,

Can you let us know if you are running in standalone or cloud mode?

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Mon, Jul 17, 2017 at 11:54 AM, javeed  wrote:

> HI Team,
> Can you please update on this issue.
>
> Thank you
>
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/dynamic-datasource-password-in-db-data-config-
> file-tp4345804p4346288.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: TransactionLog doesn't know how to serialize class java.util.UUID; try implementing ObjectResolver?

2017-07-17 Thread Amrit Sarkar
I looked into the code TransactionLog.java (branch_5_5) ::

JavaBinCodec.ObjectResolver resolver = new JavaBinCodec.ObjectResolver() {
  @Override
  public Object resolve(Object o, JavaBinCodec codec) throws IOException {
    if (o instanceof BytesRef) {
      BytesRef br = (BytesRef)o;
      codec.writeByteArray(br.bytes, br.offset, br.length);
      return null;
    }
    // Fallback: we have no idea how to serialize this.  Be noisy to prevent insidious bugs
    throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
        "TransactionLog doesn't know how to serialize " + o.getClass()
        + "; try implementing ObjectResolver?");
  }
};

UUID does implement Serializable, but the resolver above only handles BytesRef
instances, so anything else falls through to the exception ::

public final class UUID implements java.io.Serializable, Comparable<UUID>

Can you share the payload you are trying to update with?



Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Mon, Jul 17, 2017 at 7:03 PM, deviantcode  wrote:

> Hi Mahmoud, did you ever get to the bottom of this? I'm having the same
> issue
> on solr 5.5.2
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/TransactionLog-doesn-t-know-how-to-serialize-
> class-java-util-UUID-try-implementing-ObjectResolver-
> tp4332277p4346335.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Solr Subfaceting

2017-07-17 Thread Amrit Sarkar
Poornima,

Regarding 3;
You can do something like:

CloudSolrClient client = new CloudSolrClient("localhost:9983");

SolrParams params = new ModifiableSolrParams().add("q","*:*")
.add("json.facet","{.}");

QueryResponse response = client.query(params);

Setting key and value via SolrParams is available.
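
As a concrete sketch (the field names "category" and "brand" are hypothetical),
the json.facet value could be a terms facet with a nested sub-facet:

SolrParams facetParams = new ModifiableSolrParams()
    .add("q", "*:*")
    .add("rows", "0")
    .add("json.facet",
        "{categories:{type:terms, field:category, limit:10,"
      + " facet:{brands:{type:terms, field:brand}}}}");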


Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Mon, Jul 17, 2017 at 8:48 PM, Ponnuswamy, Poornima (GE Healthcare) <
poornima.ponnusw...@ge.com> wrote:

> Hello,
>
> We have Solr version 6.4.2  and we have been using Solr Subfaceting –
> Terms Facet as per the document https://cwiki.apache.org/
> confluence/display/solr/Faceted+Search in our project.
>
> In our project which is going to go in production soon, we use it for
> getting the facet/subfacet counts, sort etc. We make a direct rest call to
> solr and the counts matches perfectly. I have few questions and
> clarification on this approach and appreciate your response on this.
>
>
>
>   1.  In confluence - https://cwiki.apache.org/confluence/display/solr/
> Faceted+Search it page says its experimental and may change
> significantly. Is it safe for us to use the Terms faceting or will it
> change in future releases?. When will this be official?.
>   2.  As Term faceting has few advantages over Pivot facet as per
> http://yonik.com/solr-subfacets/ we went on with it. Is it safe to use it
> or do we use Pivot faceting instead?
>   3.  Currently we make a rest call to Solr API to get results. Now we are
> planning to move to Solr Cloud and use Solrj library to integrate with
> Solr. I don’t see any support for Terms faceting (json.facet) in Solrj
> library. Am I overlooking it or will it be supported in future releases?
>
> Appreciate your response.
>
> Thanks,
> Poornima
>
>


Re: Solr Subfaceting

2017-07-17 Thread Amrit Sarkar
Poornima,

  1.  In confluence - https://cwiki.apache.org/confluence/display/solr/
Faceted+Search it page says its experimental and may change significantly.
Is it safe for us to use the Terms faceting or will it change in future
releases?. When will this be official?.

A lot of people / engineers are already using JSON faceting in production
today. "Experimental and may change significantly" simply means that the
request and response endpoints may change in future releases, hence
back-compat may suffer. If you upgrade to a future Solr release, you have
to make sure the client code you have written on your end (via SolrJ) is
up to date with that Solr version (the one you upgrade to).

  2.  As Term faceting has few advantages over Pivot facet as per
http://yonik.com/solr-subfacets/ we went on with it. Is it safe to use it
or do we use Pivot faceting instead?

In my opinion, you should use the better feature, though you may hit some
limitations of JSON faceting; there would be respective JIRAs open for those too.

For the rest, Mr. Seeley would be the best person to answer the 2nd question.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Mon, Jul 17, 2017 at 10:43 PM, Ponnuswamy, Poornima (GE Healthcare) <
poornima.ponnusw...@ge.com> wrote:

> Thanks for your response. I have tried with SolrParams and it works for me.
>
> Any feedback on question 1 & 2.
>
> Thanks,
> Poornima
>
> On 7/17/17, 12:38 PM, "Amrit Sarkar"  wrote:
>
> Poornima,
>
> Regarding 3;
> You can do something like:
>
> CloudSolrClient client = new CloudSolrClient("localhost:9983");
>
> SolrParams params = new ModifiableSolrParams().add("q","*:*")
> .add("json.facet","{.}");
>
> QueryResponse response = client.query(params);
>
> Setting key and value via SolrParams is available.
>
>
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>
> On Mon, Jul 17, 2017 at 8:48 PM, Ponnuswamy, Poornima (GE Healthcare) <
> poornima.ponnusw...@ge.com> wrote:
>
> > Hello,
> >
> > We have Solr version 6.4.2  and we have been using Solr Subfaceting –
> > Terms Facet as per the document https://cwiki.apache.org/
> > confluence/display/solr/Faceted+Search in our project.
> >
> > In our project which is going to go in production soon, we use it for
> > getting the facet/subfacet counts, sort etc. We make a direct rest
> call to
> > solr and the counts matches perfectly. I have few questions and
> > clarification on this approach and appreciate your response on this.
> >
> >
> >
> >   1.  In confluence - https://cwiki.apache.org/
> confluence/display/solr/
> > Faceted+Search it page says its experimental and may change
> > significantly. Is it safe for us to use the Terms faceting or will it
> > change in future releases?. When will this be official?.
> >   2.  As Term faceting has few advantages over Pivot facet as per
> > http://yonik.com/solr-subfacets/ we went on with it. Is it safe to
> use it
> > or do we use Pivot faceting instead?
> >   3.  Currently we make a rest call to Solr API to get results. Now
> we are
> > planning to move to Solr Cloud and use Solrj library to integrate
> with
> > Solr. I don’t see any support for Terms faceting (json.facet) in
> Solrj
> > library. Am I overlooking it or will it be supported in future
> releases?
> >
> > Appreciate your response.
> >
> > Thanks,
> > Poornima
> >
> >
>
>
>


Re: Parent child documents partial update

2017-07-17 Thread Amrit Sarkar
Sujay,

Not really. Parent-child documents are stored in a single block
contiguously. Read more about parent-child relationship at:
https://medium.com/@sarkaramrit2/multiple-documents-with-same-doc-id-in-index-in-solr-cloud-32c072db2164

When we perform a partial / atomic update, say {"id":"X",
"fieldA":{"set":"Z"}}, that particular doc with id X is fetched (all the
"stored" fields), the update is applied, and the doc is re-indexed; all of this
happens in *DistributedUpdateProcessor* internally. So there is no way it will
fetch the child documents along with it.
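
For illustration, a minimal SolrJ sketch (assuming a SolrClient named "client",
a collection name in "collection", and hypothetical field names and ids) of how
a parent and its children are indexed as one contiguous block, which is why
re-sending only the parent drops the children:

SolrInputDocument parent = new SolrInputDocument();
parent.addField("id", "parent-1");
parent.addField("fieldA", "Y");

SolrInputDocument child = new SolrInputDocument();
child.addField("id", "child-1");
child.addField("fieldB", "W");

// the child is written contiguously with the parent as a single block
parent.addChildDocument(child);
client.add(collection, parent);
client.commit(collection);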

I am not sure whether this can be done with the current code or whether it
will be fixed / improved in the future.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Mon, Jul 17, 2017 at 12:44 PM, Sujay Bawaskar 
wrote:

> Hi,
>
> Need a help to understand solr parent child document partial update
> behaviour. Can we perform partial update on parent document without losing
> its chiild documents? My observation is that parent child relationship
> between documents get lost in case partial update is performed on parent.
> Any work around or solution to this issue?
>
> --
> Thanks,
> Sujay P Bawaskar
> M:+91-77091 53669
>


Re: Help with updateHandler commit stats

2017-07-17 Thread Amrit Sarkar
Antonio,

I think the name itself suggests what it is. From the official
documentation:

autocommits

Total number of auto-commits executed.

So yes, it is the total number of auto-commits executed in the core's lifetime.

Look into:
https://cwiki.apache.org/confluence/display/solr/Performance+Statistics+Reference
for more details.
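
For reference, a minimal solrconfig.xml sketch of the autoCommit /
autoSoftCommit settings you describe (the interval values are taken from your
mail; openSearcher=false is an assumption):

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>180000</maxTime>        <!-- 180 secs: hard commit -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>10000</maxTime>         <!-- 10 secs: soft commit -->
  </autoSoftCommit>
</updateHandler>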

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Fri, Jul 7, 2017 at 4:15 PM, Antonio De Miguel 
wrote:

> Hi,
>
> I'm taking a look to UpdateHandler stats... and i see when autosoftcommit
> occurs (every 10 secs) both metrics, "commits" and "soft autocommits"
> increments by one. ¿is this normal?
>
> My config is:
>
> autoCommit: 180 secs
> autoSoftCommit: 10 secs
>
> Thanks!
>


Re: CloudSolrClient preferred over LBHttpSolrClient

2017-07-17 Thread Amrit Sarkar
S G,

Not sure about the documentation but:

The CloudSolrClient uses a connection to ZooKeeper to extract cluster
information such as who is the leader for a shard in a Solr collection. To
create a CloudSolrClient all you specify is the ZooKeeper ensemble and which
collection you want to work with. Behind the scenes SolrJ will load-balance
and send the request to the right "shard" in the cluster. The
CloudSolrClient is better if you have a cluster of multiple Solr nodes
across multiple machines.

LBHttpSolrClient, on the other hand, does its load balancing with a simple
round-robin over the list of servers.
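
A minimal sketch of the difference (Solr 6.x constructors, as used elsewhere
in these threads; host names and the collection name are hypothetical):

// cluster-aware: reads cluster state from ZooKeeper and routes requests to the right shard leader
CloudSolrClient cloudClient = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181");
cloudClient.setDefaultCollection("mycollection");

// plain round-robin over a fixed list of Solr base URLs, no cluster awareness
LBHttpSolrClient lbClient = new LBHttpSolrClient(
    "http://solr1:8983/solr/mycollection",
    "http://solr2:8983/solr/mycollection");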

Hope this helps.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Mon, Jul 17, 2017 at 11:38 PM, S G  wrote:

> Hi,
>
> Does anyone know if CloudSolrClient is preferred over LBHttpSolrClient ?
> If yes, why so and has there been any good performance benefits documented
> anywhere?
>
> Thanks
> SG
>


Re: Parent child documents partial update

2017-07-17 Thread Amrit Sarkar
Sujay,

The Lucene index stores documents in a flat-object style, so I really don't
think nested documents at the index / storage level will ever be supported
unless someone changes the very internals of the index.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Tue, Jul 18, 2017 at 8:11 AM, Sujay Bawaskar 
wrote:

> Thanks Amrit. So storage mechanism of parent child documents is limiting
> the capability of partial update. It would be great to have flawless parent
> child index support in solr.
>
> On 17-Jul-2017 11:14 PM, "Amrit Sarkar"  wrote:
>
> > Sujay,
> >
> > Not really. Parent-child documents are stored in a single block
> > contiguously. Read more about parent-child relationship at:
> > https://medium.com/@sarkaramrit2/multiple-documents-with-same-doc-id-in-
> > index-in-solr-cloud-32c072db2164
> >
> > While we perform partial / atomic update, say {"id":"X",
> > "fieldA":{"set":"Z"}, that particular doc with X will be fetched (all the
> > "stored" fields), update will be performed and indexed, all happens in
> > *DistributedUpdateProcessor* internally. So there is no way it will fetch
> > the child documents along with it.
> >
> > I am not sure whether this can be done with current code or it will be
> > fixed / improved in the future.
> >
> > Amrit Sarkar
> > Search Engineer
> > Lucidworks, Inc.
> > 415-589-9269
> > www.lucidworks.com
> > Twitter http://twitter.com/lucidworks
> > LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> >
> > On Mon, Jul 17, 2017 at 12:44 PM, Sujay Bawaskar <
> sujaybawas...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > Need a help to understand solr parent child document partial update
> > > behaviour. Can we perform partial update on parent document without
> > losing
> > > its chiild documents? My observation is that parent child relationship
> > > between documents get lost in case partial update is performed on
> parent.
> > > Any work around or solution to this issue?
> > >
> > > --
> > > Thanks,
> > > Sujay P Bawaskar
> > > M:+91-77091 53669
> > >
> >
>


Re: multiValued=false is not working in Solr 6.4 in RHEL/CentOS

2017-07-20 Thread Amrit Sarkar
By saying:

 I am just adding multiValued=false in the managed-schema file.


Are you modifying the "conf" on the local filesystem, or going into the core's
conf directory and changing it there? If you are running SolrCloud, you should
change the same config on ZooKeeper.
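
For example, something along these lines re-uploads a modified config to
ZooKeeper (the configset name, path and ZK address are hypothetical); reload
the collection afterwards:

bin/solr zk upconfig -n myconfig -d /path/to/conf -z localhost:2181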


Re: CDCR - how to deal with the transaction log files

2017-07-21 Thread Amrit Sarkar
Patrick,

Yes! You created the default UpdateLog, which got written to disk, and then you
changed it to CdcrUpdateLog in the configs. I see no reason it would create a
proper COLLECTIONCHECKPOINT on the target tlog.
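
For reference, a sketch of the updateLog configuration (based on the CDCR
reference guide) that both source and target should have in solrconfig.xml
before any documents are indexed:

<updateLog class="solr.CdcrUpdateLog">
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>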

One thing you can try before starting from scratch is restarting the source
cluster nodes; the shard leaders will try to create the same
COLLECTIONCHECKPOINT, which may or may not be successful.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Fri, Jul 21, 2017 at 11:09 AM, Patrick Hoeffel <
patrick.hoef...@polarisalpha.com> wrote:

> I'm working on my first setup of CDCR, and I'm seeing the same "The log
> reader for target collection {collection name} is not initialised" as you
> saw.
>
> It looks like you're creating collections on a regular basis, but for me,
> I create it one time and never again. I've been creating the collection
> first from defaults and then applying the CDCR-aware solrconfig changes
> afterward. It sounds like maybe I need to create the configset in ZK first,
> then create the collections, first on the Target and then on the Source,
> and I should be good?
>
> Thanks,
>
> Patrick Hoeffel
> Senior Software Engineer
> (Direct)  719-452-7371
> (Mobile) 719-210-3706
> patrick.hoef...@polarisalpha.com
> PolarisAlpha.com
>
>
> -Original Message-
> From: jmyatt [mailto:jmy...@wayfair.com]
> Sent: Wednesday, July 12, 2017 4:49 PM
> To: solr-user@lucene.apache.org
> Subject: Re: CDCR - how to deal with the transaction log files
>
> glad to hear you found your solution!  I have been combing over this post
> and others on this discussion board many times and have tried so many
> tweaks to configuration, order of steps, etc, all with absolutely no
> success in getting the Source cluster tlogs to delete.  So incredibly
> frustrating.  If anyone has other pearls of wisdom I'd love some advice.
> Quick hits on what I've tried:
>
> - solrconfig exactly like Sean's (target and source respectively) expect
> no autoSoftCommit
> - I am also calling cdcr?action=DISABLEBUFFER (on source as well as on
> target) explicitly before starting since the config setting of
> defaultState=disabled doesn't seem to work
> - when I create the collection on source first, I get the warning "The log
> reader for target collection {collection name} is not initialised".  When I
> reverse the order (create the collection on target first), no such warning
> - tlogs replicate as expected, hard commits on both target and source
> cause tlogs to rollover, etc - all of that works as expected
> - action=QUEUES on source reflects the queueSize accurately.  Also
> *always* shows updateLogSynchronizer state as "stopped"
> - action=LASTPROCESSEDVERSION on both source and target always seems
> correct (I don't see the -1 that Sean mentioned).
> - I'm creating new collections every time and running full data imports
> that take 5-10 minutes. Again, all data replication, log rollover, and
> autocommit activity seems to work as expected, and logs on target are
> deleted.  It's just those pesky source tlogs I can't get to delete.
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/CDCR-how-to-deal-with-the-transaction-log-
> files-tp4345062p4345715.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: atomic updates in conjunction with optimistic concurrency

2017-07-21 Thread Amrit Sarkar
Hendrik,

Can you share the error snippet so that we can refer to the code where
exactly that is happening?


Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Fri, Jul 21, 2017 at 9:50 PM, Hendrik Haddorp 
wrote:

> Hi,
>
> when I try to use an atomic update in conjunction with optimistic
> concurrency Solr sometimes complains that the version I passed in does not
> match. The version in my request however match to what is stored and what
> the exception states as the actual version does not exist in the collection
> at all. Strangely this does only happen sometimes but once it happens for a
> collection it seems to stay like that. Any idea why that might happen?
>
> I'm using Solr 6.3 in Cloud mode with SolrJ.
>
> regards,
> Hendrik
>


Re: atomic updates in conjunction with optimistic concurrency

2017-07-21 Thread Amrit Sarkar
Hendrik,

I ran a little test on 6.3 with repeated atomic updates under optimistic
concurrency, and cannot *reproduce* this:

List<SolrInputDocument> docs = new ArrayList<>();
SolrInputDocument document = new SolrInputDocument();
document.addField("id", String.valueOf(1));
document.addField("external_version_field_s", System.currentTimeMillis()); // normal update
docs.add(document);
UpdateRequest updateRequest = new UpdateRequest();
updateRequest.add(docs);
client.request(updateRequest, collection);
updateRequest = new UpdateRequest();
updateRequest.commit(client, collection);

while (true) {
  QueryResponse response = client.query(new ModifiableSolrParams().add("q", "id:1"));
  System.out.println(response.getResults().get(0).get("_version_"));
  docs = new ArrayList<>();
  document = new SolrInputDocument();
  document.addField("id", String.valueOf(1));
  Map<String, Object> map = new HashMap<>();
  map.put("set", createSentance(1)); // atomic "set" value; createSentance is a local test helper
  document.addField("external_version_field_s", map);
  document.addField("_version_", response.getResults().get(0).get("_version_"));
  docs.add(document);
  updateRequest = new UpdateRequest();
  updateRequest.add(docs);
  client.request(updateRequest, collection);
  updateRequest = new UpdateRequest();
  updateRequest.commit(client, collection);
}

Maybe you can let us know in more detail how the update is being made?

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Fri, Jul 21, 2017 at 10:36 PM, Hendrik Haddorp 
wrote:

> Hi,
>
> I can't find anything about this in the Solr logs. On the caller side I
> have this:
> Error from server at http://x_shard1_replica2: version conflict for
> x expected=1573538179623944192 actual=1573546159565176832
> org.apache.solr.client.solrj.impl.CloudSolrClient$RouteException: Error
> from server at http://x_shard1_replica2: version conflict for x
> expected=1573538179623944192 actual=1573546159565176832
> at 
> org.apache.solr.client.solrj.impl.CloudSolrClient.directUpdate(CloudSolrClient.java:765)
> ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 -
> shalin - 2016-11-02 19:52:43]
> at 
> org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1173)
> ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 -
> shalin - 2016-11-02 19:52:43]
> at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWit
> hRetryOnStaleState(CloudSolrClient.java:1062)
> ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 -
> shalin - 2016-11-02 19:52:43]
> at 
> org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:1004)
> ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 -
> shalin - 2016-11-02 19:52:43]
> at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:149)
> ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 -
> shalin - 2016-11-02 19:52:43]
> at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:106)
> ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 -
> shalin - 2016-11-02 19:52:43]
> at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:71)
> ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 -
> shalin - 2016-11-02 19:52:43]
> ...
> Caused by: 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
> Error from server at http://x_shard1_replica2: version conflict for
> x expected=1573538179623944192 actual=1573546159565176832
> at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:593)
> ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 -
> shalin - 2016-11-02 19:52:43]
> at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:262)
> ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 -
> shalin - 2016-11-02 19:52:43]
> at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:251)
> ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 -
> shalin - 2016-11-02 19:52:43]
> at 
> org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:435)
> ~[solr-solrj-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 -
> shalin - 2016-11-02 19:52:43]
> at 
> org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(L

Re: Sum of double fields in JSON Facet

2017-07-25 Thread Amrit Sarkar
Zheng,

You may want to check https://issues.apache.org/jira/browse/SOLR-7452. I
don't know whether they are directly related, but I have certainly seen
complaints and enquiries about imprecise statistics with JSON Facets.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Tue, Jul 25, 2017 at 6:27 PM, Zheng Lin Edwin Yeo 
wrote:

> This is the way which I put my JSON facet.
>
> totalAmount:"sum(sum(amount1_d,amount2_d))"
>
> amount1_d: 69446961.2
> amount2_d: 0
>
> Result I get: 69446959.27
>
>
> Regards,
> Edwin
>
>
> On 25 July 2017 at 20:44, Zheng Lin Edwin Yeo 
> wrote:
>
> > Hi,
> >
> > I'm trying to do a sum of two double fields in JSON Facet. One of the
> > field has a value of 69446961.2, while the other is 0. However, when I
> get
> > the result, I'm getting a value of 69446959.27. This is 1.93 lesser than
> > the original value.
> >
> > What could be the reason?
> >
> > I'm using Solr 6.5.1.
> >
> > Regards,
> > Edwin
> >
>


Re: SOLR Metric Reporting to graphite

2017-08-06 Thread Amrit Sarkar
Hi,

I haven't had a chance to go through the steps you are doing, but I followed
the approach written by Varun Thacker via InfluxDB:
https://github.com/vthacker/solr-metrics-influxdb, and it works fine. Maybe
it can be of some help.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Sun, Aug 6, 2017 at 9:47 PM, abhi Abhishek  wrote:

> Hi All,
> I am trying to setup the graphite reporter for SOLR 6.5.0. i've started
> a sample docker instance for graphite with statd (
> https://github.com/hopsoft/docker-graphite-statsd).
>
> also i've added the graphite metrics reporter in the SOLR.xml config of the
> collection. however post doing this i dont see any data getting posted to
> the graphite (
> https://cwiki.apache.org/confluence/display/solr/Metrics+Reporting).
> added XML Config to solr.xml
>  <metrics>
>    <reporter name="graphite" class="org.apache.solr.metrics.reporters.SolrGraphiteReporter">
>      <str name="host">localhost</str>
>      <int name="port">2003</int>
>      <int name="period">1</int>
>    </reporter>
>  </metrics>
> Graphite Mapped Ports (Host / Container : Service):
> 80   / 80   : nginx
> 2003 / 2003 : carbon receiver - plaintext
> 2004 / 2004 : carbon receiver - pickle
> 2023 / 2023 : carbon aggregator - plaintext
> 2024 / 2024 : carbon aggregator - pickle
> 8125 / 8125 : statsd
> 8126 / 8126 : statsd admin
> (port table from https://github.com/hopsoft/docker-graphite-statsd#mounted-volumes)
> please advice if i am doing something wrong here.
>
> Thanks,
> Abhishek
>


Re: Highlighting Performance improvement suggestions required - Solr 6.5.1

2017-08-09 Thread Amrit Sarkar
Pardon me, I didn't go through the details in the configs, and I guess you have
already gone through the recent talks on highlighters; still sharing in case not:

https://www.slideshare.net/lucidworks/solr-highlighting-at-full-speed-presented-by-timothy-rodriguez-bloomberg-david-smiley-d-w-smiley-llc
https://www.youtube.com/watch?v=tv5qKDKW8kk

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Wed, Aug 9, 2017 at 7:45 PM, sasarun  wrote:

> Hi All,
>
> I found quite a few discussions on the highlighting performance issue.
> Though I tried to implement most of them, performance improvement was
> negative.
> Currently index count is really low with about 922 records . But the field
> on which highlighting is done is quite large data. Querying of data with
> highlighting is taking lots of time with 85-90% time taken on highlighting.
> Configuration of  my set schema.xml is as below
>
> fieldType name="text_general" class="solr.TextField"
> positionIncrementGap="100">
> 
> 
> 
>  words="stopwords.txt" />
>
> 
>   
>   
> 
> 
>  words="stopwords.txt" />
>  ignoreCase="true" expand="true"/>
> 
>   
> 
>  stored="true"
> termVectors="true" termPositions="true" termOffsets="true"
> storeOffsetsWithPositions="true"/>
>  stored="true"/>
> 
>
> Query used in solr is
>
> hl=true&hl.fl=customContent&hl.fragsize=500&hl.simple.pre=
> &hl.simple.post=&hl.snippets=1&hl.method=unified&
> hl.bs.type=SENTENCE&hl.fragListBuilder=simple&hl.
> maxAnalyzedChars=214748364&facet=true&facet.mincount=1&
> facet.limit=-1&facet.s
> ort=count&debug=timing&facet.field=contentSpecific
>
> Also note that We had tried fastvectorhighlighter too but the result was
> not
> positive. Once when we tried to hl.offsetSource="term_vectors" with unified
> result came up in half a second but it didnt had any highlight snippets.
>
> One of the debug returned by solr is shared below for reference
>
> time=8833.0,prepare={time=0.0,query={time=0.0},facet={time=
> 0.0},facet_module={time=0.0},mlt={time=0.0},hig
> hlight={time=0.0},stats={time=0.0},expand={time=0.0},terms={
> time=0.0},debug={time=0.0}},process={time=8826.0,query={
> time=867.0},facet={time=2.0},facet_module={time=0.0},mlt={
> time=0.0},highlight={time=7953.0},stats={time=0.0},expand={time=0.0},ter
> ms={time=0.0},debug={time=0.0}},loadFieldValues={time=28.0}}
>
> Any suggestions to  improve the performance would be of great help
>
> Thanks,
> Arun
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/Highlighting-Performance-improvement-
> suggestions-required-Solr-6-5-1-tp4349767.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: CDCR does not work

2017-09-28 Thread Amrit Sarkar
Pretty much what Webster and Erick mentioned; otherwise, please try the PDF I
attached. I followed the official documentation in doing that.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Thu, Sep 28, 2017 at 8:56 PM, Erick Erickson 
wrote:

> If Webster's idea doesn't solve it, the next thing to check is your
> tlogs on the source cluster. If you have a successful connection to
> the target and it's operative, the tlogs should be regularly pruned.
> If not, they'll collect updates forever.
>
> Also, your Solr logs should show messages as CDCR does its work, to
> you see any evidence that it's
> 1> running
> 2> sending docs?
>
> Also, your problem description doesn't provide any information other
> than "it doesn't work", which makes it very hard to offer anything
> except generalities, you might review:
>
> https://wiki.apache.org/solr/UsingMailingLists
>
> Best,
> Erick
>
>
> On Thu, Sep 28, 2017 at 7:47 AM, Webster Homer 
> wrote:
> > Check that you have autoCommit enabled in the target schema.
> >
> > Try sending a commit to the target collection. If you don't have
> autoCommit
> > enabled then the data could be replicating but not committed so not
> > searchable
> >
> > On Thu, Sep 28, 2017 at 1:57 AM, Jiani Yang  wrote:
> >
> >> Hi,
> >>
> >> Recently I am trying to use CDCR to do the replication of my solr
> cluster.
> >> I have done exactly as what the tutorial says, the tutorial link is
> shown
> >> below:
> >> https://lucene.apache.org/solr/guide/6_6/cross-data-
> >> center-replication-cdcr.html
> >>
> >> But I cannot see any change on target data center even every status
> looks
> >> fine. I have been stuck in this situation for a week and could not find
> a
> >> way to resolve it, could you please help me?
> >>
> >> Please reply me ASAP! Thank you!
> >>
> >> Best,
> >> Jiani
> >>
> >
> > --
> >
> >
> > This message and any attachment are confidential and may be privileged or
> > otherwise protected from disclosure. If you are not the intended
> recipient,
> > you must not copy this message or attachment or disclose the contents to
> > any other person. If you have received this transmission in error, please
> > notify the sender immediately and delete the message and any attachment
> > from your system. Merck KGaA, Darmstadt, Germany and any of its
> > subsidiaries do not accept liability for any omissions or errors in this
> > message which may arise as a result of E-Mail-transmission or for damages
> > resulting from any unauthorized changes of the content of this message
> and
> > any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its
> > subsidiaries do not guarantee that this message is free of viruses and
> does
> > not accept liability for any damages caused by any virus transmitted
> > therewith.
> >
> > Click http://www.emdgroup.com/disclaimer to access the German, French,
> > Spanish and Portuguese versions of this disclaimer.
>


Re: Very high number of deleted docs

2017-10-04 Thread Amrit Sarkar
Hi Markus,

Emir already mentioned tuning *reclaimDeletesWeight*, which affects the merge
priority of segments containing deletes. Also consider optimizing the index
from time to time, preferably scheduled weekly / fortnightly / ... at a
low-traffic period, so you never end up in the odd position of 80% deleted
docs in the total index.
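
For reference, a solrconfig.xml sketch of the setting Emir mentions (the value
4.0 is only an example, not a recommendation; the default is 2.0):

<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <double name="reclaimDeletesWeight">4.0</double>
</mergePolicyFactory>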

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Wed, Oct 4, 2017 at 6:02 PM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Hi Markus,
> You can set reclaimDeletesWeight in merge settings to some higher value
> than default (I think it is 2) to favor segments with deleted docs when
> merging.
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 4 Oct 2017, at 13:31, Markus Jelsma 
> wrote:
> >
> > Hello,
> >
> > Using a 6.6.0, i just spotted one of our collections having a core of
> which over 80 % of the total number of documents were deleted documents.
> >
> > It has  > class="org.apache.solr.index.TieredMergePolicyFactory"/>
> configured with no non-default settings.
> >
> > Is this supposed to happen? How can i prevent these kind of numbers?
> >
> > Thanks,
> > Markus
>
>


Re: Getting user-level KeeperException

2017-10-12 Thread Amrit Sarkar
Gunalan,

ZooKeeper throws a KeeperException at /overseer for most Solr-side issues,
notably indexing. Match the timestamp of the ZooKeeper error with the Solr
log; the problem most probably lies there.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Thu, Oct 12, 2017 at 7:52 AM, Gunalan V  wrote:

> Hello,
>
> Could someone please let me know what this user-level keeper exception in
> zookeeper mean? and How to fix the same.
>
>
>
>
>
> Thanks,
> GVK
>


Re: Solr related questions

2017-10-13 Thread Amrit Sarkar
Hi,

1.) I created a core and tried to simplify the managed-schema file. But if
> I remove all "unecessary" fields/fieldtypes, I get errors like: field
> "_version_" is missing, type "boolean" is missing and so on. Why do I have
> to define this types/fields? Which fields/fieldtypes are required?


Solr expects certain primitive field names and types to be present in the
schema, though a better explanation of this should be in the docs. "_version_"
and a unique id field are mandatory for each document, as "_version_" holds the
current version of the document, which is used for syncing across nodes and for
atomic updates of documents.
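
As a rough sketch, a stripped-down managed-schema still needs at least
something like the following (type names may differ in your config), plus
fieldType definitions for whatever types those fields use:

<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="_version_" type="long" indexed="true" stored="true"/>
<uniqueKey>id</uniqueKey>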

 2.) Can I modify the managed-schema remotly/by program e.g. with a post

request or only by editing the managed-schema file directly?

Sure, the Schema API has been available for a while:
https://lucene.apache.org/solr/guide/6_6/schema-api.html
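
For example, adding a field remotely with a POST request (the core name and
field definition here are hypothetical):

curl -X POST -H 'Content-type:application/json' --data-binary '{
  "add-field": { "name":"mynewfield", "type":"text_general", "stored":true }
}' http://localhost:8983/solr/mycore/schema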

3.) When I have a service(solrnet client) that pushes a file from a
> fileserver to solr, will it cause two times traffic? (from the fileserver
> to my service and from the service to solr?) Is there a chance to index the
> file direct? (I need to add additional attributes to the index document)


Two times the traffic? Where? Solr will receive the docs once, so we are good
on that part. Please use SolrJ to index documents if possible, as it is the
most up-to-date client; if you are on SolrCloud, use CloudSolrClient.
Regarding indexing files directly, you can use DIH (the DataImportHandler),
depending on the file format (csv, xml, json), but mind that it is single
threaded.

Hope this clarifies some of it.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Fri, Oct 13, 2017 at 3:10 PM, startrekfan 
wrote:

> Hello,
>
> I have some Solr related questions:
>
> 1.) I created a core and tried to simplify the managed-schema file. But if
> I remove all "unecessary" fields/fieldtypes, I get errors like: field
> "_version_" is missing, type "boolean" is missing and so on. Why do I have
> to define this types/fields? Which fields/fieldtypes are required?
>
> 2.) Can I modify the managed-schema remotly/by program e.g. with a post
> request or only by editing the managed-schema file directly?
>
> 3.) When I have a service(solrnet client) that pushes a file from a
> fileserver to solr, will it cause two times traffic? (from the fileserver
> to my service and from the service to solr?) Is there a chance to index the
> file direct? (I need to add additional attributes to the index document)
>
> Thank you
>


Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Amrit Sarkar
Kevin,

You are getting NPE at:

String type = rawContentType.split(";")[0]; //HERE - rawContentType is NULL

// related code

String rawContentType = conn.getContentType();

public String getContentType() {
return getHeaderField("content-type");
}

HttpURLConnection conn = (HttpURLConnection) u.openConnection();

Can you check at your web page level that the headers are properly set and
include the "content-type" key?
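
A quick way to check from Java, mirroring what SimplePostTool does (the URL is
the one from your command, used only as an example; needs java.net.URL and
java.net.HttpURLConnection):

URL u = new URL("http://quadra.franz.com:9091/index.md");
HttpURLConnection conn = (HttpURLConnection) u.openConnection();
// getContentType() returns null when the response has no Content-Type header,
// which is exactly what makes rawContentType.split(";") throw the NPE
System.out.println("Content-Type: " + conn.getContentType());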


Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Wed, Oct 11, 2017 at 9:08 PM, Kevin Layer  wrote:

> I want to use solr to index a markdown website.  The files
> are in native markdown, but they are served in HTML (by markserv).
>
> Here's what I did:
>
> docker run --name solr -d -p 8983:8983 -t solr
> docker exec -it --user=solr solr bin/solr create_core -c handbook
>
> Then, to crawl the site:
>
> quadra[git:master]$ docker exec -it --user=solr solr bin/post -c handbook
> http://quadra.franz.com:9091/index.md -recursive 10 -delay 0 -filetypes md
> /docker-java-home/jre/bin/java -classpath /opt/solr/dist/solr-core-7.0.1.jar
> -Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=md -Dc=handbook -Ddata=web
> org.apache.solr.util.SimplePostTool http://quadra.franz.com:9091/index.md
> SimplePostTool version 5.0.0
> Posting web pages to Solr url http://localhost:8983/solr/
> handbook/update/extract
> Entering auto mode. Indexing pages with content-types corresponding to
> file endings md
> SimplePostTool: WARNING: Never crawl an external web site faster than
> every 10 seconds, your IP will probably be blocked
> Entering recursive mode, depth=10, delay=0s
> Entering crawl at level 0 (1 links total, 1 new)
> Exception in thread "main" java.lang.NullPointerException
> at org.apache.solr.util.SimplePostTool$PageFetcher.
> readPageFromUrl(SimplePostTool.java:1138)
> at org.apache.solr.util.SimplePostTool.webCrawl(
> SimplePostTool.java:603)
> at org.apache.solr.util.SimplePostTool.postWebPages(
> SimplePostTool.java:563)
> at org.apache.solr.util.SimplePostTool.doWebMode(
> SimplePostTool.java:365)
> at org.apache.solr.util.SimplePostTool.execute(
> SimplePostTool.java:187)
> at org.apache.solr.util.SimplePostTool.main(
> SimplePostTool.java:172)
> quadra[git:master]$
>
>
> Any ideas on what I did wrong?
>
> Thanks.
>
> Kevin
>


Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Amrit Sarkar
Strange,

Can you add: "text/html;charset=utf-8". This is wiki.apache.org page's
Content-Type. Let's see what it says now.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Fri, Oct 13, 2017 at 6:44 PM, Kevin Layer  wrote:

> OK, so I hacked markserv to add Content-Type text/html, but now I get
>
> SimplePostTool: WARNING: Skipping URL with unsupported type text/html
>
> What is it expecting?
>
> $ docker exec -it --user=solr solr bin/post -c handbook
> http://quadra:9091/index.md -recursive 10 -delay 0 -filetypes md
> /docker-java-home/jre/bin/java -classpath /opt/solr/dist/solr-core-7.0.1.jar
> -Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=md -Dc=handbook -Ddata=web
> org.apache.solr.util.SimplePostTool http://quadra:9091/index.md
> SimplePostTool version 5.0.0
> Posting web pages to Solr url http://localhost:8983/solr/
> handbook/update/extract
> Entering auto mode. Indexing pages with content-types corresponding to
> file endings md
> SimplePostTool: WARNING: Never crawl an external web site faster than
> every 10 seconds, your IP will probably be blocked
> Entering recursive mode, depth=10, delay=0s
> Entering crawl at level 0 (1 links total, 1 new)
> SimplePostTool: WARNING: Skipping URL with unsupported type text/html
> SimplePostTool: WARNING: The URL http://quadra:9091/index.md returned a
> HTTP result status of 415
> 0 web pages indexed.
> COMMITting Solr index changes to http://localhost:8983/solr/
> handbook/update/extract...
> Time spent: 0:00:03.882
> $
>
> Thanks.
>
> Kevin
>


Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Amrit Sarkar
Reference to the code:

.

String rawContentType = conn.getContentType();
String type = rawContentType.split(";")[0];
if(typeSupported(type) || "*".equals(fileTypes)) {
  String encoding = conn.getContentEncoding();

.

protected boolean typeSupported(String type) {
  for(String key : mimeMap.keySet()) {
if(mimeMap.get(key).equals(type)) {
  if(fileTypes.contains(key))
return true;
}
  }
  return false;
}

.

It has another check for fileTypes; I can see the page you are indexing ends
with .md and not .html. Let's hope this is not the issue now.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Fri, Oct 13, 2017 at 7:04 PM, Amrit Sarkar 
wrote:

> Kevin,
>
> Just put "html" too and give it a shot. These are the types it is
> expecting:
>
> mimeMap = new HashMap<>();
> mimeMap.put("xml", "application/xml");
> mimeMap.put("csv", "text/csv");
> mimeMap.put("json", "application/json");
> mimeMap.put("jsonl", "application/json");
> mimeMap.put("pdf", "application/pdf");
> mimeMap.put("rtf", "text/rtf");
> mimeMap.put("html", "text/html");
> mimeMap.put("htm", "text/html");
> mimeMap.put("doc", "application/msword");
> mimeMap.put("docx", 
> "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
> mimeMap.put("ppt", "application/vnd.ms-powerpoint");
> mimeMap.put("pptx", 
> "application/vnd.openxmlformats-officedocument.presentationml.presentation");
> mimeMap.put("xls", "application/vnd.ms-excel");
> mimeMap.put("xlsx", 
> "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
> mimeMap.put("odt", "application/vnd.oasis.opendocument.text");
> mimeMap.put("ott", "application/vnd.oasis.opendocument.text");
> mimeMap.put("odp", "application/vnd.oasis.opendocument.presentation");
> mimeMap.put("otp", "application/vnd.oasis.opendocument.presentation");
> mimeMap.put("ods", "application/vnd.oasis.opendocument.spreadsheet");
> mimeMap.put("ots", "application/vnd.oasis.opendocument.spreadsheet");
> mimeMap.put("txt", "text/plain");
> mimeMap.put("log", "text/plain");
>
> The keys are the types supported.
>
>
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>
> On Fri, Oct 13, 2017 at 6:56 PM, Amrit Sarkar 
> wrote:
>
>> Ah!
>>
>> Only supported type is: text/html; encoding=utf-8
>>
>> I am not confident of this either :) but this should work.
>>
>> See the code-snippet below:
>>
>> ..
>>
>> if(res.httpStatus == 200) {
>>   // Raw content type of form "text/html; encoding=utf-8"
>>   String rawContentType = conn.getContentType();
>>   String type = rawContentType.split(";")[0];
>>   if(typeSupported(type) || "*".equals(fileTypes)) {
>> String encoding = conn.getContentEncoding();
>>
>> 
>>
>>
>> Amrit Sarkar
>> Search Engineer
>> Lucidworks, Inc.
>> 415-589-9269
>> www.lucidworks.com
>> Twitter http://twitter.com/lucidworks
>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>>
>> On Fri, Oct 13, 2017 at 6:51 PM, Kevin Layer  wrote:
>>
>>> Amrit Sarkar wrote:
>>>
>>> >> Strange,
>>> >>
>>> >> Can you add: "text/html;charset=utf-8". This is wiki.apache.org
>>> page's
>>> >> Content-Type. Let's see what it says now.
>>>
>>> Same thing.  Verified Content-Type:
>>>
>>> quadra[git:master]$ wget -S -O /dev/null http://quadra:9091/index.md |&
>>> grep Content-Type
>>>   Content-Type: text/html;charset=utf-8
>>> quadra[git:master]$ ]
>>>
>>> quadra[git:master]$ docker exec -it --user=solr solr bin/post -c
>>> handbook http://quadra:9091/index.md -recursive 10 -delay 0 -filetypes
>>> md
>>> /docker-java-home/jre/bin/java -classpath /opt/solr/dist/solr-core-7.0.1.jar
>>> -Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=md -Dc=handbook -Ddat

Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Amrit Sarkar
Kevin,

Just put "html" too and give it a shot. These are the types it is expecting:

mimeMap = new HashMap<>();
mimeMap.put("xml", "application/xml");
mimeMap.put("csv", "text/csv");
mimeMap.put("json", "application/json");
mimeMap.put("jsonl", "application/json");
mimeMap.put("pdf", "application/pdf");
mimeMap.put("rtf", "text/rtf");
mimeMap.put("html", "text/html");
mimeMap.put("htm", "text/html");
mimeMap.put("doc", "application/msword");
mimeMap.put("docx",
"application/vnd.openxmlformats-officedocument.wordprocessingml.document");
mimeMap.put("ppt", "application/vnd.ms-powerpoint");
mimeMap.put("pptx",
"application/vnd.openxmlformats-officedocument.presentationml.presentation");
mimeMap.put("xls", "application/vnd.ms-excel");
mimeMap.put("xlsx",
"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
mimeMap.put("odt", "application/vnd.oasis.opendocument.text");
mimeMap.put("ott", "application/vnd.oasis.opendocument.text");
mimeMap.put("odp", "application/vnd.oasis.opendocument.presentation");
mimeMap.put("otp", "application/vnd.oasis.opendocument.presentation");
mimeMap.put("ods", "application/vnd.oasis.opendocument.spreadsheet");
mimeMap.put("ots", "application/vnd.oasis.opendocument.spreadsheet");
mimeMap.put("txt", "text/plain");
mimeMap.put("log", "text/plain");

The keys are the types supported.


Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Fri, Oct 13, 2017 at 6:56 PM, Amrit Sarkar 
wrote:

> Ah!
>
> Only supported type is: text/html; encoding=utf-8
>
> I am not confident of this either :) but this should work.
>
> See the code-snippet below:
>
> ..
>
> if(res.httpStatus == 200) {
>   // Raw content type of form "text/html; encoding=utf-8"
>   String rawContentType = conn.getContentType();
>   String type = rawContentType.split(";")[0];
>   if(typeSupported(type) || "*".equals(fileTypes)) {
> String encoding = conn.getContentEncoding();
>
> 
>
>
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>
> On Fri, Oct 13, 2017 at 6:51 PM, Kevin Layer  wrote:
>
>> Amrit Sarkar wrote:
>>
>> >> Strange,
>> >>
>> >> Can you add: "text/html;charset=utf-8". This is wiki.apache.org page's
>> >> Content-Type. Let's see what it says now.
>>
>> Same thing.  Verified Content-Type:
>>
>> quadra[git:master]$ wget -S -O /dev/null http://quadra:9091/index.md |&
>> grep Content-Type
>>   Content-Type: text/html;charset=utf-8
>> quadra[git:master]$ ]
>>
>> quadra[git:master]$ docker exec -it --user=solr solr bin/post -c handbook
>> http://quadra:9091/index.md -recursive 10 -delay 0 -filetypes md
>> /docker-java-home/jre/bin/java -classpath /opt/solr/dist/solr-core-7.0.1.jar
>> -Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=md -Dc=handbook -Ddata=web
>> org.apache.solr.util.SimplePostTool http://quadra:9091/index.md
>> SimplePostTool version 5.0.0
>> Posting web pages to Solr url http://localhost:8983/solr/han
>> dbook/update/extract
>> Entering auto mode. Indexing pages with content-types corresponding to
>> file endings md
>> SimplePostTool: WARNING: Never crawl an external web site faster than
>> every 10 seconds, your IP will probably be blocked
>> Entering recursive mode, depth=10, delay=0s
>> Entering crawl at level 0 (1 links total, 1 new)
>> SimplePostTool: WARNING: Skipping URL with unsupported type text/html
>> SimplePostTool: WARNING: The URL http://quadra:9091/index.md returned a
>> HTTP result status of 415
>> 0 web pages indexed.
>> COMMITting Solr index changes to http://localhost:8983/solr/han
>> dbook/update/extract...
>> Time spent: 0:00:00.531
>> quadra[git:master]$
>>
>> Kevin
>>
>> >>
>> >> Amrit Sarkar
>> >> Search Engineer
>> >> Lucidworks, Inc.
>> >> 415-589-9269
>> >> www.lucidworks.com
>> >> Twitter http://twitter.com/lucidworks
>> >> LinkedIn: https://www.linkedin.com/in/sa

Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Amrit Sarkar
Ah!

The only supported type is: text/html; encoding=utf-8

I am not confident of this either :) but this should work.

See the code-snippet below:

..

if(res.httpStatus == 200) {
  // Raw content type of form "text/html; encoding=utf-8"
  String rawContentType = conn.getContentType();
  String type = rawContentType.split(";")[0];
  if(typeSupported(type) || "*".equals(fileTypes)) {
String encoding = conn.getContentEncoding();
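
One thing worth noting from the snippet: passing -filetypes "*" makes the "*".equals(fileTypes) branch true and skips the content-type check entirely. A rough, untested sketch, using the same container name and URL as in your earlier commands:

docker exec -it --user=solr solr bin/post -c handbook \
  http://quadra:9091/index.md -recursive 10 -delay 0 -filetypes "*"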




Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Fri, Oct 13, 2017 at 6:51 PM, Kevin Layer  wrote:

> Amrit Sarkar wrote:
>
> >> Strange,
> >>
> >> Can you add: "text/html;charset=utf-8". This is wiki.apache.org page's
> >> Content-Type. Let's see what it says now.
>
> Same thing.  Verified Content-Type:
>
> quadra[git:master]$ wget -S -O /dev/null http://quadra:9091/index.md |&
> grep Content-Type
>   Content-Type: text/html;charset=utf-8
> quadra[git:master]$ ]
>
> quadra[git:master]$ docker exec -it --user=solr solr bin/post -c handbook
> http://quadra:9091/index.md -recursive 10 -delay 0 -filetypes md
> /docker-java-home/jre/bin/java -classpath /opt/solr/dist/solr-core-7.0.1.jar
> -Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=md -Dc=handbook -Ddata=web
> org.apache.solr.util.SimplePostTool http://quadra:9091/index.md
> SimplePostTool version 5.0.0
> Posting web pages to Solr url http://localhost:8983/solr/
> handbook/update/extract
> Entering auto mode. Indexing pages with content-types corresponding to
> file endings md
> SimplePostTool: WARNING: Never crawl an external web site faster than
> every 10 seconds, your IP will probably be blocked
> Entering recursive mode, depth=10, delay=0s
> Entering crawl at level 0 (1 links total, 1 new)
> SimplePostTool: WARNING: Skipping URL with unsupported type text/html
> SimplePostTool: WARNING: The URL http://quadra:9091/index.md returned a
> HTTP result status of 415
> 0 web pages indexed.
> COMMITting Solr index changes to http://localhost:8983/solr/
> handbook/update/extract...
> Time spent: 0:00:00.531
> quadra[git:master]$
>
> Kevin
>
> >>
> >> Amrit Sarkar
> >> Search Engineer
> >> Lucidworks, Inc.
> >> 415-589-9269
> >> www.lucidworks.com
> >> Twitter http://twitter.com/lucidworks
> >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> >>
> >> On Fri, Oct 13, 2017 at 6:44 PM, Kevin Layer  wrote:
> >>
> >> > OK, so I hacked markserv to add Content-Type text/html, but now I get
> >> >
> >> > SimplePostTool: WARNING: Skipping URL with unsupported type text/html
> >> >
> >> > What is it expecting?
> >> >
> >> > $ docker exec -it --user=solr solr bin/post -c handbook
> >> > http://quadra:9091/index.md -recursive 10 -delay 0 -filetypes md
> >> > /docker-java-home/jre/bin/java -classpath
> /opt/solr/dist/solr-core-7.0.1.jar
> >> > -Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=md -Dc=handbook
> -Ddata=web
> >> > org.apache.solr.util.SimplePostTool http://quadra:9091/index.md
> >> > SimplePostTool version 5.0.0
> >> > Posting web pages to Solr url http://localhost:8983/solr/
> >> > handbook/update/extract
> >> > Entering auto mode. Indexing pages with content-types corresponding to
> >> > file endings md
> >> > SimplePostTool: WARNING: Never crawl an external web site faster than
> >> > every 10 seconds, your IP will probably be blocked
> >> > Entering recursive mode, depth=10, delay=0s
> >> > Entering crawl at level 0 (1 links total, 1 new)
> >> > SimplePostTool: WARNING: Skipping URL with unsupported type text/html
> >> > SimplePostTool: WARNING: The URL http://quadra:9091/index.md
> returned a
> >> > HTTP result status of 415
> >> > 0 web pages indexed.
> >> > COMMITting Solr index changes to http://localhost:8983/solr/
> >> > handbook/update/extract...
> >> > Time spent: 0:00:03.882
> >> > $
> >> >
> >> > Thanks.
> >> >
> >> > Kevin
> >> >
>


Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Amrit Sarkar
Hi Kevin,

Can you post the Solr log in the mail thread? I don't think it handles the
.md type by itself, at first glance at the code.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Fri, Oct 13, 2017 at 7:42 PM, Kevin Layer  wrote:

> Amrit Sarkar wrote:
>
> >> Kevin,
> >>
> >> Just put "html" too and give it a shot. These are the types it is
> expecting:
>
> Same thing.
>
> >>
> >> mimeMap = new HashMap<>();
> >> mimeMap.put("xml", "application/xml");
> >> mimeMap.put("csv", "text/csv");
> >> mimeMap.put("json", "application/json");
> >> mimeMap.put("jsonl", "application/json");
> >> mimeMap.put("pdf", "application/pdf");
> >> mimeMap.put("rtf", "text/rtf");
> >> mimeMap.put("html", "text/html");
> >> mimeMap.put("htm", "text/html");
> >> mimeMap.put("doc", "application/msword");
> >> mimeMap.put("docx",
> >> "application/vnd.openxmlformats-officedocument.
> wordprocessingml.document");
> >> mimeMap.put("ppt", "application/vnd.ms-powerpoint");
> >> mimeMap.put("pptx",
> >> "application/vnd.openxmlformats-officedocument.
> presentationml.presentation");
> >> mimeMap.put("xls", "application/vnd.ms-excel");
> >> mimeMap.put("xlsx",
> >> "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
> >> mimeMap.put("odt", "application/vnd.oasis.opendocument.text");
> >> mimeMap.put("ott", "application/vnd.oasis.opendocument.text");
> >> mimeMap.put("odp", "application/vnd.oasis.opendocument.presentation");
> >> mimeMap.put("otp", "application/vnd.oasis.opendocument.presentation");
> >> mimeMap.put("ods", "application/vnd.oasis.opendocument.spreadsheet");
> >> mimeMap.put("ots", "application/vnd.oasis.opendocument.spreadsheet");
> >> mimeMap.put("txt", "text/plain");
> >> mimeMap.put("log", "text/plain");
> >>
> >> The keys are the types supported.
> >>
> >>
> >> Amrit Sarkar
> >> Search Engineer
> >> Lucidworks, Inc.
> >> 415-589-9269
> >> www.lucidworks.com
> >> Twitter http://twitter.com/lucidworks
> >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> >>
> >> On Fri, Oct 13, 2017 at 6:56 PM, Amrit Sarkar 
> >> wrote:
> >>
> >> > Ah!
> >> >
> >> > Only supported type is: text/html; encoding=utf-8
> >> >
> >> > I am not confident of this either :) but this should work.
> >> >
> >> > See the code-snippet below:
> >> >
> >> > ..
> >> >
> >> > if(res.httpStatus == 200) {
> >> >   // Raw content type of form "text/html; encoding=utf-8"
> >> >   String rawContentType = conn.getContentType();
> >> >   String type = rawContentType.split(";")[0];
> >> >   if(typeSupported(type) || "*".equals(fileTypes)) {
> >> > String encoding = conn.getContentEncoding();
> >> >
> >> > 
> >> >
> >> >
> >> > Amrit Sarkar
> >> > Search Engineer
> >> > Lucidworks, Inc.
> >> > 415-589-9269
> >> > www.lucidworks.com
> >> > Twitter http://twitter.com/lucidworks
> >> > LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> >> >
> >> > On Fri, Oct 13, 2017 at 6:51 PM, Kevin Layer  wrote:
> >> >
> >> >> Amrit Sarkar wrote:
> >> >>
> >> >> >> Strange,
> >> >> >>
> >> >> >> Can you add: "text/html;charset=utf-8". This is wiki.apache.org
> page's
> >> >> >> Content-Type. Let's see what it says now.
> >> >>
> >> >> Same thing.  Verified Content-Type:
> >> >>
> >> >> quadra[git:master]$ wget -S -O /dev/null http://quadra:9091/index.md
> |&
> >> >> grep Content-Type
> >> >>   Content-Type: text/html;charset=utf-8
> >

Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Amrit Sarkar
Ah, Docker. The logs are placed under [solr-home]/server/log/solr/log on
the machine. I haven't played much with Docker; is there any way you can get
that file from that location?

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Fri, Oct 13, 2017 at 8:08 PM, Kevin Layer  wrote:

> Amrit Sarkar wrote:
>
> >> Hi Kevin,
> >>
> >> Can you post the solr log in the mail thread. I don't think it handled
> the
> >> .md by itself by first glance at code.
>
> How do I extract the log you want?
>
>
> >>
> >> Amrit Sarkar
> >> Search Engineer
> >> Lucidworks, Inc.
> >> 415-589-9269
> >> www.lucidworks.com
> >> Twitter http://twitter.com/lucidworks
> >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> >>
> >> On Fri, Oct 13, 2017 at 7:42 PM, Kevin Layer  wrote:
> >>
> >> > Amrit Sarkar wrote:
> >> >
> >> > >> Kevin,
> >> > >>
> >> > >> Just put "html" too and give it a shot. These are the types it is
> >> > expecting:
> >> >
> >> > Same thing.
> >> >
> >> > >>
> >> > >> mimeMap = new HashMap<>();
> >> > >> mimeMap.put("xml", "application/xml");
> >> > >> mimeMap.put("csv", "text/csv");
> >> > >> mimeMap.put("json", "application/json");
> >> > >> mimeMap.put("jsonl", "application/json");
> >> > >> mimeMap.put("pdf", "application/pdf");
> >> > >> mimeMap.put("rtf", "text/rtf");
> >> > >> mimeMap.put("html", "text/html");
> >> > >> mimeMap.put("htm", "text/html");
> >> > >> mimeMap.put("doc", "application/msword");
> >> > >> mimeMap.put("docx",
> >> > >> "application/vnd.openxmlformats-officedocument.
> >> > wordprocessingml.document");
> >> > >> mimeMap.put("ppt", "application/vnd.ms-powerpoint");
> >> > >> mimeMap.put("pptx",
> >> > >> "application/vnd.openxmlformats-officedocument.
> >> > presentationml.presentation");
> >> > >> mimeMap.put("xls", "application/vnd.ms-excel");
> >> > >> mimeMap.put("xlsx",
> >> > >> "application/vnd.openxmlformats-officedocument.
> spreadsheetml.sheet");
> >> > >> mimeMap.put("odt", "application/vnd.oasis.opendocument.text");
> >> > >> mimeMap.put("ott", "application/vnd.oasis.opendocument.text");
> >> > >> mimeMap.put("odp", "application/vnd.oasis.
> opendocument.presentation");
> >> > >> mimeMap.put("otp", "application/vnd.oasis.
> opendocument.presentation");
> >> > >> mimeMap.put("ods", "application/vnd.oasis.
> opendocument.spreadsheet");
> >> > >> mimeMap.put("ots", "application/vnd.oasis.
> opendocument.spreadsheet");
> >> > >> mimeMap.put("txt", "text/plain");
> >> > >> mimeMap.put("log", "text/plain");
> >> > >>
> >> > >> The keys are the types supported.
> >> > >>
> >> > >>
> >> > >> Amrit Sarkar
> >> > >> Search Engineer
> >> > >> Lucidworks, Inc.
> >> > >> 415-589-9269
> >> > >> www.lucidworks.com
> >> > >> Twitter http://twitter.com/lucidworks
> >> > >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> >> > >>
> >> > >> On Fri, Oct 13, 2017 at 6:56 PM, Amrit Sarkar <
> sarkaramr...@gmail.com>
> >> > >> wrote:
> >> > >>
> >> > >> > Ah!
> >> > >> >
> >> > >> > Only supported type is: text/html; encoding=utf-8
> >> > >> >
> >> > >> > I am not confident of this either :) but this should work.
> >> > >> >
> >> > >> > See the code-snippet below:
&

Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Amrit Sarkar
pardon: [solr-home]/server/logs/solr.log
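
If it is awkward to read it inside the container, a rough sketch for pulling the file out (assuming the container is named "solr", as in the bin/post commands above, and the stock /opt/solr install path):

docker cp solr:/opt/solr/server/logs/solr.log ./solr.log

or simply tail it in place:

docker exec -it solr tail -n 200 /opt/solr/server/logs/solr.log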

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Fri, Oct 13, 2017 at 8:10 PM, Amrit Sarkar 
wrote:

> ah oh, dockers. They are placed under [solr-home]/server/log/solr/log in
> the machine. I haven't played much with docker, any way you can get that
> file from that location.
>
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>
> On Fri, Oct 13, 2017 at 8:08 PM, Kevin Layer  wrote:
>
>> Amrit Sarkar wrote:
>>
>> >> Hi Kevin,
>> >>
>> >> Can you post the solr log in the mail thread. I don't think it handled
>> the
>> >> .md by itself by first glance at code.
>>
>> How do I extract the log you want?
>>
>>
>> >>
>> >> Amrit Sarkar
>> >> Search Engineer
>> >> Lucidworks, Inc.
>> >> 415-589-9269
>> >> www.lucidworks.com
>> >> Twitter http://twitter.com/lucidworks
>> >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>> >>
>> >> On Fri, Oct 13, 2017 at 7:42 PM, Kevin Layer  wrote:
>> >>
>> >> > Amrit Sarkar wrote:
>> >> >
>> >> > >> Kevin,
>> >> > >>
>> >> > >> Just put "html" too and give it a shot. These are the types it is
>> >> > expecting:
>> >> >
>> >> > Same thing.
>> >> >
>> >> > >>
>> >> > >> mimeMap = new HashMap<>();
>> >> > >> mimeMap.put("xml", "application/xml");
>> >> > >> mimeMap.put("csv", "text/csv");
>> >> > >> mimeMap.put("json", "application/json");
>> >> > >> mimeMap.put("jsonl", "application/json");
>> >> > >> mimeMap.put("pdf", "application/pdf");
>> >> > >> mimeMap.put("rtf", "text/rtf");
>> >> > >> mimeMap.put("html", "text/html");
>> >> > >> mimeMap.put("htm", "text/html");
>> >> > >> mimeMap.put("doc", "application/msword");
>> >> > >> mimeMap.put("docx",
>> >> > >> "application/vnd.openxmlformats-officedocument.
>> >> > wordprocessingml.document");
>> >> > >> mimeMap.put("ppt", "application/vnd.ms-powerpoint");
>> >> > >> mimeMap.put("pptx",
>> >> > >> "application/vnd.openxmlformats-officedocument.
>> >> > presentationml.presentation");
>> >> > >> mimeMap.put("xls", "application/vnd.ms-excel");
>> >> > >> mimeMap.put("xlsx",
>> >> > >> "application/vnd.openxmlformats-officedocument.spreadsheetml
>> .sheet");
>> >> > >> mimeMap.put("odt", "application/vnd.oasis.opendocument.text");
>> >> > >> mimeMap.put("ott", "application/vnd.oasis.opendocument.text");
>> >> > >> mimeMap.put("odp", "application/vnd.oasis.opendoc
>> ument.presentation");
>> >> > >> mimeMap.put("otp", "application/vnd.oasis.opendoc
>> ument.presentation");
>> >> > >> mimeMap.put("ods", "application/vnd.oasis.opendoc
>> ument.spreadsheet");
>> >> > >> mimeMap.put("ots", "application/vnd.oasis.opendoc
>> ument.spreadsheet");
>> >> > >> mimeMap.put("txt", "text/plain");
>> >> > >> mimeMap.put("log", "text/plain");
>> >> > >>
>> >> > >> The keys are the types supported.
>> >> > >>
>> >> > >>
>> >> > >> Amrit Sarkar
>> >> > >> Search Engineer
>> >> > >> Lucidworks, Inc.
>> >> > >> 415-589-9269
>> >> > >> www.lucidworks.com
>> >> > >> Twitter http://twitter.com/lucidworks
>> >> > >> LinkedIn: https://www.link

Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Amrit Sarkar
Kevin,

I am not able to replicate the issue on my system, which is a bit annoying
for me. Try this out one last time:

docker exec -it --user=solr solr bin/post -c handbook
http://quadra.franz.com:9091/index.md -recursive 10 -delay 0 -filetypes html

and have Content-Type: "html" and "text/html", try with both.

If you get past this hurdle, let me know.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Fri, Oct 13, 2017 at 8:22 PM, Kevin Layer  wrote:

> Amrit Sarkar wrote:
>
> >> ah oh, dockers. They are placed under [solr-home]/server/log/solr/log
> in
> >> the machine. I haven't played much with docker, any way you can get that
> >> file from that location.
>
> I see these files:
>
> /opt/solr/server/logs/archived
> /opt/solr/server/logs/solr_gc.log.0.current
> /opt/solr/server/logs/solr.log
> /opt/solr/server/solr/handbook/data/tlog
>
> The 3rd one has very little info.  Attached:
>
>
> 2017-10-11 15:28:09.564 INFO  (main) [   ] o.e.j.s.Server
> jetty-9.3.14.v20161028
> 2017-10-11 15:28:10.668 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter
> ___  _   Welcome to Apache Solr™ version 7.0.1
> 2017-10-11 15:28:10.669 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter /
> __| ___| |_ _   Starting in standalone mode on port 8983
> 2017-10-11 15:28:10.670 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter \__
> \/ _ \ | '_|  Install dir: /opt/solr, Default config dir:
> /opt/solr/server/solr/configsets/_default/conf
> 2017-10-11 15:28:10.707 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter
> |___/\___/_|_|Start time: 2017-10-11T15:28:10.674Z
> 2017-10-11 15:28:10.747 INFO  (main) [   ] o.a.s.c.SolrResourceLoader
> Using system property solr.solr.home: /opt/solr/server/solr
> 2017-10-11 15:28:10.763 INFO  (main) [   ] o.a.s.c.SolrXmlConfig Loading
> container configuration from /opt/solr/server/solr/solr.xml
> 2017-10-11 15:28:11.062 INFO  (main) [   ] o.a.s.c.SolrResourceLoader
> [null] Added 0 libs to classloader, from paths: []
> 2017-10-11 15:28:12.514 INFO  (main) [   ] o.a.s.c.CorePropertiesLocator
> Found 0 core definitions underneath /opt/solr/server/solr
> 2017-10-11 15:28:12.635 INFO  (main) [   ] o.e.j.s.Server Started @4304ms
> 2017-10-11 15:29:00.971 INFO  (qtp1911006827-13) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system
> params={wt=json} status=0 QTime=108
> 2017-10-11 15:29:01.080 INFO  (qtp1911006827-18) [   ] 
> o.a.s.c.TransientSolrCoreCacheDefault
> Allocating transient cache for 2147483647 transient cores
> 2017-10-11 15:29:01.083 INFO  (qtp1911006827-18) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores
> params={core=handbook&action=STATUS&wt=json} status=0 QTime=5
> 2017-10-11 15:29:01.194 INFO  (qtp1911006827-19) [   ]
> o.a.s.h.a.CoreAdminOperation core create command
> name=handbook&action=CREATE&instanceDir=handbook&wt=json
> 2017-10-11 15:29:01.342 INFO  (qtp1911006827-19) [   x:handbook]
> o.a.s.c.SolrResourceLoader [handbook] Added 51 libs to classloader, from
> paths: [/opt/solr/contrib/clustering/lib, /opt/solr/contrib/extraction/lib,
> /opt/solr/contrib/langid/lib, /opt/solr/contrib/velocity/lib,
> /opt/solr/dist]
> 2017-10-11 15:29:01.504 INFO  (qtp1911006827-19) [   x:handbook]
> o.a.s.c.SolrConfig Using Lucene MatchVersion: 7.0.1
> 2017-10-11 15:29:01.969 INFO  (qtp1911006827-19) [   x:handbook]
> o.a.s.s.IndexSchema [handbook] Schema name=default-config
> 2017-10-11 15:29:03.678 INFO  (qtp1911006827-19) [   x:handbook]
> o.a.s.s.IndexSchema Loaded schema default-config/1.6 with uniqueid field id
> 2017-10-11 15:29:03.806 INFO  (qtp1911006827-19) [   x:handbook]
> o.a.s.c.CoreContainer Creating SolrCore 'handbook' using configuration from
> instancedir /opt/solr/server/solr/handbook, trusted=true
> 2017-10-11 15:29:03.853 INFO  (qtp1911006827-19) [   x:handbook]
> o.a.s.c.SolrCore solr.RecoveryStrategy.Builder
> 2017-10-11 15:29:03.866 INFO  (qtp1911006827-19) [   x:handbook]
> o.a.s.c.SolrCore [[handbook] ] Opening new SolrCore at
> [/opt/solr/server/solr/handbook], dataDir=[/opt/solr/server/
> solr/handbook/data/]
> 2017-10-11 15:29:04.180 INFO  (qtp1911006827-19) [   x:handbook]
> o.a.s.r.XSLTResponseWriter xsltCacheLifetimeSeconds=5
> 2017-10-11 15:29:05.100 INFO  (qtp1911006827-19) [   x:handbook]
> o.a.s.u.UpdateHandler Using UpdateLog implementation:
> org.apache.solr.update.UpdateLog
> 2017-10-11 15:29:05.101 INFO  (qtp1911006827-19) [   x:handbook]
> o.a.s.u.UpdateLog Initializing UpdateLog: dataDir= defaultSyncLevel=FLUSH
> numRecordsToKeep=100 maxNumLogsToKeep=10 numVersionBucket

Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Amrit Sarkar
Kevin,

fileType => md is not a recognizable format in SimplePostTool; anyway, moving
on.

The above is a SAXParseException, a runtime exception. Nothing can be done at
the Solr end except curating your own data.
Some helpful links:
https://stackoverflow.com/questions/2599919/java-parsing-xml-document-gives-content-not-allowed-in-prolog-error
https://stackoverflow.com/questions/3030903/content-is-not-allowed-in-prolog-when-parsing-perfectly-valid-xml-on-gae

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Fri, Oct 13, 2017 at 8:48 PM, Kevin Layer  wrote:

> Amrit Sarkar wrote:
>
> >> Kevin,
> >>
> >> I am not able to replicate the issue on my system, which is bit annoying
> >> for me. Try this out for last time:
> >>
> >> docker exec -it --user=solr solr bin/post -c handbook
> >> http://quadra.franz.com:9091/index.md -recursive 10 -delay 0
> -filetypes html
> >>
> >> and have Content-Type: "html" and "text/html", try with both.
>
> With text/html I get and your command I get
>
> quadra[git:master]$ docker exec -it --user=solr solr bin/post -c handbook
> http://quadra.franz.com:9091/index.md -recursive 10 -delay 0 -filetypes
> html
> /docker-java-home/jre/bin/java -classpath /opt/solr/dist/solr-core-7.0.1.jar
> -Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=html -Dc=handbook
> -Ddata=web org.apache.solr.util.SimplePostTool
> http://quadra.franz.com:9091/index.md
> SimplePostTool version 5.0.0
> Posting web pages to Solr url http://localhost:8983/solr/
> handbook/update/extract
> Entering auto mode. Indexing pages with content-types corresponding to
> file endings html
> SimplePostTool: WARNING: Never crawl an external web site faster than
> every 10 seconds, your IP will probably be blocked
> Entering recursive mode, depth=10, delay=0s
> Entering crawl at level 0 (1 links total, 1 new)
> POSTed web resource http://quadra.franz.com:9091/index.md (depth: 0)
> [Fatal Error] :1:1: Content is not allowed in prolog.
> Exception in thread "main" java.lang.RuntimeException:
> org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is
> not allowed in prolog.
> at org.apache.solr.util.SimplePostTool$PageFetcher.
> getLinksFromWebPage(SimplePostTool.java:1252)
> at org.apache.solr.util.SimplePostTool.webCrawl(
> SimplePostTool.java:616)
> at org.apache.solr.util.SimplePostTool.postWebPages(
> SimplePostTool.java:563)
> at org.apache.solr.util.SimplePostTool.doWebMode(
> SimplePostTool.java:365)
> at org.apache.solr.util.SimplePostTool.execute(
> SimplePostTool.java:187)
> at org.apache.solr.util.SimplePostTool.main(
> SimplePostTool.java:172)
> Caused by: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1;
> Content is not allowed in prolog.
> at com.sun.org.apache.xerces.internal.parsers.DOMParser.
> parse(DOMParser.java:257)
> at com.sun.org.apache.xerces.internal.jaxp.
> DocumentBuilderImpl.parse(DocumentBuilderImpl.java:339)
> at javax.xml.parsers.DocumentBuilder.parse(
> DocumentBuilder.java:121)
> at org.apache.solr.util.SimplePostTool.makeDom(
> SimplePostTool.java:1061)
> at org.apache.solr.util.SimplePostTool$PageFetcher.
> getLinksFromWebPage(SimplePostTool.java:1232)
> ... 5 more
>
>
> When I use "-filetype md" back to the regular output that doesn't scan
> anything.
>
>
> >>
> >> If you get past this hurdle this hurdle, let me know.
> >>
> >> Amrit Sarkar
> >> Search Engineer
> >> Lucidworks, Inc.
> >> 415-589-9269
> >> www.lucidworks.com
> >> Twitter http://twitter.com/lucidworks
> >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> >>
> >> On Fri, Oct 13, 2017 at 8:22 PM, Kevin Layer  wrote:
> >>
> >> > Amrit Sarkar wrote:
> >> >
> >> > >> ah oh, dockers. They are placed under [solr-home]/server/log/solr/
> log
> >> > in
> >> > >> the machine. I haven't played much with docker, any way you can
> get that
> >> > >> file from that location.
> >> >
> >> > I see these files:
> >> >
> >> > /opt/solr/server/logs/archived
> >> > /opt/solr/server/logs/solr_gc.log.0.current
> >> > /opt/solr/server/logs/solr.log
> >> > /opt/solr/server/solr/handbook/data/tlog
> >> >
> >> > The 3rd one has very little info.  Attached:
> >> >
> >> >
>

Re: HOW DO I UNSUBSCRIBE FROM GROUP?

2017-10-16 Thread Amrit Sarkar
Hi,

If you wish the emails to "stop", kindly "UNSUBSCRIBE"  by following the
instructions on the http://lucene.apache.org/solr/community.html. Hope this
helps.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Mon, Oct 16, 2017 at 9:56 AM,  wrote:

>
> Hi,
>
> Just wondering how do I 'unsubscribe' from the emails I'm receiving from
> the
> group?
>
> I'm getting way more emails than I need right now and would like them to
> 'stop'... But there is NO UNSUBSCRIBE link in any of the emails.
>
> Thanks,
> Rita
>
> -Original Message-
> From: Reth RM [mailto:reth.ik...@gmail.com]
> Sent: Sunday, October 15, 2017 10:57 PM
> To: solr-user@lucene.apache.org
> Subject: Efficient query to obtain DF
>
> Dear Solr-User Group,
>
>Can you please suggest efficient query for retrieving term to document
> frequency(df) of that term at shard index level?
>
> I know we can get term to df mapping by applying termVectors component
> <https://lucene.apache.org/solr/guide/6_6/the-term-
> vector-component.html#The
> TermVectorComponent-RequestParameters>,
> however, results returned by this component are each doc to term and its
> df. I was looking for straight forward flat list of terms-df mapping,
> similar to how terms component returns term-tf (term frequency) map list.
>
> Thank you.
>
>


Re: Howto verify that update is "in-place"

2017-10-17 Thread Amrit Sarkar
Hi James,

Since each update you are doing via an atomic operation contains the "id" /
"uniqueKey", comparing the "_version_" field value for one of them would be
fine for a batch. For the rest, Emir has listed them out.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Tue, Oct 17, 2017 at 2:47 PM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Hi James,
> I did not try, but checking max and num doc might give you info if update
> was in-place or atomic - atomic is reindexing of existing doc so the old
> doc will be deleted. In-place update should just update doc values of
> existing doc so number of deleted docs should not change.
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 17 Oct 2017, at 09:57, James  wrote:
> >
> > I am using Solr 6.6 and carefully read the documentation about atomic and
> > in-place updates. I am pretty sure that everything is set up as it
> should.
> >
> >
> >
> > But how can I make certain that a simple update command actually
> performs an
> > in-place update without internally re-indexing all other fields?
> >
> >
> >
> > I am issuing this command to my server:
> >
> > (I am using implicit document routing, so I need the "Shard" parameter.)
> >
> >
> >
> > {
> >
> > "ID":1133,
> >
> > "Property_2":{"set":124},
> >
> > "Shard":"FirstShard"
> >
> > }
> >
> >
> >
> >
> >
> > The log outputs:
> >
> >
> >
> > 2017-10-17 07:39:18.701 INFO  (qtp1937348256-643) [c:MyCollection
> > s:FirstShard r:core_node27 x:MyCollection_FirstShard_replica1]
> > o.a.s.u.p.LogUpdateProcessorFactory [MyCollection_FirstShard_replica1]
> > webapp=/solr path=/update
> > params={commitWithin=1000&boost=1.0&overwrite=true&wt=
> json&_=1508221142230}{
> > add=[1133 (1581489542869811200)]} 0 1
> >
> > 2017-10-17 07:39:19.703 INFO  (commitScheduler-283-thread-1)
> [c:MyCollection
> > s:FirstShard r:core_node27 x:MyCollection_FirstShard_replica1]
> > o.a.s.u.DirectUpdateHandler2 start
> > commit{,optimize=false,openSearcher=false,waitSearcher=true,
> expungeDeletes=f
> > alse,softCommit=true,prepareCommit=false}
> >
> > 2017-10-17 07:39:19.703 INFO  (commitScheduler-283-thread-1)
> [c:MyCollection
> > s:FirstShard r:core_node27 x:MyCollection_FirstShard_replica1]
> > o.a.s.s.SolrIndexSearcher Opening
> > [Searcher@32d539b4[MyCollection_FirstShard_replica1] main]
> >
> > 2017-10-17 07:39:19.703 INFO  (commitScheduler-283-thread-1)
> [c:MyCollection
> > s:FirstShard r:core_node27 x:MyCollection_FirstShard_replica1]
> > o.a.s.u.DirectUpdateHandler2 end_commit_flush
> >
> > 2017-10-17 07:39:19.703 INFO
> > (searcherExecutor-268-thread-1-processing-n:192.168.117.142:8983_solr
> > x:MyCollection_FirstShard_replica1 s:FirstShard c:MyCollection
> > r:core_node27) [c:MyCollection s:FirstShard r:core_node27
> > x:MyCollection_FirstShard_replica1] o.a.s.c.QuerySenderListener
> > QuerySenderListener sending requests to
> > Searcher@32d539b4[MyCollection_FirstShard_replica1]
> > main{ExitableDirectoryReader(UninvertingDirectoryReader(
> Uninverting(_i(6.6.0
> > ):C5011/1) Uninverting(_j(6.6.0):C478) Uninverting(_k(6.6.0):C345)
> > Uninverting(_l(6.6.0):C4182) Uninverting(_m(6.6.0):C317)
> > Uninverting(_n(6.6.0):C399) Uninverting(_q(6.6.0):C1)))}
> >
> > 2017-10-17 07:39:19.703 INFO
> > (searcherExecutor-268-thread-1-processing-n:192.168.117.142:8983_solr
> > x:MyCollection_FirstShard_replica1 s:FirstShard c:MyCollection
> > r:core_node27) [c:MyCollection s:FirstShard r:core_node27
> > x:MyCollection_FirstShard_replica1] o.a.s.c.QuerySenderListener
> > QuerySenderListener done.
> >
> > 2017-10-17 07:39:19.703 INFO
> > (searcherExecutor-268-thread-1-processing-n:192.168.117.142:8983_solr
> > x:MyCollection_FirstShard_replica1 s:FirstShard c:MyCollection
> > r:core_node27) [c:MyCollection s:FirstShard r:core_node27
> > x:MyCollection_FirstShard_replica1] o.a.s.c.SolrCore
> > [MyCollection_FirstShard_replica1] Registered new searcher
> > Searcher@32d539b4[MyCollection_FirstShard_replica1]
> > main{ExitableDirectoryReader(UninvertingDirectoryReader(
> Uninverting(_i(6.6.0
> > ):C5011/1) Uninverting(_j(6.6.0):C478) Uninverting(_k(6.6.0):C345)
> > Uninverting(_l(6.6.0):C4182) Uninverting(_m(6.6.0):C317)
> > Uninverting(_n(6.6.0):C399) Uninverting(_q(6.6.0):C1)))}
> >
> >
> >
> > If I issue another, non-in-place update to another field which is not a
> > DocValue, the log output is very similar. Can I increase verbosity? Will
> it
> > tell me more about the type of update then?
> >
> >
> >
> > Thank you!
> >
> > James
> >
> >
> >
> >
> >
> >
> >
>
>


Re: Using pint field as uniqueKey

2017-10-17 Thread Amrit Sarkar
By looking into the code,

if (uniqueKeyField.getType().isPointField()) {
  String msg = UNIQUE_KEY + " field ("+uniqueKeyFieldName+
") can not be configured to use a Points based FieldType: " +
uniqueKeyField.getType().getTypeName();
  log.error(msg);
  throw new SolrException(ErrorCode.SERVER_ERROR, msg);
}

Not sure of the reason behind it; someone else can weigh in here, but PointFields
are not allowed to be unique keys, probably because of how they are structured
and stored on disk.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Tue, Oct 17, 2017 at 1:49 PM, Michael Kondratiev <
kondratiev.mich...@gmail.com> wrote:

> I'm trying to set up a uniqueKey (which is an integer) like this:
>
>
> <field name="id" type="pint" required="true" multiValued="false"/>
> <uniqueKey>id</uniqueKey>
>
>
> But when I upload the configuration into Solr I see the following error:
>
>
>
> uniqueKey field (id) can not be configured to use a Points based
> FieldType: pint
>
> If I set type=“string” everything seems to be OK.


Re: Howto verify that update is "in-place"

2017-10-17 Thread Amrit Sarkar
James,

@Amrit: Are you saying that the _version_ field should not change when
> performing an atomic update operation?


It should change; a new version will be allotted to the document. I am not
that sure about in-place updates; probably a test run will verify that.
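
A quick way to run that test, as a rough sketch (host, collection and core names are taken from the log further down in this thread and may need adjusting): fetch the doc's _version_ before and after the update, and compare the core's index stats, since an atomic update rewrites the whole document (maxDoc/deletedDocs grow) while an in-place update should only touch the docValues:

curl "http://localhost:8983/solr/MyCollection/select?q=ID:1133&fl=ID,_version_&wt=json"

curl "http://localhost:8983/solr/admin/cores?action=STATUS&core=MyCollection_FirstShard_replica1&wt=json"

The _version_ should change in both cases; it is the deleted/max doc counters that tell the two update paths apart.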

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Tue, Oct 17, 2017 at 4:06 PM, James  wrote:

> Hi Emir and Amrit, thanks for your reponses!
>
> @Emir: Nice idea but after changing any document in any way and after
> committing the changes, all Doc counter (Num, Max, Deleted) are still the
> same, only thing that changes is the Version (increases by steps of 2) .
>
> @Amrit: Are you saying that the _version_ field should not change when
> performing an atomic update operation?
>
> Thanks
> James
>
>
> -----Ursprüngliche Nachricht-
> Von: Amrit Sarkar [mailto:sarkaramr...@gmail.com]
> Gesendet: Dienstag, 17. Oktober 2017 11:35
> An: solr-user@lucene.apache.org
> Betreff: Re: Howto verify that update is "in-place"
>
> Hi James,
>
> As for each update you are doing via atomic operation contains the "id" /
> "uniqueKey". Comparing the "_version_" field value for one of them would be
> fine for a batch. Rest, Emir has list them out.
>
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>
> On Tue, Oct 17, 2017 at 2:47 PM, Emir Arnautović <
> emir.arnauto...@sematext.com> wrote:
>
> > Hi James,
> > I did not try, but checking max and num doc might give you info if
> > update was in-place or atomic - atomic is reindexing of existing doc
> > so the old doc will be deleted. In-place update should just update doc
> > values of existing doc so number of deleted docs should not change.
> >
> > HTH,
> > Emir
> > --
> > Monitoring - Log Management - Alerting - Anomaly Detection Solr &
> > Elasticsearch Consulting Support Training - http://sematext.com/
> >
> >
> >
> > > On 17 Oct 2017, at 09:57, James  wrote:
> > >
> > > I am using Solr 6.6 and carefully read the documentation about
> > > atomic and in-place updates. I am pretty sure that everything is set
> > > up as it
> > should.
> > >
> > >
> > >
> > > But how can I make certain that a simple update command actually
> > performs an
> > > in-place update without internally re-indexing all other fields?
> > >
> > >
> > >
> > > I am issuing this command to my server:
> > >
> > > (I am using implicit document routing, so I need the "Shard"
> > > parameter.)
> > >
> > >
> > >
> > > {
> > >
> > > "ID":1133,
> > >
> > > "Property_2":{"set":124},
> > >
> > > "Shard":"FirstShard"
> > >
> > > }
> > >
> > >
> > >
> > >
> > >
> > > The log outputs:
> > >
> > >
> > >
> > > 2017-10-17 07:39:18.701 INFO  (qtp1937348256-643) [c:MyCollection
> > > s:FirstShard r:core_node27 x:MyCollection_FirstShard_replica1]
> > > o.a.s.u.p.LogUpdateProcessorFactory
> > > [MyCollection_FirstShard_replica1]
> > > webapp=/solr path=/update
> > > params={commitWithin=1000&boost=1.0&overwrite=true&wt=
> > json&_=1508221142230}{
> > > add=[1133 (1581489542869811200)]} 0 1
> > >
> > > 2017-10-17 07:39:19.703 INFO  (commitScheduler-283-thread-1)
> > [c:MyCollection
> > > s:FirstShard r:core_node27 x:MyCollection_FirstShard_replica1]
> > > o.a.s.u.DirectUpdateHandler2 start
> > > commit{,optimize=false,openSearcher=false,waitSearcher=true,
> > expungeDeletes=f
> > > alse,softCommit=true,prepareCommit=false}
> > >
> > > 2017-10-17 07:39:19.703 INFO  (commitScheduler-283-thread-1)
> > [c:MyCollection
> > > s:FirstShard r:core_node27 x:MyCollection_FirstShard_replica1]
> > > o.a.s.s.SolrIndexSearcher Opening
> > > [Searcher@32d539b4[MyCollection_FirstShard_replica1] main]
> > >
> > > 2017-10-17 07:39:19.703 INFO  (commitScheduler-283-thread-1)
> > [c:MyCollection
> > > s:FirstShard r:core_node27 x:MyCollection_FirstShard_replica1]
> > > o.a.s.u.DirectUpdateHandler2 end_commit_flush
> &g

Re: solr 7.0: What causes the segment to flush

2017-10-17 Thread Amrit Sarkar
>
> In 7.0, i am finding that the file is written to disk very early on
> and it is being updated every second or so. Had something changed in 7.0
> which is causing it?  I tried something similar with solr 6.5 and i was
> able to get almost a GB size files on disk.


Interesting observation, Nawab. With ramBufferSizeMB=20G, were you getting
20GB segments on 6.5, or less than that, around a GB?

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Tue, Oct 17, 2017 at 12:48 PM, Nawab Zada Asad Iqbal 
wrote:

> Hi,
>
> I have  tuned  (or tried to tune) my settings to only flush the segment
> when it has reached its maximum size. At the moment,I am using my
> application with only a couple of threads (i have limited to one thread for
> analyzing this scenario) and my ramBufferSizeMB=2 (i.e. ~20GB). With
> this, I assumed that my file sizes on the disk will be at in the order of
> GB; and no segments will be flushed until the segment's in memory size is
> 2GB. In 7.0, i am finding that the file is written to disk very early on
> and it is being updated every second or so. Had something changed in 7.0
> which is causing it?  I tried something similar with solr 6.5 and i was
> able to get almost a GB size files on disk.
>
> How can I control it to not write to disk until the segment has reached its
> maximum permitted size (1945 MB?) ? My write traffic is 'new only' (i.e.,
> it doesn't delete any document) , however I also found following infostream
> logs, which incorrectly say 'delete=true':
>
> Oct 16, 2017 10:18:29 PM INFO  (qtp761960786-887) [   x:filesearch]
> o.a.s.c.S.Request [filesearch]  webapp=/solr path=/update
> params={commit=false} status=0 QTime=21
> Oct 16, 2017 10:18:29 PM INFO  (qtp761960786-889) [   x:filesearch]
> o.a.s.u.LoggingInfoStream [DW][qtp761960786-889]: anyChanges?
> numDocsInRam=4434 deletes=true hasTickets:false pendingChangesInFullFlush:
> false
> Oct 16, 2017 10:18:29 PM INFO  (qtp761960786-889) [   x:filesearch]
> o.a.s.u.LoggingInfoStream [IW][qtp761960786-889]: nrtIsCurrent: infoVersion
> matches: false; DW changes: true; BD changes: false
> Oct 16, 2017 10:18:29 PM INFO  (qtp761960786-889) [   x:filesearch]
> o.a.s.c.S.Request [filesearch]  webapp=/solr path=/admin/luke
> params={show=index&numTerms=0&wt=json} status=0 QTime=0
>
>
>
> Thanks
> Nawab
>


Re: Using pint field as uniqueKey

2017-10-17 Thread Amrit Sarkar
https://issues.apache.org/jira/browse/SOLR-10829: IndexSchema should
enforce that uniqueKey field must not be points based

The description tells the real reason.
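
Alessandro's copy-field suggestion below can be wired up through the Schema API; a rough sketch, where the collection name and the "id_int" field name are just placeholders and the string id values must of course be numeric for the copy to work:

curl -X POST -H 'Content-Type: application/json' \
  http://localhost:8983/solr/myCollection/schema --data-binary '{
    "add-field": {"name":"id_int", "type":"pint", "indexed":true, "stored":false},
    "add-copy-field": {"source":"id", "dest":"id_int"}
  }'

Range queries can then go against id_int while the uniqueKey stays a plain string id.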

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Tue, Oct 17, 2017 at 5:42 PM, alessandro.benedetti 
wrote:

> In addition to what Amrit correctly stated, if you need to search on your
> id,
> especially range queries, I recommend to use a copy field and leave the id
> field, almost as default.
>
> Cheers
>
>
>
> -
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Merging is not taking place with tiered merge policy

2017-10-23 Thread Amrit Sarkar
Chandru,

I didn't try the above config, but why have you defined both "mergePolicy" and
"mergePolicyFactory", and passed different values for the same parameters?



>   10
>   1
>     
> 
>   10
>   10
> 
>


Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Mon, Oct 23, 2017 at 11:00 AM, Chandru Shanmugasundaram <
chandru.shanmugasunda...@exterro.com> wrote:

> The following is my solrconfig.xml
>
> 
> 1000
> 1
> 15
> false
> 1024
> 
>   10
>   1
> 
> 
>   10
>   10
> 
> hdfs
> 
> 1
> 0
> 
>   
>
> Please let me know if should I tweak something above
>
>
> --
> Thanks,
> Chandru.S
>


Solr require both hl.fl and df same for correct highlighting.

2017-10-26 Thread Amrit Sarkar
Solr version: 6.5.x

Why do we need to pass the same field for hl.fl and df to get correct highlighting?

Let us suppose I am highlighting on field fieldA, which has a stemming filter
in its analysis chain.

Sample doc: {"id":"1", "fieldA":"Vacation"}

If I then make a highlighting request:
> "params":{
>   "q":"Vacation",
>   "hl":"on",
>   "indent":"on",
>   "hl.fl":"fieldA",
>   "wt":"json"}


Highlighting doesn't work, because the query term "Vacation", analyzed via
_text_ (text_general), remains "Vacation", while in the index it is stored as "vacat".

I debugged through the code; at HighlightComponent::169:

highlightQuery = rb.getQparser().getHighlightQuery();


The highlightQuery passed along is the parsed form of the query, in
this case: _text_:Vacation.

Fast-forwarding to WeightedSpanTermExtractor::extractWeightedTerms::366::

for (final Term queryTerm : nonWeightedTerms) {
>   if (fieldNameComparator(queryTerm.field())) {
> WeightedSpanTerm weightedSpanTerm = new WeightedSpanTerm(boost,
> queryTerm.text());
> terms.put(queryTerm.text(), weightedSpanTerm);
>   }
> }

The extracted term is "Vacation".

Jumping to core highlighting code:

Highlighter::getBestTextFragements::213

TokenGroup tokenGroup=new TokenGroup(tokenStream);


Each tokenStream has the analyzed token "vacat", which obviously doesn't
match the extracted term.

Why do the df/qf values matter for what we pass in "hl.fl"? Shouldn't the
query to be highlighted be analyzed by the field passed in "hl.fl"?
But then multiple fields can be passed in "hl.fl". Just wondering how it is
supposed to be done. Any explanation will be fine.
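
For reference, a minimal request that sidesteps this, with the collection name as a placeholder, is to point df at the highlighted field so that the query parser and the highlighter both analyze with fieldA:

curl "http://localhost:8983/solr/collection1/select?q=Vacation&df=fieldA&hl=on&hl.fl=fieldA&wt=json"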

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2


Re: Streaming Expression - cartesianProduct

2017-11-01 Thread Amrit Sarkar
Following Pratik's spot-on comment, and not really related to your question:

the "partitionKeys" parameter also needs to specify the "over" field
when using "parallel" streaming.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Thu, Nov 2, 2017 at 2:38 AM, Pratik Patel  wrote:

> Roll up needs documents to be sorted by the "over" field.
> Check this for more details
> http://lucene.472066.n3.nabble.com/Streaming-Expressions-rollup-function-
> returning-results-with-duplicate-tuples-td4342398.html
>
> On Wed, Nov 1, 2017 at 3:41 PM, Kojo  wrote:
>
> > Wrap cartesianProduct function with fetch function works as expected.
> >
> > But rollup function over cartesianProduct doesn´t aggregate on a returned
> > field of the cartesianProduct.
> >
> >
> > The field "id_researcher" bellow is a Multivalued field:
> >
> >
> >
> > This one works:
> >
> >
> > fetch(reasercher,
> >
> > cartesianProduct(
> > having(
> > cartesianProduct(
> > search(schoolarship,zkHost="localhost:9983",qt="/export",
> > q="*:*",
> > fl="process, area, id_reasercher",sort="process asc"),
> > area
> > ),
> > eq(area, val(Anything))),
> > id_reasercher),
> > fl="name, django_id",
> > on="id_reasercher=django_id"
> > )
> >
> >
> > This one doesn´t works:
> >
> > rollup(
> >
> > cartesianProduct(
> > having(
> > cartesianProduct(
> > search(schoolarship,zkHost="localhost:9983",qt="/export",
> > q="*:*",
> > fl="process, area, id_researcher, status",sort="process asc"),
> > area
> > ),
> > eq(area, val(Anything))),
> > id_researcher),
> > over=id_researcher,count(*)
> > )
> >
> > If I aggregate over a non MultiValued field, it works.
> >
> >
> > Is that correct, rollup doesn´t work on a cartesianProduct?
> >
>


Re: SolrClould 6.6 stability challenges

2017-11-04 Thread Amrit Sarkar
Pretty much what Emir has stated. I also want to ask about this part:

all of this runs perfectly ok when indexing isn't happening. as soon as
> we start "nrt" indexing one of the follower nodes goes down within 10 to 20
> minutes.


When you say "NRT" indexing, what is the commit strategy in indexing. With
auto-commit so highly set, are you committing after batch, if yes, what's
the number.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Sat, Nov 4, 2017 at 2:47 PM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Hi Rick,
> Do you see any errors in logs? Do you have any monitoring tool? Maybe you
> can check heap and GC metrics around time when incident happened. It is not
> large heap but some major GC could cause pause large enough to trigger some
> snowball and end up with node in recovery state.
> What is indexing rate you observe? Why do you have max warming searchers 5
> (did you mean this with autowarmingsearchers?) when you commit every 5 min?
> Why did you increase it - you seen errors with default 2? Maybe you commit
> every bulk?
> Do you see similar behaviour when you just do indexing without queries?
>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 4 Nov 2017, at 05:15, Rick Dig  wrote:
> >
> > hello all,
> > we are trying to run solrcloud 6.6 in a production setting.
> > here's our config and issue
> > 1) 3 nodes, 1 shard, replication factor 3
> > 2) all nodes are 16GB RAM, 4 core
> > 3) Our production load is about 2000 requests per minute
> > 4) index is fairly small, index size is around 400 MB with 300k documents
> > 5) autocommit is currently set to 5 minutes (even though ideally we would
> > like a smaller interval).
> > 6) the jvm runs with 8 gb Xms and Xmx with CMS gc.
> > 7) all of this runs perfectly ok when indexing isn't happening. as soon
> as
> > we start "nrt" indexing one of the follower nodes goes down within 10 to
> 20
> > minutes. from this point on the nodes never recover unless we stop
> > indexing.  the master usually is the last one to fall.
> > 8) there are maybe 5 to 7 processes indexing at the same time with
> document
> > batch sizes of 500.
> > 9) maxRambuffersizeMB is 100, autowarmingsearchers is 5,
> > 10) no cpu and / or oom issues that we can see.
> > 11) cpu load does go fairly high 15 to 20 at times.
> > any help or pointers appreciated
> >
> > thanks
> > rick
>
>


Re: Incorrect ngroup count

2017-11-07 Thread Amrit Sarkar
Zheng,

Usually, the number of records returned is more than what is shown in the
> ngroup. For example, I may get a ngroup of 22, but there are 25 records
> being returned.


Do the 25 records being returned have duplicates? Grouping is subject
to co-location of documents with the same group value in the same shard. Can you share
the architecture of the setup?


Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Tue, Nov 7, 2017 at 8:36 AM, Zheng Lin Edwin Yeo 
wrote:

> Hi,
>
> I'm using Solr 6.5.1, and I'm facing an issue of incorrect ngroup count
> after grouping by the signature field.
>
> Usually, the number of records returned is more than what is shown in the
> ngroup. For example, I may get a ngroup of 22, but there are 25 records
> being returned.
>
> Below is the part of solrconfig.xml that does the grouping.
>
> <processor class="solr.processor.SignatureUpdateProcessorFactory">
>   <bool name="enabled">true</bool>
>   <str name="signatureField">signature</str>
>   <bool name="overwriteDupes">false</bool>
>   <str name="fields">content</str>
>   <str name="signatureClass">solr.processor.Lookup3Signature</str>
> </processor>
> <processor class="solr.DistributedUpdateProcessorFactory" />
> <processor class="solr.LogUpdateProcessorFactory" />
> <processor class="solr.RunUpdateProcessorFactory" />
>
>
> This is where I set the grouping to true in the requestHandler
>
> <bool name="group">true</bool>
> <str name="group.field">signature</str>
> <bool name="group.main">true</bool>
> <str name="group.cache.percent">100</str>
>
> What could be the issue that causes this?
>
> Regards,
> Edwin
>


Re: Long blocking during indexing + deleteByQuery

2017-11-07 Thread Amrit Sarkar
Maybe not a relevant fact on this, but "addAndDelete" is triggered by
reordering of DBQs; that means there are non-executed DBQs present in the updateLog
when an add operation is received. Solr makes sure the DBQs are executed
first and then the add operation is executed.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Tue, Nov 7, 2017 at 9:19 PM, Erick Erickson 
wrote:

> Well, consider what happens here.
>
> Solr gets a DBQ that includes document 132 and 10,000,000 other docs
> Solr gets an add for document 132
>
> The DBQ takes time to execute. If it was processing the requests in
> parallel would 132 be in the index after the delete was over? It would
> depend on when the DBQ found the doc relative to the add.
> With this sequence one would expect 132 to be in the index at the end.
>
> And it's worse when it comes to distributed indexes. If the updates
> were sent out in parallel you could end up in situations where one
> replica contained 132 and another didn't depending on the vagaries of
> thread execution.
>
> Now I didn't write the DBQ code, but that's what I think is happening.
>
> Best,
> Erick
>
> On Tue, Nov 7, 2017 at 7:40 AM, Chris Troullis 
> wrote:
> > As an update, I have confirmed that it doesn't seem to have anything to
> do
> > with child documents, or standard deletes, just deleteByQuery. If I do a
> > deleteByQuery on any collection while also adding/updating in separate
> > threads I am experiencing this blocking behavior on the non-leader
> replica.
> >
> > Has anyone else experienced this/have any thoughts on what to try?
> >
> > On Sun, Nov 5, 2017 at 2:20 PM, Chris Troullis 
> wrote:
> >
> >> Hi,
> >>
> >> I am experiencing an issue where threads are blocking for an extremely
> >> long time when I am indexing while deleteByQuery is also running.
> >>
> >> Setup info:
> >> -Solr Cloud 6.6.0
> >> -Simple 2 Node, 1 Shard, 2 replica setup
> >> -~12 million docs in the collection in question
> >> -Nodes have 64 GB RAM, 8 CPUs, spinning disks
> >> -Soft commit interval 10 seconds, Hard commit (open searcher false) 60
> >> seconds
> >> -Default merge policy settings (Which I think is 10/10).
> >>
> >> We have a query heavy index heavyish use case. Indexing is constantly
> >> running throughout the day and can be bursty. The indexing process
> handles
> >> both updates and deletes, can spin up to 15 simultaneous threads, and
> sends
> >> to solr in batches of 3000 (seems to be the optimal number per trial and
> >> error).
> >>
> >> I can build the entire collection from scratch using this method in < 40
> >> mins and indexing is in general super fast (averages about 3 seconds to
> >> send a batch of 3000 docs to solr). The issue I am seeing is when some
> >> threads are adding/updating documents while other threads are issuing
> >> deletes (using deleteByQuery), solr seems to get into a state of extreme
> >> blocking on the replica, which results in some threads taking 30+
> minutes
> >> just to send 1 batch of 3000 docs. This collection does use child
> documents
> >> (hence the delete by query _root_), not sure if that makes a
> difference, I
> >> am trying to duplicate on a non-child doc collection. CPU/IO wait seems
> >> minimal on both nodes, so not sure what is causing the blocking.
> >>
> >> Here is part of the stack trace on one of the blocked threads on the
> >> replica:
> >>
> >> qtp592179046-576 (576)
> >> java.lang.Object@608fe9b5
> >> org.apache.solr.update.DirectUpdateHandler2.addAndDelete(
> >> DirectUpdateHandler2.java:354)
> >> org.apache.solr.update.DirectUpdateHandler2.addDoc0(
> >> DirectUpdateHandler2.java:237)
> >> org.apache.solr.update.DirectUpdateHandler2.addDoc(
> >> DirectUpdateHandler2.java:194)
> >> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(
> >> RunUpdateProcessorFactory.java:67)
> >> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(
> >> UpdateRequestProcessor.java:55)
> >> org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(
> >> DistributedUpdateProcessor.java:979)
> >> org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(
> >> DistributedUpdateProcessor

Re: Streaming Expression usage

2017-11-07 Thread Amrit Sarkar
Kojo,

Not sure what you mean by making two requests to get documents. A
"search" streaming expression can be passed an "fq" parameter to filter
the results, and a rollup on top of that will fetch the desired results. This
may not be mentioned in the official docs:

Sample streaming expression:

expr=rollup(
>
> search(collection1,
>
> zkHost="localhost:9983",
>
> qt="/export",
>
> q="*:*",
>
> fq="a_s:filter_a",
>
>     fl="id,a_s,a_i,a_f",
>
> sort="a_f asc"),
>
>over=a_f)
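
To run it end to end, a rough sketch of posting such an expression to the /stream handler, with the collection and zkHost as in the example above and a count(*) metric added just for illustration:

curl --data-urlencode 'expr=rollup(
    search(collection1, zkHost="localhost:9983", qt="/export", q="*:*",
           fq="a_s:filter_a", fl="id,a_s,a_i,a_f", sort="a_f asc"),
    over="a_f", count(*))' \
  http://localhost:8983/solr/collection1/stream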
>
>
Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Wed, Nov 8, 2017 at 7:41 AM, Kojo  wrote:

> Hi,
> I am working on PoC of a front-end web to provide an interface to the end
> user search and filter data on Solr indexes.
>
> I am trying Streaming Expression for about a week and I am fairly keen
> about using it to search and filter indexes on Solr side. But I am not sure
> whether this is the right approach or not.
>
> A simple question to illustrate my doubts: If use the search and some
> Streaming Expressions more to get and filter the indexes to get documents,
> and I want to rollup the result, will I have to make two requests? Is this
> a good use for Streaming Expressions?
>


Re: How to routing document for send to particular shard range

2017-11-07 Thread Amrit Sarkar
Ketan,

If you already know the desired indexing architecture, isn't it better to use
the "implicit" router and write the routing logic on your own end?

If the document belongs to "Org1", send the document with the extra param
"_route_=shard1", and likewise for the others.

Snippet from official doc:
https://lucene.apache.org/solr/guide/6_6/shards-and-indexing-data-in-solrcloud.html#ShardsandIndexingDatainSolrCloud-DocumentRouting
:

If you created the collection and defined the "implicit" router at the time
> of creation, you can additionally define a router.field parameter to use a
> field from each document to identify a shard where the document belongs. If
> the field specified is missing in the document, however, the document will
> be rejected. You could also use the _route_ parameter to name a specific
> shard.
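
A rough end-to-end sketch, where the collection name, field names and document values are placeholders and four shards match the layout above: create the collection with the implicit router, then send each document to its org's shard with the _route_ parameter:

curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=orgdata&router.name=implicit&shards=shard1,shard2,shard3,shard4&replicationFactor=1&maxShardsPerNode=4"

curl -X POST -H 'Content-Type: application/json' \
  'http://localhost:8983/solr/orgdata/update?_route_=shard1&commit=true' \
  --data-binary '[{"id":"p1","org":"Org1","project":"project1"}]'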



Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Wed, Nov 8, 2017 at 11:15 AM, Ketan Thanki  wrote:

> Hi,
>
> I have a requirement now which is quite different, as I need to set the routing key hash
> for a document so that it is sent to the particular shard whose range it falls in.
>
> I have solrcloud configuration with 4 shard  & 4 replica with below shard
> range.
> shard1: 8000-bfff
> shard2: c000-
> shard3: 0-3fff
> shard4: 4000-7fff
>
> e.g.: below shows which projects fall under which organization; the organization is my routing key.
> Org1= works for project1,project2
> Org2=works for project3
> Org3=works for project4
> Org4=project5
>
> So as mentioned above I want to index Org1 to shard1, Org2 to shard2, Org3 to
> shard3, and Org4 to shard4, i.e. send each document to its particular shard.
> How could I manage compositeId routing to do this?
>
> Regards,
> Ketan.
>
>


Re: Atomic Updates with SolrJ

2017-11-09 Thread Amrit Sarkar
Hi Martin,

I tested the same SolrJ application code on my system and it worked just fine
on Solr 6.6.x. My Solr client is "CloudSolrClient", which I don't think makes
any difference. Can you share the response and the field declarations if you
are still facing the issue?
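
For reference, a minimal sketch of roughly what I ran (the zkHost and collection
name are assumptions; adjust them to your cluster):

import java.util.Collections;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;

public class AtomicUpdateExample {
  public static void main(String[] args) throws Exception {
    // assumption: ZooKeeper reachable on localhost:9983
    CloudSolrClient client = new CloudSolrClient.Builder()
        .withZkHost("localhost:9983")
        .build();

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "1234");
    // "set" replaces only this field's value; other stored fields should be kept
    doc.addField("fieldToUpdate", Collections.singletonMap("set", 1));

    UpdateRequest request = new UpdateRequest();
    request.add(doc);
    request.process(client, "myCollection");
    client.commit("myCollection");
    client.close();
  }
}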

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Thu, Nov 9, 2017 at 1:55 PM, Martin Keller <
martin.kel...@unitedplanet.com> wrote:

> Hello,
>
> I’m trying to Update a field in a document via SolrJ. Unfortunately, while
> the field itself is updated correctly, values of some other fields are
> removed.
> The code looks like this:
>
> SolrInputDocument updateDoc = new SolrInputDocument();
>
> updateDoc.addField("id", "1234");
>
> Map<String, Object> updateValue = new HashMap<>();
> updateValue.put("set", 1);
> updateDoc.addField("fieldToUpdate", updateValue);
>
> final UpdateRequest request;
>
> request = new UpdateRequest();
> request.add(updateDoc);
>
> request.process(solrClient, "myCollection");
> solrClient.commit();
>
>
> If I send a similar request with curl, e.g.
>
> curl -X POST -H 'Content-Type: application/json' '
> http://localhost:8983/solr/myCollection/update' --data-binary
> '[{"id":"1234", "fieldToUpdate":{"set":"1"}}]'
>
> it works as expected.
> I’m using Solr 6.0.1, but the problem also occurs in 6.6.0.
>
> Any ideas?
>
> Thanks
> Martin
>
>


Re: solr cloud updatehandler stats mismatch

2017-11-09 Thread Amrit Sarkar
Wei,

Does the collection the requests are coming through to have multiple shards and
replicas? Please keep in mind that an update request is received by a node,
redirected to the particular shard the doc belongs to, and then distributed to
the replicas of the collection. On each replica, i.e. each core, the update
request is replayed.

That can be a probable reason for the mismatch between the MBeans stats and
manual counting in the logs, as not everything gets logged. I need to check
that once.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Thu, Nov 9, 2017 at 4:34 PM, Furkan KAMACI 
wrote:

> Hi Wei,
>
> Do you compare it with files which are under /var/solr/logs by default?
>
> Kind Regards,
> Furkan KAMACI
>
> On Sun, Nov 5, 2017 at 6:59 PM, Wei  wrote:
>
> > Hi,
> >
> > I use the following api to track the number of update requests:
> >
> > /solr/collection1/admin/mbeans?cat=UPDATE&stats=true&wt=json
> >
> >
> > Result:
> >
> >
> >- class: "org.apache.solr.handler.UpdateRequestHandler",
> >- version: "6.4.2.1",
> >- description: "Add documents using XML (with XSLT), CSV, JSON, or
> >javabin",
> >- src: null,
> >- stats:
> >{
> >   - handlerStart: 1509824945436,
> >   - requests: 106062,
> >   - ...
> >
> >
> > I am quite confused that the number of requests reported above is quite
> > different from the count from solr access logs. A few times the handler
> > stats is much higher: handler reports ~100k requests but in the access
> log
> > there are only 5k update requests. What could be the possible cause?
> >
> > Thanks,
> > Wei
> >
>


Re: Make search on the particular field to be case sensitive

2017-11-09 Thread Amrit Sarkar
The behavior of field values is defined by the fieldType's analyzer declaration.

If you look at the managed-schema, you will find fieldType declarations like:


<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>


In your case the fieldType is "string". *You need to write an analyzer chain
for the field's type and not include:*

    <filter class="solr.LowerCaseFilterFactory"/>

LowerCaseFilterFactory is responsible for lowercasing the tokens both at query
time and while indexing.

Something like this will work for you:

<fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

I listed "KeywordTokenizerFactory" considering this is a string, not text.

More details on: https://lucene.apache.org/solr/guide/6_6/analyzers.html

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Thu, Nov 9, 2017 at 4:41 PM, Karan Saini  wrote:

> Hi guys,
>
> Solr version :: 6.6.1
>
>
> I have around 10 fields in my core. I want to make the search on one
> specific field case sensitive. Please advise how to introduce case
> sensitivity at the field level. What changes do I need to make for this
> field?
>
> Thanks,
> Karan
>


Re: Make search on the particular field to be case sensitive

2017-11-09 Thread Amrit Sarkar
Ah ok.

I didn't test it and just laid it out. Thank you, Erick, for correcting me.

On 9 Nov 2017 9:06 p.m., "Erick Erickson"  wrote:

> This won't quite work. "string" types are totally un-analyzed; you
> cannot add filters to a solr.StrField. You must use solr.TextField
> rather than solr.StrField:
>
>
> <fieldType name="string_exact" class="solr.TextField" sortMissingLast="true"
>            docValues="true">
>   <analyzer>
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>   </analyzer>
> </fieldType>
>
>
> Start over and re-index from scratch in a new collection, of course.
>
> You also need to make sure you really want to search on the whole
> field. The KeywordTokenizerFactory doesn't split the incoming text up
> _at all_. So if the input is
> "my dog has fleas" you can't search for just "dog" unless you use the
> extremely inefficient *dog* form. If you want to search for words, use
> a tokenizer that breaks up the input, WhitespaceTokenizer for
> instance.
>
> Best,
> Erick
>
> On Thu, Nov 9, 2017 at 3:24 AM, Amrit Sarkar 
> wrote:
> > Behavior of the field values is defined by fieldType analyzer
> declaration.
> >
> > If you look at the managed-schema;
> >
> > You will find fieldType declarations like:
> >
> > <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
> >   <analyzer type="index">
> >     <tokenizer class="solr.StandardTokenizerFactory"/>
> >     <filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.EnglishPossessiveFilterFactory"/>
> >     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
> >     <filter class="solr.PorterStemFilterFactory"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     <tokenizer class="solr.StandardTokenizerFactory"/>
> >     <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
> >     <filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.EnglishPossessiveFilterFactory"/>
> >     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
> >     <filter class="solr.PorterStemFilterFactory"/>
> >   </analyzer>
> > </fieldType>
> >
> >
> > In your case the fieldType is "string". *You need to write an analyzer chain
> > for the field's type and not include:*
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >
> > LowerCaseFilterFactory is responsible for lowercasing the tokens both at
> > query time and while indexing.
> >
> > Something like this will work for you:
> >
> > <fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true">
> >   <analyzer>
> >     <tokenizer class="solr.KeywordTokenizerFactory"/>
> >   </analyzer>
> > </fieldType>
> >
> > I listed "KeywordTokenizerFactory" considering this is string, not text.
> >
> > More details on: https://lucene.apache.org/solr/guide/6_6/analyzers.html
> >
> > Amrit Sarkar
> > Search Engineer
> > Lucidworks, Inc.
> > 415-589-9269
> > www.lucidworks.com
> > Twitter http://twitter.com/lucidworks
> > LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> > Medium: https://medium.com/@sarkaramrit2
> >
> > On Thu, Nov 9, 2017 at 4:41 PM, Karan Saini 
> wrote:
> >
> >> Hi guys,
> >>
> >> Solr version :: 6.6.1
> >>
> >> **
> >>
> >> I have around 10 fields in my core. I want to make the search on this
> >> specific field to be case sensitive. Please advise, how to introduce
> case
> >> sensitivity at the field level. What changes do i need to make for this
> >> field ?
> >>
> >> Thanks,
> >> Karan
> >>
>


Re: How to routing document for send to particular shard range

2017-11-10 Thread Amrit Sarkar
Ketan,

Here I have also created a new field 'core' whose value is the shard where I
> need to send documents, and on retrieval I use the '_route_' parameter,
> mentioning the particular shard. But the issue I am facing is that my
> clusterstate.json still shows "router":{"name":"compositeId"}. Does that mean
> my settings did not take effect, or is that the default?


Only answering this query, as Erick has already covered the rest in the
comment above. You need to RECREATE the collection, passing "router.field" in
the "create collection" API parameters, as "router.field" is a
collection-specific property maintained in ZooKeeper (state.json /
clusterstate.json).

https://lucene.apache.org/solr/guide/6_6/collections-api.html#CollectionsAPI-create

I highly recommend not altering core.properties manually when dealing with
SolrCloud; instead, rely on the SolrCloud APIs to make the necessary changes.
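
For example, a rough sketch of creating the "model" collection with the implicit
router (the shard names, replication factor and maxShardsPerNode below are
assumptions; adjust them to your topology):

curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=model&router.name=implicit&shards=shard1,shard2,shard3,shard4&router.field=core&replicationFactor=1&maxShardsPerNode=4'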

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Fri, Nov 10, 2017 at 5:23 PM, Ketan Thanki  wrote:

> Hi Erik,
>
> My requirement is to index the documents of a particular organization to a
> specific shard. Also, I have made changes in core.properties as mentioned
> below.
>
> Model Collection:
> name=model
> shard=shard1
> collection=model
> router.name=implicit
> router.field=core
> shards=shard1,shard2
>
> Workset Collection:
> name=workset
> shard=shard1
> collection=workset
> router.name=implicit
> router.field=core
> shards=shard1,shard2
>
> here I have also created new field 'core' which value is any shard where I
> need to send documents and on retrieval use '_route_'  parameter with
> mentioning the particular shard. But issue facing still my
> clusterstate.json showing the "router":{"name":"compositeId"} is it means
> my settings not impacted? or its default.
>
> Please do needful.
>
> Regards,
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Friday, November 10, 2017 12:06 PM
> To: solr-user
> Subject: Re: How to routing document for send to particular shard range
>
> You cannot just make configuration changes, whether you use implicit or
> compositeId is defined when you _create_ the collection and cannot be
> changed later.
>
> You need to create a new collection and specify router.name=implicit when
> you create it. Then you can route documents as you desire.
>
> I would caution against this though. If you use implicit routing _you_
> have to insure balancing. For instance, you could have 10,000,000 documents
> for "Org1" and 15 for "Org2", resulting in hugely unbalanced shards.
>
> Implicit routing is particularly useful for time-series indexing, where
> you, say, index a day's worth of documents to each shard. It may be
> appropriate in your case, but so far you haven't told us _why_ you think
> routing docs to particular shards is desirable.
>
> Best,
> Erick
>
> On Thu, Nov 9, 2017 at 10:27 PM, Ketan Thanki  wrote:
> > Thanks Amrit,
> >
> > For suggesting me the approach.
> >
> > I have got some understanding regarding to it and i need to implement
> implicit routing for specific shard based. I have try by make changes on
> core.properties. but it can't work So can you please let me for the
> configuration changes needed. Is it need to create extra field for document
> to rout?
> >
> > I have below configuration Collection created manually:
> > 1: Workset with 4 shard and 4 replica
> > 2: Model with 4 shard and 4 replica
> >
> >
> > For e.g Core.properties for 1 shard :
> > Workset Colection:
> > name=workset
> > shard=shard1
> > collection=workset
> >
> > Model Collection:
> > name=model
> > shard=shard1
> > collection=model
> >
> >
> > So can u please let me the changes needed in configuration for the
> implicit routing.
> >
> > Please do needful.
> >
> > Regards,
> >
> >
> > -Original Message-
> > From: Amrit Sarkar [mailto:sarkaramr...@gmail.com]
> > Sent: Wednesday, November 08, 2017 12:36 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: How to routing document for send to particular shard
> > range
> >
> > Ketan,
> >
> > If you know defined indexing architecture; isn't it better to use
> "implicit" router by writing logic on your own end.
> >
> > If the document is of "Org1", send the do

Re: Nested facet complete wrong counts

2017-11-10 Thread Amrit Sarkar
Kenny,

This is a known behavior in multi-sharded collections where the field values
belonging to the same facet don't reside in the same shard. Yonik Seeley has
improved the JSON Facet feature by introducing the "overrequest" and "refine"
parameters.

Kindly checkout Jira:
https://issues.apache.org/jira/browse/SOLR-7452
https://issues.apache.org/jira/browse/SOLR-9432

Relevant blog: https://medium.com/@abb67cbb46b/1acfa77cd90c
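
For reference, a sketch of your nested facet request with refinement enabled,
reformatted for readability (field names are from your mail; the collection name
is a placeholder, and refine only takes effect on versions where the SOLR-7452
work is available):

curl 'http://localhost:8983/solr/mycollection/select' -d 'q=*:*&rows=0&wt=json&json.facet={
  Gender_sf: {
    type: terms, field: Gender_sf, missing: true, refine: true, overrequest: 10,
    facet: {
      Status_sf: {type: terms, field: Status_sf, missing: true, refine: true, overrequest: 50}
    }
  }
}'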

On 10 Nov 2017 10:02 p.m., "kenny"  wrote:

> Hi all,
>
> We are doing some tests in solr 6.6 with json facet api and we get
> completely wrong counts for some combination of  facets
>
> Setting: We have a set of fields for 376k documents in our query (total
> 120M documents). We work with 2 shards. When doing first a faceting over
> the first facet and keeping these numbers, we subsequently do a nested
> faceting over both facets.
>
> Then we add up the numbers of the sub-facets and expect to get approximately
> the same numbers back. Sometimes we get rounding errors of about 1%
> difference. But on other occasions it seems to be way off,
>
> for example
>
> Gender (3 values) Country (211 values)
> 16226 - 18424 = -2198 (-13.5461604832%)
> 282854 - 464387 = -181533 (-64.1790464338%)
> 40489 - 47902 = -7413 (-18.3086764306%)
> 36672 - 49749 = -13077 (-35.6593586387%)
>
> Gender (3 values)  Status (17 Values)
> 16226 - 16273 = -47 (-0.289658572661%)
> 282854 - 435974 = -153120 (-54.1339348215%)
> 40489 - 49925 = -9436 (-23.305095211%)
> 36672 - 54019 = -17347 (-47.3031195462%)
>
> ...
>
> These are the typical requests we submit. So note that we have refine and
> an overrequest, but in the case of Gender vs Status we should query all
> the buckets anyway.
>
> {"wt":"json","rows":0,"json.facet":"{\"Status_sfhll\":\"hll(
> Status_sf)\",\"Status_sf\":{\"type\":\"terms\",\"field\":\"S
> tatus_sf\",\"missing\":true,\"refine\":true,\"overrequest\":
> 50,\"limit\":50,\"offset\":0}}","q":"*:*","fq":["type:\"something\""]}
>
> {"wt":"json","rows":0,"json.facet":"{\"Gender_sf\":{\"type\"
> :\"terms\",\"field\":\"Gender_sf\",\"missing\":true,\"refine
> \":true,\"overrequest\":10,\"limit\":10,\"offset\":0,\"
> facet\":{\"Status_sf\":{\"type\":\"terms\",\"field\":\"Statu
> s_sf\",\"missing\":true,\"refine\":true,\"overrequest\":50,\
> "limit\":50,\"offset\":0}}},\"Gender_sfhll\":\"hll(Gender_
> sf)\"}","q":"*:*","fq":["type:\"something\""]}
>
> Is this a known bug? Would switching to old facet api resolve this? Are
> there other parameters we miss?
>
>
> Thanks
>
>
> kenny
>
>
>


Re: How to routing document for send to particular shard range

2017-11-13 Thread Amrit Sarkar
Surely someone else can chime in;

but when you say: "so regarding to it we need to index the particular
> client data into particular shard so if its  manageable than we will
> improve the performance as we need"


You can / should create different collections for different clients' data, so
that you can improve performance as needed. There are multiple
configurations which drive indexing and querying capabilities, and
incorporating everything into a single collection will hinder that flexibility.
Also, if you need to add a new client in the future, you don't need to think
about sharding again: add a new collection and tweak its configuration as needed.

Still, if you need to use compositeId routing to achieve your use-case, I am
honestly not sure how to do that, since the shards are predefined when the
collection is created. You cannot add more shards and such; you can only split a
shard, which will divide the index and hence the hash range. I will
strongly recommend you to reconsider your SolrCloud design technique for
your use-case.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Mon, Nov 13, 2017 at 7:31 PM, Ketan Thanki  wrote:

>
> Thanks Amrit,
>
> My requirement is to achieve the best performance while using the document
> routing facility in Solr, so we need to index a particular client's data into
> a particular shard; if that is manageable then we can improve the
> performance as we need.
>
> Please do needful.
>
>
> Regards,
>
>
> -Original Message-
> From: Amrit Sarkar [mailto:sarkaramr...@gmail.com]
> Sent: Friday, November 10, 2017 5:34 PM
> To: solr-user@lucene.apache.org
> Subject: Re: How to routing document for send to particular shard range
>
> Ketan,
>
> here I have also created new field 'core' which value is any shard where I
> > need to send documents and on retrieval use '_route_'  parameter with
> > mentioning the particular shard. But issue facing still my
> > clusterstate.json showing the "router":{"name":"compositeId"} is it
> > means my settings not impacted? or its default.
>
>
> Only answering this query, as Erick has already mentioned in the above
> comment. You need to RECREATE the collection passinfg the "route.field" in
> the "create collection" api parameters as "route.field" is
> collection-specific property maintained at zookeeper (state.json /
> clusterstate.json).
>
> https://lucene.apache.org/solr/guide/6_6/collections-
> api.html#CollectionsAPI-create
>
> I highly recommend not to alter core.properties manually when dealing with
> SolrCloud and instead relying on SolrCloud APIs to make necessary change.
>
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> Medium: https://medium.com/@sarkaramrit2
>
> On Fri, Nov 10, 2017 at 5:23 PM, Ketan Thanki  wrote:
>
> > Hi Erik,
> >
> > My requirement to index the documents of particular organization to
> > specific shard. Also I have made changes in core.properties as menions
> > below.
> >
> > Model Collection:
> > name=model
> > shard=shard1
> > collection=model
> > router.name=implicit
> > router.field=core
> > shards=shard1,shard2
> >
> > Workset Collection:
> > name=workset
> > shard=shard1
> > collection=workset
> > router.name=implicit
> > router.field=core
> > shards=shard1,shard2
> >
> > here I have also created new field 'core' which value is any shard
> > where I need to send documents and on retrieval use '_route_'
> > parameter with mentioning the particular shard. But issue facing still
> > my clusterstate.json showing the "router":{"name":"compositeId"} is it
> > means my settings not impacted? or its default.
> >
> > Please do needful.
> >
> > Regards,
> >
> > -Original Message-
> > From: Erick Erickson [mailto:erickerick...@gmail.com]
> > Sent: Friday, November 10, 2017 12:06 PM
> > To: solr-user
> > Subject: Re: How to routing document for send to particular shard
> > range
> >
> > You cannot just make configuration changes, whether you use implicit
> > or compositeId is defined when you _create_ the collection and cannot
> > be changed later.
> >
> > You need to create a new collection and specify router.name=implicit
> > wh

Re: SOLR not deleting records

2017-11-14 Thread Amrit Sarkar
A little more information would be beneficial:

Are COLO1 and COLO2 collections? If yes, do both have the same configurations,
and are you positively issuing deletes for IDs that are already present in the
index?

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Tue, Nov 14, 2017 at 12:41 PM, vbindal  wrote:

> We have two SOLR colos.
>
> We issued a command to delete: IDS DELETED: [1000236662963,
> 1000224906023, 1000240171970, 1000241597424, 1000241604072,
> 1000241604073, 1000240171754, 1000241604056, 1000241604062,
> 1000237569503]
>
> COLO1 deleted everything but COLO2 skipped some of the records. For ex:
> 1000224906023 was not deleted. This happens consistently.
>
> We are running them with hard commits; soft commit is off.
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Index time boosting

2017-11-14 Thread Amrit Sarkar
Hi Venkat,

FYI: Index time boosting has been deprecated from latest versions of Solr:
https://issues.apache.org/jira/browse/LUCENE-6819.

Not sure which version you are on, but it is best to consider the comments on
the JIRA before using it.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Tue, Nov 14, 2017 at 5:27 PM, Venkateswarlu Bommineni 
wrote:

> Hello Guys,
>
> I would like to understand how index-time boosting works in Solr, and how
> it relates to the omitNorms property in schema.xml.
>
> I am also trying to understand how it works internally; if you have any
> documentation, please share it.
>
> Thanks,
> Venkat.
>


Re: Leading wildcard searches very slow

2017-11-17 Thread Amrit Sarkar
Sundeep,

You would probably want to explore the ReversedWildcardFilterFactory here:
http://lucene.apache.org/solr/6_6_1/solr-core/org/apache/solr/analysis/ReversedWildcardFilterFactory.html
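
A rough fieldType sketch, keeping the field untokenized as in your setup (the
type name and the tuning attributes below are illustrative; the filter sits only
on the index side so leading wildcards can be rewritten against reversed tokens):

<fieldType name="string_rev" class="solr.TextField" sortMissingLast="true">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
            maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>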

Thanks
Amrit Sarkar

On 18 Nov 2017 6:06 a.m., "Sundeep T"  wrote:

> Hi,
>
> We have several indexed string fields which are not tokenized and do not
> have docValues enabled.
>
> When we do leading wildcard searches on these fields they are running very
> slow. We were thinking that since this field is indexed, such queries
> should be running pretty quickly. We are using Solr 6.6.1. Anyone has ideas
> on why these queries are running slow and if there are any ways to speed
> them up?
>
> Thanks
> Sundeep
>


Re: Issue with CDCR bootstrapping in Solr 7.1

2017-11-30 Thread Amrit Sarkar
Hi Tom,

I see what you are saying and I too think this is a bug, but I will confirm
once I check the code. Bootstrapping should happen on all the nodes of the
target.

Meanwhile, can you index more than 100 documents in the source and do the
exact same experiment again? Followers will not copy the entire index of the
leader unless the difference in document versions is more than
"numRecordsToKeep", which defaults to 100 unless you have modified it in
solrconfig.xml.
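
For reference, numRecordsToKeep lives on the update log in solrconfig.xml; a
minimal sketch (the values here are examples only):

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
    <int name="numRecordsToKeep">1000</int>
    <int name="maxNumLogsToKeep">100</int>
  </updateLog>
</updateHandler>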

Looking forward to your analysis.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Thu, Nov 30, 2017 at 9:03 PM, Tom Peters  wrote:

> I'm running into an issue with the initial CDCR bootstrapping of an
> existing index. In short, after turning on CDCR only the leader replica in
> the target data center will have the documents replicated and it will not
> exist in any of the follower replicas in the target data center. All
> subsequent incremental updates made to the source datacenter will appear in
> all replicas in the target data center.
>
> A little more details:
>
> I have two clusters setup, a source cluster and a target cluster. Each
> cluster has only one shard and three replicas. I used the configuration
> detailed in the Source and Target sections of the reference guide as-is
> with the exception of updating the zkHost (https://lucene.apache.org/
> solr/guide/7_1/cross-data-center-replication-cdcr.html#
> cdcr-configuration-2).
>
> The source data center has the following nodes:
> solr01-a, solr01-b, and solr01-c
>
> The target data center has the following nodes:
> solr02-a, solr02-b, and solr02-c
>
> Here are the steps that I've done:
>
> 1. Create collection in source and target data centers
>
> 2. Add a number of documents to the source data center
>
> 3. Verify:
>
> $ for i in solr0{1,2}-{a,b,c}; do echo -n "$i: "; curl -s
> $i:8080/solr/mycollection/select'?q=*:*' | jq '.response.numFound'; done
> solr01-a: 81
> solr01-b: 81
> solr01-c: 81
> solr02-a: 0
> solr02-b: 0
> solr02-c: 0
>
> 4. Start CDCR:
>
> $ curl 'solr01-a:8080/solr/mycollection/cdcr?action=START'
>
> 5. See if target data center has received the initial index
>
> $ for i in solr0{1,2}-{a,b,c}; do echo -n "$i: "; curl -s
> $i:8080/solr/mycollection/select'?q=*:*' | jq '.response.numFound'; done
> solr01-a: 81
> solr01-b: 81
> solr01-c: 81
> solr02-a: 0
> solr02-b: 0
> solr02-c: 81
>
> note: only -c has received the index
>
> 6. Add another document to the source cluster
>
> 7. See how many documents are in each node:
>
> $ for i in solr0{1,2}-{a,b,c}; do echo -n "$i: "; curl -s
> $i:8080/solr/mycollection/select'?q=*:*' | jq '.response.numFound'; done
> solr01-a: 82
> solr01-b: 82
> solr01-c: 82
> solr02-a: 1
> solr02-b: 1
> solr02-c: 82
>
>
> As you can see, the initial index only made it to one of the replicas in
> the target data center, but subsequent incremental updates have appeared
> everywhere I would expect. Any help would be greatly appreciated, thanks.
>
>
>
> This message and any attachment may contain information that is
> confidential and/or proprietary. Any use, disclosure, copying, storing, or
> distribution of this e-mail or any attached file by anyone other than the
> intended recipient is strictly prohibited. If you have received this
> message in error, please notify the sender by reply email and delete the
> message and any attachments. Thank you.
>


Re: Issue with CDCR bootstrapping in Solr 7.1

2017-11-30 Thread Amrit Sarkar
Tom,

This is very useful:

> I found a way to get the follower replicas to receive the documents from
> the leader in the target data center, I have to restart the solr instance
> running on that server. Not sure if this information helps at all.


You have to issue a hard commit on the target after the bootstrapping is done.
Reloading makes the core open a new searcher. While an explicit commit is
issued on the target leader after the bootstrap is done, the followers are left
unattended even though the docs are copied over.
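
Something along these lines against each target node should do it (hostnames
taken from your mail; the collection name is a placeholder):

curl 'http://solr02-a:8080/solr/mycollection/update?commit=true&openSearcher=true'
curl 'http://solr02-b:8080/solr/mycollection/update?commit=true&openSearcher=true'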

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Thu, Nov 30, 2017 at 10:06 PM, Tom Peters  wrote:

> Hi Amrit,
>
> Starting with more documents doesn't appear to have made a difference.
> This time I tried with >1000 docs. Here are the steps I took:
>
> 1. Deleted the collection on both the source and target DCs.
>
> 2. Recreated the collections.
>
> 3. Indexed >1000 documents on source data center, hard commmit
>
>   $ for i in solr0{1,2}-{a,b,c}; do echo -n "$i: "; curl -s
> $i:8080/solr/mycollection/select'?q=*:*' | jq '.response.numFound'; done
>   solr01-a: 1368
>   solr01-b: 1368
>   solr01-c: 1368
>   solr02-a: 0
>   solr02-b: 0
>   solr02-c: 0
>
> 4. Enabled CDCR and checked docs
>
>   $ curl 'solr01-a:8080/solr/synacor/cdcr?action=START'
>
>   $ for i in solr0{1,2}-{a,b,c}; do echo -n "$i: "; curl -s
> $i:8080/solr/mycollection/select'?q=*:*' | jq '.response.numFound'; done
>   solr01-a: 1368
>   solr01-b: 1368
>   solr01-c: 1368
>   solr02-a: 0
>   solr02-b: 0
>   solr02-c: 1368
>
> Some additional notes:
>
> * I do not have numRecordsToKeep defined in my solrconfig.xml, so I assume
> it will use the default of 100
>
> * I found a way to get the follower replicas to receive the documents from
> the leader in the target data center, I have to restart the solr instance
> running on that server. Not sure if this information helps at all.
>
> > On Nov 30, 2017, at 11:22 AM, Amrit Sarkar 
> wrote:
> >
> > Hi Tom,
> >
> > I see what you are saying and I too think this is a bug, but I will
> confirm
> > once on the code. Bootstrapping should happen on all the nodes of the
> > target.
> >
> > Meanwhile can you index more than 100 documents in the source and do the
> > exact same experiment again. Followers will not copy the entire index of
> > Leader unless the difference in versions in docs are more than
> > "numRecordsToKeep", which is default 100, unless you have modified in
> > solrconfig.xml.
> >
> > Looking forward to your analysis.
> >
> > Amrit Sarkar
> > Search Engineer
> > Lucidworks, Inc.
> > 415-589-9269
> > www.lucidworks.com
> > Twitter http://twitter.com/lucidworks
> > LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> > Medium: https://medium.com/@sarkaramrit2
> >
> > On Thu, Nov 30, 2017 at 9:03 PM, Tom Peters  wrote:
> >
> >> I'm running into an issue with the initial CDCR bootstrapping of an
> >> existing index. In short, after turning on CDCR only the leader replica
> in
> >> the target data center will have the documents replicated and it will
> not
> >> exist in any of the follower replicas in the target data center. All
> >> subsequent incremental updates made to the source datacenter will
> appear in
> >> all replicas in the target data center.
> >>
> >> A little more details:
> >>
> >> I have two clusters setup, a source cluster and a target cluster. Each
> >> cluster has only one shard and three replicas. I used the configuration
> >> detailed in the Source and Target sections of the reference guide as-is
> >> with the exception of updating the zkHost (https://lucene.apache.org/
> >> solr/guide/7_1/cross-data-center-replication-cdcr.html#
> >> cdcr-configuration-2).
> >>
> >> The source data center has the following nodes:
> >>solr01-a, solr01-b, and solr01-c
> >>
> >> The target data center has the following nodes:
> >>solr02-a, solr02-b, and solr02-c
> >>
> >> Here are the steps that I've done:
> >>
> >> 1. Create collection in source and target data centers
> >>
> >> 2. Add a number of documents to the source data center
> >>
> >> 3. Verify:
> >>
> >>$ for i in solr0{1,2}-{a,b,c}; do echo -n "$i: "; 

Re: Issue with CDCR bootstrapping in Solr 7.1

2017-11-30 Thread Amrit Sarkar
Tom,

(and take care not to restart the leader node otherwise it will replicate
> from one of the replicas which is missing the index).

How is this possible? OK, I will look more into it. I would appreciate it if
someone else also chimes in if they have a similar issue.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Fri, Dec 1, 2017 at 4:49 AM, Tom Peters  wrote:

> Hi Amrit, I tried issuing hard commits to the various nodes in the target
> cluster and it does not appear to cause the follower replicas to receive
> the initial index. The only way I can get the replicas to see the original
> index is by restarting those nodes (and take care not to restart the leader
> node otherwise it will replicate from one of the replicas which is missing
> the index).
>
>
> > On Nov 30, 2017, at 12:16 PM, Amrit Sarkar 
> wrote:
> >
> > Tom,
> >
> > This is very useful:
> >
> >> I found a way to get the follower replicas to receive the documents from
> >> the leader in the target data center, I have to restart the solr
> instance
> >> running on that server. Not sure if this information helps at all.
> >
> >
> > You have to issue hardcommit on target after the bootstrapping is done.
> > Reloading makes the core opening a new searcher. While explicit commit is
> > issued at target leader after the BS is done, follower are left
> unattended
> > though the docs are copied over.
> >
> > Amrit Sarkar
> > Search Engineer
> > Lucidworks, Inc.
> > 415-589-9269
> > www.lucidworks.com
> > Twitter http://twitter.com/lucidworks
> > LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> > Medium: https://medium.com/@sarkaramrit2
> >
> > On Thu, Nov 30, 2017 at 10:06 PM, Tom Peters 
> wrote:
> >
> >> Hi Amrit,
> >>
> >> Starting with more documents doesn't appear to have made a difference.
> >> This time I tried with >1000 docs. Here are the steps I took:
> >>
> >> 1. Deleted the collection on both the source and target DCs.
> >>
> >> 2. Recreated the collections.
> >>
> >> 3. Indexed >1000 documents on source data center, hard commmit
> >>
> >>  $ for i in solr0{1,2}-{a,b,c}; do echo -n "$i: "; curl -s
> >> $i:8080/solr/mycollection/select'?q=*:*' | jq '.response.numFound';
> done
> >>  solr01-a: 1368
> >>  solr01-b: 1368
> >>  solr01-c: 1368
> >>  solr02-a: 0
> >>  solr02-b: 0
> >>  solr02-c: 0
> >>
> >> 4. Enabled CDCR and checked docs
> >>
> >>  $ curl 'solr01-a:8080/solr/synacor/cdcr?action=START'
> >>
> >>  $ for i in solr0{1,2}-{a,b,c}; do echo -n "$i: "; curl -s
> >> $i:8080/solr/mycollection/select'?q=*:*' | jq '.response.numFound';
> done
> >>  solr01-a: 1368
> >>  solr01-b: 1368
> >>  solr01-c: 1368
> >>  solr02-a: 0
> >>  solr02-b: 0
> >>  solr02-c: 1368
> >>
> >> Some additional notes:
> >>
> >> * I do not have numRecordsToKeep defined in my solrconfig.xml, so I
> assume
> >> it will use the default of 100
> >>
> >> * I found a way to get the follower replicas to receive the documents
> from
> >> the leader in the target data center, I have to restart the solr
> instance
> >> running on that server. Not sure if this information helps at all.
> >>
> >>> On Nov 30, 2017, at 11:22 AM, Amrit Sarkar 
> >> wrote:
> >>>
> >>> Hi Tom,
> >>>
> >>> I see what you are saying and I too think this is a bug, but I will
> >> confirm
> >>> once on the code. Bootstrapping should happen on all the nodes of the
> >>> target.
> >>>
> >>> Meanwhile can you index more than 100 documents in the source and do
> the
> >>> exact same experiment again. Followers will not copy the entire index
> of
> >>> Leader unless the difference in versions in docs are more than
> >>> "numRecordsToKeep", which is default 100, unless you have modified in
> >>> solrconfig.xml.
> >>>
> >>> Looking forward to your analysis.
> >>>
> >>> Amrit Sarkar
> >>> Search Engineer
> >>> Lucidworks, Inc.
> >>> 415-589-9269
> >>> www.lucidworks.com
> >>> Twitt

Re: Issue with CDCR bootstrapping in Solr 7.1

2017-12-05 Thread Amrit Sarkar
Tom,

Thank you for trying out a bunch of things with the CDCR setup. I was able to
reproduce the exact issue on my setup; this is a problem.

I have opened a JIRA for the same:
https://issues.apache.org/jira/browse/SOLR-11724. Feel free to add any
relevant details as you like.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Tue, Dec 5, 2017 at 2:23 AM, Tom Peters  wrote:

> Not sure how it's possible. But I also tried using the _default config and
> just adding in the source and target configuration to make sure I didn't
> have something wonky in my custom solrconfig that was causing this issue. I
> can confirm that until I restart the follower nodes, they will not receive
> the initial index.
>
> > On Dec 1, 2017, at 12:52 AM, Amrit Sarkar 
> wrote:
> >
> > Tom,
> >
> > (and take care not to restart the leader node otherwise it will replicate
> >> from one of the replicas which is missing the index).
> >
> > How is this possible? Ok I will look more into it. Appreciate if someone
> > else also chimes in if they have similar issue.
> >
> > Amrit Sarkar
> > Search Engineer
> > Lucidworks, Inc.
> > 415-589-9269
> > www.lucidworks.com
> > Twitter http://twitter.com/lucidworks
> > LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> > Medium: https://medium.com/@sarkaramrit2
> >
> > On Fri, Dec 1, 2017 at 4:49 AM, Tom Peters  wrote:
> >
> >> Hi Amrit, I tried issuing hard commits to the various nodes in the
> target
> >> cluster and it does not appear to cause the follower replicas to receive
> >> the initial index. The only way I can get the replicas to see the
> original
> >> index is by restarting those nodes (and take care not to restart the
> leader
> >> node otherwise it will replicate from one of the replicas which is
> missing
> >> the index).
> >>
> >>
> >>> On Nov 30, 2017, at 12:16 PM, Amrit Sarkar 
> >> wrote:
> >>>
> >>> Tom,
> >>>
> >>> This is very useful:
> >>>
> >>>> I found a way to get the follower replicas to receive the documents
> from
> >>>> the leader in the target data center, I have to restart the solr
> >> instance
> >>>> running on that server. Not sure if this information helps at all.
> >>>
> >>>
> >>> You have to issue hardcommit on target after the bootstrapping is done.
> >>> Reloading makes the core opening a new searcher. While explicit commit
> is
> >>> issued at target leader after the BS is done, follower are left
> >> unattended
> >>> though the docs are copied over.
> >>>
> >>> Amrit Sarkar
> >>> Search Engineer
> >>> Lucidworks, Inc.
> >>> 415-589-9269
> >>> www.lucidworks.com
> >>> Twitter http://twitter.com/lucidworks
> >>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> >>> Medium: https://medium.com/@sarkaramrit2
> >>>
> >>> On Thu, Nov 30, 2017 at 10:06 PM, Tom Peters 
> >> wrote:
> >>>
> >>>> Hi Amrit,
> >>>>
> >>>> Starting with more documents doesn't appear to have made a difference.
> >>>> This time I tried with >1000 docs. Here are the steps I took:
> >>>>
> >>>> 1. Deleted the collection on both the source and target DCs.
> >>>>
> >>>> 2. Recreated the collections.
> >>>>
> >>>> 3. Indexed >1000 documents on source data center, hard commmit
> >>>>
> >>>> $ for i in solr0{1,2}-{a,b,c}; do echo -n "$i: "; curl -s
> >>>> $i:8080/solr/mycollection/select'?q=*:*' | jq '.response.numFound';
> >> done
> >>>> solr01-a: 1368
> >>>> solr01-b: 1368
> >>>> solr01-c: 1368
> >>>> solr02-a: 0
> >>>> solr02-b: 0
> >>>> solr02-c: 0
> >>>>
> >>>> 4. Enabled CDCR and checked docs
> >>>>
> >>>> $ curl 'solr01-a:8080/solr/synacor/cdcr?action=START'
> >>>>
> >>>> $ for i in solr0{1,2}-{a,b,c}; do echo -n "$i: "; curl -s
> >>>> $i:8080/solr/mycollection/select'?q=*:*' | jq '.response.numFound';
> >> done
> >>&g

Identify Reference Leak in Custom Code related to Solr

2017-12-18 Thread Amrit Sarkar
Hi,

We incorporated *https://github.com/sematext/solr-researcher
<https://github.com/sematext/solr-researcher>* into our project and it is
responsible for a memory / reference leak which is causing multiple
*SolrIndexSearcher* objects in the heap dump.

37 instances of *"org.apache.solr.search.SolrIndexSearcher"*, loaded
by *"org.eclipse.jetty.webapp.WebAppClassLoader
@ 0x5e0020830"*occupy *744,482,384 (48.16%)* bytes.

Biggest instances:

   - org.apache.solr.search.SolrIndexSearcher @ 0x5fcac64c0 - 108,168,104
   (7.00%) bytes.
   - org.apache.solr.search.SolrIndexSearcher @ 0x616b414b0 - 54,982,536
   (3.56%) bytes.
   - org.apache.solr.search.SolrIndexSearcher @ 0x60aaa5820 - 35,614,544
   (2.30%) bytes.
   - org.apache.solr.search.SolrIndexSearcher @ 0x5ed303418 - 26,742,472
   (1.73%) bytes.
   - org.apache.solr.search.SolrIndexSearcher @ 0x6c04d8948 - 26,413,728
   (1.71%) bytes.
   - org.apache.solr.search.SolrIndexSearcher @ 0x66d2f1ca8 - 26,230,600
   (1.70%) bytes.
   - org.apache.solr.search.SolrIndexSearcher @ 0x624904550 - 25,800,200
   (1.67%) bytes.
   - org.apache.solr.search.SolrIndexSearcher @ 0x6baa4c5f8 - 25,094,760
   (1.62%) bytes.
   - org.apache.solr.search.SolrIndexSearcher @ 0x676fefdd0 - 24,720,312
   (1.60%) bytes.
   - org.apache.solr.search.SolrIndexSearcher @ 0x6634d7a08 - 24,315,864
   (1.57%) bytes.
   - org.apache.solr.search.SolrIndexSearcher @ 0x652a82880 - 24,186,328
   (1.56%) bytes.
   - org.apache.solr.search.SolrIndexSearcher @ 0x6ad3ef080 - 24,078,800
   (1.56%) bytes.
   - org.apache.solr.search.SolrIndexSearcher @ 0x64bf747b0 - 24,073,736
   (1.56%) bytes.
   - org.apache.solr.search.SolrIndexSearcher @ 0x6a752cce0 - 23,937,584
   (1.55%) bytes.
   - org.apache.solr.search.SolrIndexSearcher @ 0x698fba4f8 - 23,339,000
   (1.51%) bytes.
   - org.apache.solr.search.SolrIndexSearcher @ 0x6a12724c0 - 23,066,512
   (1.49%) bytes.


We would really appreciate it if someone can help us pin-point:

1. The *reference leak* (since it is an independent third-party plugin).

This is taking almost 80% of the total heap memory allocated (16GB).
Looking forward to positive responses.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2


Re: Identify Reference Leak in Custom Code related to Solr

2017-12-18 Thread Amrit Sarkar
Emir,

Solr version: 6.6, SolrCloud

We followed the instructions on README.md on the github project.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Mon, Dec 18, 2017 at 5:13 PM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Hi Amrit,
> I’ll check with my colleague that worked on this. In the meantime, can you
> provide more info about setup: Solr version, M-S or cloud and steps that we
> can do to reproduce it.
>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 18 Dec 2017, at 12:10, Amrit Sarkar  wrote:
> >
> > Hi,
> >
> > We incorporated *https://github.com/sematext/solr-researcher
> > <https://github.com/sematext/solr-researcher>* into our project and it
> is
> > responsible for memory leak / reference leak which is causing multiple
> > *SolrIndexSearcher
> > *objects in the heap dump.
> >
> > 37 instances of *"org.apache.solr.search.SolrIndexSearcher"*, loaded
> > by *"org.eclipse.jetty.webapp.WebAppClassLoader
> > @ 0x5e0020830"*occupy *744,482,384 (48.16%)* bytes.
> >
> > Biggest instances:
> >
> >   - org.apache.solr.search.SolrIndexSearcher @ 0x5fcac64c0 - 108,168,104
> >   (7.00%) bytes.
> >   - org.apache.solr.search.SolrIndexSearcher @ 0x616b414b0 - 54,982,536
> >   (3.56%) bytes.
> >   - org.apache.solr.search.SolrIndexSearcher @ 0x60aaa5820 - 35,614,544
> >   (2.30%) bytes.
> >   - org.apache.solr.search.SolrIndexSearcher @ 0x5ed303418 - 26,742,472
> >   (1.73%) bytes.
> >   - org.apache.solr.search.SolrIndexSearcher @ 0x6c04d8948 - 26,413,728
> >   (1.71%) bytes.
> >   - org.apache.solr.search.SolrIndexSearcher @ 0x66d2f1ca8 - 26,230,600
> >   (1.70%) bytes.
> >   - org.apache.solr.search.SolrIndexSearcher @ 0x624904550 - 25,800,200
> >   (1.67%) bytes.
> >   - org.apache.solr.search.SolrIndexSearcher @ 0x6baa4c5f8 - 25,094,760
> >   (1.62%) bytes.
> >   - org.apache.solr.search.SolrIndexSearcher @ 0x676fefdd0 - 24,720,312
> >   (1.60%) bytes.
> >   - org.apache.solr.search.SolrIndexSearcher @ 0x6634d7a08 - 24,315,864
> >   (1.57%) bytes.
> >   - org.apache.solr.search.SolrIndexSearcher @ 0x652a82880 - 24,186,328
> >   (1.56%) bytes.
> >   - org.apache.solr.search.SolrIndexSearcher @ 0x6ad3ef080 - 24,078,800
> >   (1.56%) bytes.
> >   - org.apache.solr.search.SolrIndexSearcher @ 0x64bf747b0 - 24,073,736
> >   (1.56%) bytes.
> >   - org.apache.solr.search.SolrIndexSearcher @ 0x6a752cce0 - 23,937,584
> >   (1.55%) bytes.
> >   - org.apache.solr.search.SolrIndexSearcher @ 0x698fba4f8 - 23,339,000
> >   (1.51%) bytes.
> >   - org.apache.solr.search.SolrIndexSearcher @ 0x6a12724c0 - 23,066,512
> >   (1.49%) bytes.
> >
> >
> > We would really appreciate if some can help us on how to pin-point:
> >
> > 1. *Reference leak* (since it is an independent third-party plugin).
> >
> > This is taking almost 80% of the total heap memory allocated (16GB).
> > Looking forward to positive responses.
> >
> > Amrit Sarkar
> > Search Engineer
> > Lucidworks, Inc.
> > 415-589-9269
> > www.lucidworks.com
> > Twitter http://twitter.com/lucidworks
> > LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> > Medium: https://medium.com/@sarkaramrit2
>
>


Regarding embedded ZK with Solr

2017-01-28 Thread Amrit Sarkar
I would like to understand how the embedded ZK works with Solr. If Xg
memory is allocated to the Solr installation and we spin up the SolrCloud
with embedded ZK; what part/percentage of the X is allocated to the ZK or
is it shared?

If that is known, how can I change the memory settings for the embedded ZK?

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2


Re: Step By Step guide to create Solr Cloud in Solr 6.x

2017-05-07 Thread Amrit Sarkar
Following up Erick's response,

This particular article will help with setting up Solr Cloud 6.3.0
with Zookeeper 3.4.6:
<https://medium.com/@sarkaramrit2/setting-up-solr-cloud-6-3-0-with-zookeeper-3-4-6-867b96ec4272>

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2


Re: distribution of leader and replica in SolrCloud

2017-05-08 Thread Amrit Sarkar
Bernd,

When you create a collection via the Collections API, the internal logic tries
its best to distribute the replicas equally across the nodes, but sometimes that
doesn't happen.

The best thing about SolrCloud is that you can manipulate its cloud architecture
on the fly using the Collections API. You can delete a replica of one
particular shard and add a replica (on a specific machine/node) to any of
the shards at any time, depending on your design.

For the above, you can simply:

call DELETEREPLICA api on shard1--->server2:7574 (or the other one)

https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-DELETEREPLICA:DeleteaReplica

boss -- shard1
   | |-- server2:8983 (leader)
   |
--- shard2 - server1:8983
   | |-- server5:7575 (leader)
   |
--- shard3 - server3:8983 (leader)
   | |-- server4:8983
   |
--- shard4 - server1:7574 (leader)
   | |-- server4:7574
   |
--- shard5 - server3:7574 (leader)
 |-- server5:8983

call ADDREPLICA api on shard1 ---> server1:8983

https://cwiki.apache.org/confluence/display/solr/Collections+API

boss -- shard1 - server1:8983
   | |-- server2:8983 (leader)
   |
--- shard2 - server1:8983
   | |-- server5:7575 (leader)
   |
--- shard3 - server3:8983 (leader)
   | |-- server4:8983
   |
--- shard4 - server1:7574 (leader)
   | |-- server4:7574
   |
--- shard5 - server3:7574 (leader)
 |-- server5:8983
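
In curl form, roughly (the replica core-node name and the node name below are
illustrative; look up the actual values in your clusterstate first):

curl 'http://server2:8983/solr/admin/collections?action=DELETEREPLICA&collection=boss&shard=shard1&replica=core_node3'
curl 'http://server1:8983/solr/admin/collections?action=ADDREPLICA&collection=boss&shard=shard1&node=server1:8983_solr'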

Hope this helps.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Mon, May 8, 2017 at 5:08 PM, Bernd Fehling <
bernd.fehl...@uni-bielefeld.de> wrote:

> My assumption was that the strength of SolrCloud is the distribution
> of leaders and replicas within the cloud, making the cloud somewhat
> failsafe.
> But after setting up SolrCloud with a collection I have both the leader and
> the replica of a shard on the same server. And this should be failsafe?
>
> o.a.s.h.a.CollectionsHandler Invoked Collection Action :create with params
> replicationFactor=2&routerName=compositeId&collection.configName=boss&
> maxShardsPerNode=1&name=boss&router.name=compositeId&action=
> CREATE&numShards=5
>
> boss -- shard1 - server2:7574
>| |-- server2:8983 (leader)
>|
> --- shard2 - server1:8983
>| |-- server5:7575 (leader)
>|
> --- shard3 - server3:8983 (leader)
>| |-- server4:8983
>|
> --- shard4 - server1:7574 (leader)
>| |-- server4:7574
>|
> --- shard5 - server3:7574 (leader)
>  |-- server5:8983
>
> From my point of view, if server2 is going to crash then shard1 will
> disappear and
> 1/5th of the index is missing.
>
> What is your opinion?
>
> Regards
> Bernd
>
>
>
>


Re: SPLITSHARD Working

2017-05-08 Thread Amrit Sarkar
Vrinda,

The expected behavior: if the parent shard 'shardA' resides on node'1', node'2'
... node'n' and you do a SPLITSHARD on it,

the child shards, shardA_0 and shardA_1, will also reside on node'1', node'2' ...
node'n'.

shardA --- node'1' (leader) & node'2' (replica)

after splitshard;

shardA --- node'1' (leader) & node'2' (replica) (INACTIVE)
shardA_0 -- node'1' & node'2' (ACTIVE)
shardA_1 -- node'1' & node'2' (ACTIVE)

Any one of the nodes can hold the leader or a replica for the child shards.
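
For reference, the split itself is triggered along these lines (the collection
name here is illustrative):

curl 'http://node1:8983/solr/admin/collections?action=SPLITSHARD&collection=mycollection&shard=shardA'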

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Mon, May 8, 2017 at 4:32 PM, vrindavda  wrote:

> Thanks, I got it.
>
> But I see that distribution of shards and replicas is not equal.
>
>  For Example in my case :
> I had shard 1 and shard2  on Node 1 and their replica_1 and replica_2 on
> Node 2.
> I did SHARDSPLIT on shard1  to get shard1_0 and shard1_1  such that
> and shard1_0_replica0 are created on Node 1 and shard1_0_replica1,
> shard1_1_replica1 and  shard1_1_replica0 on Node 2.
>
> Is this expected behavior ?
>
> Thank you,
> Vrinda Davda
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/SPLITSHARD-Working-tp4333876p4333922.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Could not initialize class JdbcSynonymFilterFactory

2017-05-09 Thread Amrit Sarkar
Just gathering more information on this Solr-JDBC;

Is it an open-source plugin provided at https://github.com/shopping24/ and
not part of the actual *lucene-solr* project?

https://github.com/shopping24/solr-jdbc-synonyms


Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Tue, May 9, 2017 at 4:30 PM, sajjad karimi  wrote:

> http://stackoverflow.com/questions/43857712/could-not-initialize-class-
> jdbcsynonymfilterfactory
> :
>
>
> I'm new to solr, I want to add a field type with JdbcSynonymFilter and
> JdbcStopFilter to solr schema. I added my data source same as instruction
> in this link: [Loading stopwords from Postgresql to Solr6][1]
>
> Then I configured the managed-schema with the code below:
>
> <fieldType name="synonym_text" class="solr.TextField">
>   <analyzer>
>     <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s]+" />
>     <filter class="com.s24.search.solr.analysis.jdbc.JdbcSynonymFilterFactory"
>             sql="SELECT concat(term, '=>', use) as line FROM thesaurus;"
>             dataSource="jdbc/dsTest" ignoreCase="false" expand="true" />
>     <filter class="com.s24.search.solr.analysis.jdbc.JdbcStopFilterFactory"
>             sql="SELECT stopword FROM stopwords"
>             dataSource="jdbc/dsTest"/>
>   </analyzer>
> </fieldType>
>
> I added solr-jdbc to the dist folder, and the postgresql driver, beanutils and
> dbutils to the contrib/jdbc/lib folder. Then I included the libs in the
> solrconfig.xml of data_driven_schema_configs:
>
> <lib dir="${solr.install.dir:../../../..}/contrib/jdbc/lib" regex=".*\.jar" />
> <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-jdbc-\d.*\.jar" />
>
> I encountered the following error when I was trying to start SolrCloud.
>
> > "Could not initialize class
> com.s24.search.solr.analysis.jdbc.JdbcSynonymFilterFactory,
> trace=java.lang.NoClassDefFoundError:
> Could not initialize class
> com.s24.search.solr.analysis.jdbc.JdbcSynonymFilterFactory"
>
>
>   [1]:
> http://stackoverflow.com/questions/43724758/loading-
> stopwords-from-postgresql-to-solr6?noredirect=1#comment74559858_43724758
>


Re: Number of requests spike up, when i do the delta Import.

2017-05-31 Thread Amrit Sarkar
I have been facing a somewhat similar issue lately, where a full-import takes
seconds while a delta-import takes hours.

Can you share some more metrics/numbers related to the full-import and the
delta-import: requests made, rows fetched, and time taken?

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Wed, May 31, 2017 at 2:51 PM, vrindavda  wrote:

> Hello,
> Number of requests spike up, whenever I do the delta import in Solr.
> Please help me understand this.
>
>
> <http://lucene.472066.n3.nabble.com/file/n4338162/solr.jpg>
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/Number-of-requests-spike-up-when-i-do-the-delta-
> Import-tp4338162.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Solr Document Routing

2017-06-01 Thread Amrit Sarkar
Sathyam,

It seems your interpretation is wrong: CloudSolrClient calculates (hashes
the document id and determines the range it belongs to) which shard the
incoming document belongs to. As you have 10 shards, the document will
belong to one of them; that is what is being calculated, and the document is
eventually pushed to the leader of that shard.

The confluence link provides the insights in much detail:
https://lucidworks.com/2013/06/13/solr-cloud-document-routing/
Another useful link:
https://lucidworks.com/2013/06/13/solr-cloud-document-routing/

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Thu, Jun 1, 2017 at 11:52 AM, Sathyam 
wrote:

> HI,
>
> I am indexing documents to a 10 shard collection (testcollection, having no
> replicas) in solr6 cluster using CloudSolrClient. I saw that there is a lot
> of peer to peer document distribution going on when I looked at the solr
> logs.
>
> An example log statement is as follows:
> 2017-06-01 06:07:28.378 INFO  (qtp1358444045-3673692) [c:testcollection
> s:shard8 r:core_node7 x:testcollection_shard8_replica1]
> o.a.s.u.p.LogUpdateProcessorFactory [testcollection_shard8_replica1]
>  webapp=/solr path=/update params={update.distrib=TOLEADER&distrib.from=
> http://10.199.42.29:8983/solr/testcollection_shard7_
> replica1/&wt=javabin&version=2}{add=[BQECDwZGTCEBHZZBBiIP
> (1568981383488995328), BQEBBQZB2il3wGT/0/mB (1568981383490043904),
> BQEBBQZFnhOJRj+m9RJC (1568981383491092480), BQEGBgZIeBE1klHS4fxk
> (1568981383492141056), BQEBBQZFVTmRx2VuCgfV (1568981383493189632)]} 0 25
>
> When I went through the code of CloudSolrClient on grepcode I saw that the
> client itself finds out which server it needs to hit by using the message
> id hash and getting the shard range information from state.json.
> Then it is quite confusing to me why there is a distribution of data
> between peers as there is no replication and each shard is a leader.
>
> I would like to know why this is happening and how to avoid it or if the
> above log statement means something else and I am misinterpreting
> something.
>
> --
> Sathyam Doraswamy
>


Re: Solr Document Routing

2017-06-01 Thread Amrit Sarkar
Sorry, The confluence link:
https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Thu, Jun 1, 2017 at 2:11 PM, Amrit Sarkar  wrote:

> Sathyam,
>
> It seems your interpretation is wrong as CloudSolrClient calculates
> (hashes the document id and determine the range it belongs to) which shard
> the document incoming belongs to. As you have 10 shards, the document will
> belong to one of them, that is what being calculated and eventually pushed
> to the leader of that shard.
>
> The confluence link provides the insights in much detail:
> https://lucidworks.com/2013/06/13/solr-cloud-document-routing/
> Another useful link: https://lucidworks.com/2013/06/13/solr-cloud-
> document-routing/
>
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>
> On Thu, Jun 1, 2017 at 11:52 AM, Sathyam 
> wrote:
>
>> HI,
>>
>> I am indexing documents to a 10 shard collection (testcollection, having
>> no
>> replicas) in solr6 cluster using CloudSolrClient. I saw that there is a
>> lot
>> of peer to peer document distribution going on when I looked at the solr
>> logs.
>>
>> An example log statement is as follows:
>> 2017-06-01 06:07:28.378 INFO  (qtp1358444045-3673692) [c:testcollection
>> s:shard8 r:core_node7 x:testcollection_shard8_replica1]
>> o.a.s.u.p.LogUpdateProcessorFactory [testcollection_shard8_replica1]
>>  webapp=/solr path=/update params={update.distrib=TOLEADER&distrib.from=
>> http://10.199.42.29:8983/solr/testcollection_shard7_replica1
>> /&wt=javabin&version=2}{add=[BQECDwZGTCEBHZZBBiIP
>> (1568981383488995328), BQEBBQZB2il3wGT/0/mB (1568981383490043904),
>> BQEBBQZFnhOJRj+m9RJC (1568981383491092480), BQEGBgZIeBE1klHS4fxk
>> (1568981383492141056), BQEBBQZFVTmRx2VuCgfV (1568981383493189632)]} 0 25
>>
>> When I went through the code of CloudSolrClient on grepcode I saw that the
>> client itself finds out which server it needs to hit by using the message
>> id hash and getting the shard range information from state.json.
>> Then it is quite confusing to me why there is a distribution of data
>> between peers as there is no replication and each shard is a leader.
>>
>> I would like to know why this is happening and how to avoid it or if the
>> above log statement means something else and I am misinterpreting
>> something.
>>
>> --
>> Sathyam Doraswamy
>>
>
>


Re: Number of requests spike up, when i do the delta Import.

2017-06-01 Thread Amrit Sarkar
Erick,

Thanks for the pointer. Straying a bit from what Vrinda is looking for
(sorry about that): what if there are no sub-entities, and no
deltaImportQuery is passed either? I looked into the code and determined
that it calculates the deltaImportQuery itself, in
SQLEntityProcessor.getDeltaImportQuery(..) (around line 126).

Ideally then, a full-import and a delta-import should take a similar time
to build the docs (fetch the next row). I may very well be going entirely
wrong here.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Thu, Jun 1, 2017 at 1:50 PM, vrindavda  wrote:

> Thanks Erick,
>
>  But how do I solve this? I tried creating Stored proc instead of plain
> query, but no change in performance.
>
> For delta import it is processing more documents than the total number of
> documents. In this case delta import is not helping at all; I cannot switch
> to full import each time. This was working fine with less data.
>
> Thank you,
> Vrinda Davda
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/Number-of-requests-spike-up-when-i-do-the-delta-
> Import-tp4338162p4338444.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: SolrCloud CDCR issue

2018-08-14 Thread Amrit Sarkar
Hi,

Yeah, if you look above, I have stated the same JIRA. I see your question on
3 DCs with the Active-Active scenario; I will respond there.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2


On Mon, Aug 13, 2018 at 9:43 PM cdatta  wrote:

> And I was thinking about this one:
> https://issues.apache.org/jira/browse/SOLR-11959.
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: SolrCloud CDCR with 3+ DCs

2018-08-20 Thread Amrit Sarkar
To the concerned,

This is certainly unfortunate if 3-way Active CDCR is not working
successfully. At the time of writing the feature I was able to perform the
N-way Active CDCR approach. How do the logs look? Are the documents not
getting forwarded in sync? Can you attach the source Solr cluster server
logs?

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2


On Fri, Aug 17, 2018 at 11:49 PM cdatta  wrote:

> Any pointer would be much appreciated..
>
> Thanks..
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Solr CDCR replication not working

2018-09-07 Thread Amrit Sarkar
Basic Authentication in clusters is not supported as of today in CDCR.

On Fri, 7 Sep 2018, 4:53 pm Mrityunjaya Pathak, 
wrote:

> I have set up two SolrCloud instances in two different datacenters. The
> target SolrCloud machine is a copy of the source machine with basicAuth
> enabled on them. I am unable to see any replication on the target.
>
> Solr Version :6.6.3
>
> I have done config changes as suggested on
> https://lucene.apache.org/solr/guide/6_6/cross-data-center-replication-cdcr.html
>
> Source Config Changes
>
> <config>
> ...
>   <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
>     <lst name="replica">
>       <str name="zkHost">serverIP:2181,serverIP:2182,serverIP:2183</str>
>       <str name="source">sitecore_master_index</str>
>       <str name="target">sitecore_master_index</str>
>     </lst>
>
>     <lst name="replicator">
>       <str name="threadPoolSize">8</str>
>       <str name="schedule">1000</str>
>       <str name="batchSize">128</str>
>     </lst>
>
>     <lst name="updateLogSynchronizer">
>       <str name="schedule">1000</str>
>     </lst>
>   </requestHandler>
>
>   <updateHandler class="solr.DirectUpdateHandler2">
>     <updateLog class="solr.CdcrUpdateLog">
>       <str name="dir">${solr.ulog.dir:}</str>
>       <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
>     </updateLog>
>
>     <autoCommit>
>       <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
>       <openSearcher>false</openSearcher>
>     </autoCommit>
>
>     <autoSoftCommit>
>       <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
>     </autoSoftCommit>
>   </updateHandler>
>
>   ...
> </config>
>
> Target Config Changes
>
> <config>
> ...
>   <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
>     <lst name="buffer">
>       <str name="defaultState">disabled</str>
>     </lst>
>   </requestHandler>
>
>   <updateRequestProcessorChain name="cdcr-proc-chain">
>     <processor class="solr.CdcrUpdateProcessorFactory"/>
>     <processor class="solr.RunUpdateProcessorFactory"/>
>   </updateRequestProcessorChain>
>
>   <requestHandler name="/update" class="solr.UpdateRequestHandler">
>     <lst name="defaults">
>       <str name="update.chain">cdcr-proc-chain</str>
>     </lst>
>   </requestHandler>
>
>   <updateHandler class="solr.DirectUpdateHandler2">
>     <updateLog class="solr.CdcrUpdateLog">
>       <str name="dir">${solr.ulog.dir:}</str>
>       <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
>     </updateLog>
>
>     <autoCommit>
>       <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
>       <openSearcher>false</openSearcher>
>     </autoCommit>
>
>     <autoSoftCommit>
>       <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
>     </autoSoftCommit>
>   </updateHandler>
>
>   ...
> </config>
>
> Below are logs from the Source cluster.
>
> ERROR (zkCallback-4-thread-2-processing-n:sourceIP:8983_solr) [   ]
> o.a.s.c.s.i.CloudSolrClient Request to collection collection1 failed due to
> (510) org.apache.solr.common.SolrException: Could not find a healthy node
> to handle the request., retry? 5
> 2018-09-07 10:36:14.295 WARN
> (zkCallback-4-thread-2-processing-n:sourceIP:8983_solr) [   ]
> o.a.s.h.CdcrReplicatorManager Unable to instantiate the log reader for
> target collection collection1
> org.apache.solr.common.SolrException: Could not find a healthy node to
> handle the request.
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1377)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1134)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1237)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1237)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1237)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1237)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1237)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:1073)
> at
> org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
> at
> org.apache.solr.handler.CdcrReplicatorManager.getCheckpoint(CdcrReplicatorManager.java:196)
> at
> org.apache.solr.handler.CdcrReplicatorManager.initLogReaders(CdcrReplicatorManager.java:159)
> at
> org.apache.solr.handler.CdcrReplicatorManager.stateUpdate(CdcrReplicatorManager.java:134)
> at
> org.apache.solr.handler.CdcrStateManager.callback(CdcrStateManager.java:36)
> at
> org.apache.solr.handler.CdcrLeaderStateManager.setAmILeader(CdcrLeaderStateManager.java:108)
> at
> org.apache.solr.handler.CdcrLeaderStateManager.checkIfIAmLeader(CdcrLeaderStateManager.java:95)
> at
> org.apache.solr.handler.CdcrLeaderStateManager.access$400(CdcrLeaderStateManager.java:40)
> at
> org.apache.solr.handler.CdcrLeaderStateManager$LeaderStateWatcher.process(CdcrLeaderStateManager.java:150)
> at
> org.apache.solr.common.cloud.SolrZkClient$3.lambda$process$0(SolrZkClient.java:269)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-09-07 10:36:14.310 INFO
> (coreLoadExecutor-8-thread-3-processing-n:sourceIP:8983_solr) [   ]
> o.a.s.c.SolrConfig Using Lucene MatchVersion: 6.6.3
> 2018-09-07 10:36:14.315 INFO
> (zkCallback-4-thread-1-processing-n:sourceIP:8983_solr) [   ]
> o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent
> state:SyncConnected type:NodeDataChanged
> path:/collections/collection1/state.json] for collection [sitecore] has
> occurred - updating... (live nodes size: [1])
> 2018-09-07 10:36:14.343 WARN
> (cdcr-replicator-211-thread-

Re: SolrCloud CDCR with 3+ DCs

2018-09-07 Thread Amrit Sarkar
Yeah, I am not sure how the authentication band-aid described in the
mentioned Stack Overflow link will work. It is about time we included basic
authentication support in CDCR.

On Thu, 6 Sep 2018, 8:41 pm cdatta,  wrote:

> Hi Amrit, Thanks for your response.
>
> We wiped out our complete installation and started a fresh one. Now the
> multi-direction replication is working but we are seeing errors related to
> the authentication sporadically.
>
> Thanks & Regards,
> Chandi Datta
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Non-Solr-related | Reporting abuse | Harshit Arora

2018-09-27 Thread Amrit Sarkar
Community members help each other out when you behave with decency. This
man definitely doesn't know how to.

[image: Screen Shot 2018-09-28 at 1.07.11 AM.png]

I want to make sure he gets recognized IF he ever reaches out to the
mailing list:
https://lnkd.in/fWkfDCv 
Malaviya National Institute of Technology Jaipur, India

Apologies in advance and kindly ignore if this doesn't concern you.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2


Re: partial update in solr

2018-10-29 Thread Amrit Sarkar
Hi Zahra,

To answer your question on seeing "No such processor atomic" with
AtomicUpdateProcessorFactory;

The feature is introduced in Solr 6.6.1 and 7.0 and is available in the
versions later.

I am trying the below on v 7.4 and it is working fine, without adding any
component on solrconfig.xml:


> curl "http://localhost:8983/solr/collectio1/update/json/docs?processor=atomic&atomic.my_newfield=add&atomic.subject=set&atomic.count_i=inc&commit=true" \
> --data-binary '{"id": 1,"title": "titleA"}'
>

The Javadocs
<https://lucene.apache.org/solr/7_5_0//solr-core/org/apache/solr/update/processor/AtomicUpdateProcessorFactory.html>
are broken and I am working on fixing it.
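If you want to drive the same thing from SolrJ instead of curl, here is a
minimal, hedged sketch. The base URL, collection name and field names are
placeholders, and it assumes Solr 6.6.1+ / 7.0+ on the server side; the
document is sent as a plain document (no update="set" maps) and the
atomic.* request parameters tell the processor which operation to apply.

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;

public class AtomicProcessorSketch {
  public static void main(String[] args) throws Exception {
    SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr").build();

    // Plain document: the atomic URP converts these plain values into atomic operations.
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "1");
    doc.addField("title", "titleA");

    UpdateRequest req = new UpdateRequest();
    req.setParam("processor", "atomic");  // route through AtomicUpdateProcessorFactory
    req.setParam("atomic.title", "set");  // map the field to an atomic operation
    req.add(doc);
    req.setCommitWithin(1000);
    req.process(client, "collection1");

    client.close();
  }
}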

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2


On Mon, Oct 29, 2018 at 7:26 PM Alexandre Rafalovitch 
wrote:

> I am not sure. I haven't tried this particular path. Your original
> question was without using SolrJ. Maybe others have.
>
> However, I am also not sure how much sense this makes. This Atomic
> processor is to make it easier to do the merge when you cannot modify
> the source documents. But if you are already doing it from SolrJ, you
> could do an update just as easily as trying the atomic approach.
>
> Regards,
>Alex.
> On Mon, 29 Oct 2018 at 09:40, Zahra Aminolroaya 
> wrote:
> >
> > Thanks Alex. I want to have a query for atomic update with solrj like
> below:
> >
> >
> http://localhost:8983/solr/test4/update?preprocessor=atomic&atomic.text2=set&atomic.text=set&atomic.text3=set&commit=true&stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E11%3C/field%3E%3Cfield%20name=%22text3%22%20update=%22set%22%3Ehi%3C/field%3E%3C/doc%3E%3C/add%3E
> >
> >
> > First, in solrj, I used "setfield" instead of "addfield" like
> > doc.setField("text3", "hi");
> >
> >
> > Then, I added ModifiableSolrParams :
> >
> >
> > ModifiableSolrParams add = new ModifiableSolrParams()
> > .add("processor", "atomic")
> > .add("atomic.text", "set")
> > .add("atomic.text2", "set")
> > .add("atomic.text3", "set")
> > .add(UpdateParams.COMMIT, "true")
> > .add("commit","true");
> >
> > And then I updated my document:
> >
> > req.setParams(add);
> > req.setAction( UpdateRequest.ACTION.COMMIT,false,false );
> >  req.add(docs);
> >  UpdateResponse rsp = req.process( server );
> >
> >
> >
> >  However, I get "No such processor atomic"
> >
> >
> > As you see I set commit to true. What is the problem?
> >
> >
> >
> >
> > --
> > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Negative CDCR Queue Size?

2018-11-09 Thread Amrit Sarkar
Hi Webster,

The queue size "*-1*" suggests the target is not initialized, and you
should see a "WARN" in the logs suggesting something bad happened at the
respective target. I am also posting the source code for reference.

Any chance you can look for WARN in the logs, or check at the respective
source and target that CDCR is configured and was running OK, without any
manual intervention?

Also, you mentioned there are a number of intermittent issues with CDCR; I
see you have reported a few JIRAs. I would be grateful if you could report
the rest.

Code:

> for (CdcrReplicatorState state : replicatorManager.getReplicatorStates()) {
>   NamedList queueStats = new NamedList();
>   CdcrUpdateLog.CdcrLogReader logReader = state.getLogReader();
>   if (logReader == null) {
> String collectionName = 
> req.getCore().getCoreDescriptor().getCloudDescriptor().getCollectionName();
> String shard = 
> req.getCore().getCoreDescriptor().getCloudDescriptor().getShardId();
> log.warn("The log reader for target collection {} is not initialised @ 
> {}:{}",
> state.getTargetCollection(), collectionName, shard);
> queueStats.add(CdcrParams.QUEUE_SIZE, -1l);
>   } else {
> queueStats.add(CdcrParams.QUEUE_SIZE, 
> logReader.getNumberOfRemainingRecords());
>   }
>   queueStats.add(CdcrParams.LAST_TIMESTAMP, 
> state.getTimestampOfLastProcessedOperation());
>   if (hosts.get(state.getZkHost()) == null) {
> hosts.add(state.getZkHost(), new NamedList());
>   }
>   ((NamedList) hosts.get(state.getZkHost())).add(state.getTargetCollection(), 
> queueStats);
> }
> rsp.add(CdcrParams.QUEUES, hosts);
>
>
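For anyone who wants to watch this from the client side, below is a small,
hedged SolrJ sketch that polls the QUEUES action and prints the result. The
base URL and collection name are placeholders to adapt; a queueSize of -1
in the output corresponds to the uninitialised log reader case shown in the
code above.

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.GenericSolrRequest;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.util.NamedList;

public class CdcrQueueCheck {
  public static void main(String[] args) throws Exception {
    SolrClient client =
        new HttpSolrClient.Builder("http://source-host:8983/solr/your_collection").build();

    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("action", "QUEUES");

    NamedList<Object> rsp =
        client.request(new GenericSolrRequest(SolrRequest.METHOD.GET, "/cdcr", params));

    // "queues" nests: zkHost -> target collection -> [queueSize, <n>, lastTimestamp, <ts>]
    System.out.println(rsp.get("queues"));

    client.close();
  }
}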
Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2


On Wed, Nov 7, 2018 at 12:47 AM Webster Homer <
webster.ho...@milliporesigma.com> wrote:

> I'm sorry I should have included that. We are running Solr 7.2. We use
> CDCR for almost all of our collections. We have experienced several
> intermittent problems with CDCR, this one seems to be new, at least I
> hadn't seen it before
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Tuesday, November 06, 2018 12:36 PM
> To: solr-user 
> Subject: Re: Negative CDCR Queue Size?
>
> What version of Solr? CDCR has changed quite a bit in the 7x  code line so
> it's important to know the version.
>
> On Tue, Nov 6, 2018 at 10:32 AM Webster Homer <
> webster.ho...@milliporesigma.com> wrote:
> >
> > Several times I have noticed that the CDCR action=QUEUES will return a
> negative queueSize. When this happens we seem to be missing data in the
> target collection. How can this happen? What does a negative Queue size
> mean? The timestamp is an empty string.
> >
> > We have two targets for a source. One looks like this, with a negative
> > queue size
> > queues":
> > ["uc1f-ecom-mzk01.sial.com:2181,uc1f-ecom-mzk02.sial.com:2181,uc1f-eco
> > m-mzk03.sial.com:2181/solr",["ucb-catalog-material-180317",["queueSize
> > ",-1,"lastTimestamp",""]],
> >
> > The other is healthy
> > "ae1b-ecom-mzk01.sial.com:2181,ae1b-ecom-mzk02.sial.com:2181,ae1b-ecom
> > -mzk03.sial.com:2181/solr",["ucb-catalog-material-180317",["queueSize"
> > ,246980,"lastTimestamp","2018-11-06T16:21:53.265Z"]]
> >
> > We are not seeing CDCR errors.
> >
> > What could cause this behavior?
>


Re: Bidirectional CDCR not working

2019-03-14 Thread Amrit Sarkar
Hi Arnold,

You need "cdcr-processor-chain" definitions in solrconfig.xml on both
clusters' collections. Both clusters need to act as source and target.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2


On Fri, Mar 15, 2019 at 1:03 AM Arnold Bronley 
wrote:

> Hi,
>
> I used unidirectional CDCR in SolrCloud (7.7.1) without any issues. But
> after setting up bidirectional cdcr configuration, I am not able to index a
> document.
>
> Following is the error that I am getting:
>
> Async exception during distributed update: Error from server at
> http://host1:8983/solr/techproducts_shard2_replica_n6: Bad Request
> request:
> http://host1
>
> :8983/solr/techproducts_shard2_replica_n6/update?update.chain=cdcr-processor-chain&update.distrib=TOLEADER&distrib.from=
> http://host2:8983/solr/techproducts_shard1_replica_n1&wt=javabin&version=2
> Remote error message: unknown UpdateRequestProcessorChain:
> cdcr-processor-chain
>
> Do you know why I might be getting this error?
>


Re: Solr 6.6.0 - Error: can not use FieldCache on multivalued field: categoryLevels

2018-02-26 Thread Amrit Sarkar
Vincenzo,

As I read the source code in SchemaField.java:

/**
 * Sanity checks that the properties of this field type are plausible
 * for a field that may be used to get a FieldCacheSource, throwing
 * an appropriate exception (including the field name) if it is not.
 * FieldType subclasses can choose to call this method in their
 * getValueSource implementation
 * @see FieldType#getValueSource
 */
public void checkFieldCacheSource() throws SolrException {
  if ( multiValued() ) {
throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
"can not use FieldCache on multivalued field: "
+ getName());
  }
  if (! hasDocValues() ) {
if ( ! ( indexed() && null != this.type.getUninversionType(this) ) ) {
  throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
  "can not use FieldCache on a field w/o
docValues unless it is indexed and supports Uninversion: "
  + getName());
}
  }
}

It seems the FieldCache is not allowed to un-invert values for
multi-valued fields.

I suspect the reason is that un-inverting multiple values would eat up more
memory? Not sure, someone else can weigh in.



Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Mon, Feb 26, 2018 at 7:37 PM, Vincenzo D'Amore 
wrote:

> Hi,
>
> while trying to run a group query on a multivalue field I received this
> error:
>
> can not use FieldCache on multivalued field:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
>
>   <lst name="responseHeader">
>     <bool name="zkConnected">true</bool>
>     <int name="status">400</int>
>     <int name="QTime">4</int>
>   </lst>
>   <lst name="error">
>     <lst name="metadata">
>       <str name="error-class">org.apache.solr.common.SolrException</str>
>       <str name="root-error-class">org.apache.solr.common.SolrException</str>
>     </lst>
>     <str name="msg">can not use FieldCache on multivalued field: categoryLevels</str>
>     <int name="code">400</int>
>   </lst>
> </response>
>
> I don't understand why this is happening.
>
> Do you know any way to work around this problem?
>
> Thanks in advance,
> Vincenzo
>
> --
> Vincenzo D'Amore
>


Re: Solr CDCR doesn't work if the authentication is enabled

2018-03-05 Thread Amrit Sarkar
Nice. Can you please post the details on the JIRA too if possible:
https://issues.apache.org/jira/browse/SOLR-11959. We can probably put up a
small patch adding this bit of information to the official documentation.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Mon, Mar 5, 2018 at 8:11 PM, dimaf  wrote:

> To resolve the issue, I added names of Source node to /live_nodes of
> Target.
> https://stackoverflow.com/questions/48790621/solr-cdcr-doesnt-work-if-the-
> authentication-is-enabled
> <https://stackoverflow.com/questions/48790621/solr-cdcr-
> doesnt-work-if-the-authentication-is-enabled>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: CDCR Invalid Number on deletes

2018-03-07 Thread Amrit Sarkar
Hey Chris,

I figured out a separate issue while working on CDCR which may relate to
your problem. Please see jira: *SOLR-12063*
<https://issues.apache.org/jira/projects/SOLR/issues/SOLR-12063>. This is a
bug that got introduced when we added support for the bidirectional
approach, where an extra flag is added to the tlog entry for CDCR.

This part of the code is messing up:
*UpdateLog.java.RecentUpdates::update()::*

switch (oper) {
  case UpdateLog.ADD:
  case UpdateLog.UPDATE_INPLACE:
  case UpdateLog.DELETE:
  case UpdateLog.DELETE_BY_QUERY:
Update update = new Update();
update.log = oldLog;
update.pointer = reader.position();
update.version = version;

if (oper == UpdateLog.UPDATE_INPLACE && entry.size() == 5) {
  update.previousVersion = (Long) entry.get(UpdateLog.PREV_VERSION_IDX);
}
updatesForLog.add(update);
updates.put(version, update);

if (oper == UpdateLog.DELETE_BY_QUERY) {
  deleteByQueryList.add(update);
} else if (oper == UpdateLog.DELETE) {
  deleteList.add(new DeleteUpdate(version,
(byte[])entry.get(entry.size()-1)));
}

break;

  case UpdateLog.COMMIT:
break;
  default:
throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
"Unknown Operation! " + oper);
}

deleteList.add(new DeleteUpdate(version, (byte[])entry.get(entry.size()-1)));

is expecting the last entry to be the payload, but everywhere else in the
project *pos:[2]* is the index for the payload, while in / after Solr 7.2
the last entry is a *boolean* denoting whether the update is CDCR-forwarded
or a typical one. UpdateLog.RecentUpdates is used in CDCR sync and
checkpoint operations, hence it is a legitimate bug that slipped past the
tests I wrote.

The immediate fix patch is uploaded and I am awaiting feedback on that.
Meanwhile if it is possible for you to apply the patch, build the jar and
try it out, please do and let us know.

For, *SOLR-9394* <https://issues.apache.org/jira/browse/SOLR-9394>, if you
can comment on the JIRA and post the sample docs, solr logs, relevant
information, I can give it a thorough look.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Wed, Mar 7, 2018 at 1:35 AM, Chris Troullis  wrote:

> Hi all,
>
> We recently upgraded to Solr 7.2.0 as we saw that there were some CDCR bug
> fixes and features added that would finally let us be able to make use of
> it (bi-directional syncing was the big one). The first time we tried to
> implement we ran into all kinds of errors, but this time we were able to
> get it mostly working.
>
> The issue we seem to be having now is that any time a document is deleted
> via deleteById from a collection on the primary node, we are flooded with
> "Invalid Number" errors followed by a random sequence of characters when
> CDCR tries to sync the update to the backup site. This happens on all of
> our collections where our id fields are defined as longs (some of them the
> ids are compound keys and are strings).
>
> Here's a sample exception:
>
> org.apache.solr.client.solrj.impl.CloudSolrClient$RouteException: Error
> from server at http://ip/solr/collection_shard1_replica_n1: Invalid
> Number:  ]
> -s
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.
> directUpdate(CloudSolrClient.java:549)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.
> sendRequest(CloudSolrClient.java:1012)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.
> requestWithRetryOnStaleState(CloudSolrClient.java:883)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.
> requestWithRetryOnStaleState(CloudSolrClient.java:945)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.
> requestWithRetryOnStaleState(CloudSolrClient.java:945)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.
> requestWithRetryOnStaleState(CloudSolrClient.java:945)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.
> requestWithRetryOnStaleState(CloudSolrClient.java:945)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.
> requestWithRetryOnStaleState(CloudSolrClient.java:945)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.request(
> CloudSolrClient.java:816)
> at
> org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
> at
> org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211)
> at
> org.apache.solr.handler.CdcrReplicator.sendRequest(
> CdcrReplicator.java:140)
> at
> org.apache.solr.handler.CdcrReplicator.run(CdcrReplicator.java:104)
> at
> org.apache.sol

Re: Solr 7.2.0 CDCR Issue with TLOG collections

2018-03-07 Thread Amrit Sarkar
Webster,

I updated the JIRA: *SOLR-12057
<https://issues.apache.org/jira/browse/SOLR-12057>*. *CdcrUpdateProcessor*
has a hack: it enables *PEER_SYNC* to bypass the leader logic in
*DistributedUpdateProcessor.versionAdd*, which eventually results in
segments not getting created.

I wrote a very rough patch which fixes the problem, with basic tests to
prove it works. I will try to polish and finish this as soon as possible.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Tue, Mar 6, 2018 at 10:07 PM, Webster Homer 
wrote:

> seems that this is a bug in Solr
> https://issues.apache.org/jira/browse/SOLR-12057
>
> Hopefully it can be addressed soon!
>
> On Mon, Mar 5, 2018 at 4:14 PM, Webster Homer 
> wrote:
>
> > I noticed that the cdcr action=queues returns different results for the
> > target clouds. One target says that the  updateLogSynchronizer  is
> > stopped the other says started. Why? What does that mean. We don't
> > explicitly set that anywhere
> >
> >
> > {"responseHeader": {"status": 0,"QTime": 0},"queues": [],"tlogTotalSize":
> > 0,"tlogTotalCount": 0,"updateLogSynchronizer": "stopped"}
> >
> > and the other
> >
> > {"responseHeader": {"status": 0,"QTime": 0},"queues": [],"tlogTotalSize":
> > 22254206389,"tlogTotalCount": 2,"updateLogSynchronizer": "started"}
> >
> > The source is as follows:
> > {
> > "responseHeader": {
> > "status": 0,
> > "QTime": 5
> > },
> > "queues": [
> > "xxx-mzk01.sial.com:2181,xxx-mzk02.sial.com:2181,xxx-mzk03.
> > sial.com:2181/solr",
> > [
> > "b2b-catalog-material-180124T",
> > [
> > "queueSize",
> > 0,
> > "lastTimestamp",
> > "2018-02-28T18:34:39.704Z"
> > ]
> > ],
> > "yyy-mzk01.sial.com:2181,yyy-mzk02.sial.com:2181,yyy-mzk03.
> > sial.com:2181/solr",
> > [
> > "b2b-catalog-material-180124T",
> > [
> > "queueSize",
> > 0,
> > "lastTimestamp",
> > "2018-02-28T18:34:39.704Z"
> > ]
> > ]
> > ],
> > "tlogTotalSize": 1970848,
> > "tlogTotalCount": 1,
> > "updateLogSynchronizer": "stopped"
> > }
> >
> >
> > On Fri, Mar 2, 2018 at 5:05 PM, Webster Homer 
> > wrote:
> >
> >> It looks like the data is getting to the target servers. I see tlog
> files
> >> with the right timestamps. Looking at the timestamps on the documents in
> >> the collection none of the data appears to have been loaded.
> >> In the solr.log I see lots of /cdcr messages
> action=LASTPROCESSEDVERSION,
> >>  action=COLLECTIONCHECKPOINT, and  action=SHARDCHECKPOINT
> >>
> >> no errors
> >>
> >> autoCommit is set to  6 I tried sending a commit explicitly no
> >> difference. cdcr is uploading data, but no new data appears in the
> >> collection.
> >>
> >> On Fri, Mar 2, 2018 at 1:39 PM, Webster Homer 
> >> wrote:
> >>
> >>> We have been having strange behavior with CDCR on Solr 7.2.0.
> >>>
> >>> We have a number of replicas which have identical schemas. We found
> that
> >>> TLOG replicas give much more consistent search results.
> >>>
> >>> We created a collection using TLOG replicas in our QA clouds.
> >>> We have a locally hosted solrcloud with 2 nodes, all our collections
> >>> have 2 shards. We use CDCR to replicate the collections from this
> >>> environment to 2 data centers hosted in Google cloud. This seems to
> work
> >>> fairly well for our collections with NRT replicas. However the new TLOG
> >>> collection has problems.
> >>>
> >>> The google cloud solrclusters have 4 nodes each (3 separate
> Zookeepers).
> >>> 2 shards per collection with 2 replicas per shard.
> >>>
> >>> We never see data show up in the cloud collections, but we do see tlog
> >>> files show up on the cloud servers. I can see that all of the servers
> have
> >>> cdcr started, buffers are disabled.
> >>> The cdcr source configuration is:
> >>>
> >>> "requestHandler":{"

Re: CDCR Invalid Number on deletes

2018-03-20 Thread Amrit Sarkar
Hi Chris,

Sorry, I was off work for a few days and didn't follow the conversation.
The link directs me to
https://issues.apache.org/jira/projects/SOLR/issues/SOLR-12063. I think we
have fixed the issue you are describing in that JIRA, though the symptoms
there were different from yours.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Wed, Mar 21, 2018 at 1:17 AM, Chris Troullis 
wrote:

> Nevermind I found itthe link you posted links me to SOLR-12036 instead
> of SOLR-12063 for some reason.
>
> On Tue, Mar 20, 2018 at 1:51 PM, Chris Troullis 
> wrote:
>
> > Hey Amrit,
> >
> > Did you happen to see my last reply?  Is SOLR-12036 the correct JIRA?
> >
> > Thanks,
> >
> > Chris
> >
> > On Wed, Mar 7, 2018 at 1:52 PM, Chris Troullis 
> > wrote:
> >
> >> Hey Amrit, thanks for the reply!
> >>
> >> I checked out SOLR-12036, but it doesn't look like it has to do with
> >> CDCR, and the patch that is attached doesn't look CDCR related. Are you
> >> sure that's the correct JIRA number?
> >>
> >> Thanks,
> >>
> >> Chris
> >>
> >> On Wed, Mar 7, 2018 at 11:21 AM, Amrit Sarkar 
> >> wrote:
> >>
> >>> Hey Chris,
> >>>
> >>> I figured a separate issue while working on CDCR which may relate to
> your
> >>> problem. Please see jira: *SOLR-12063*
> >>> <https://issues.apache.org/jira/projects/SOLR/issues/SOLR-12063>. This
> >>> is a
> >>> bug got introduced when we supported the bidirectional approach where
> an
> >>> extra flag in tlog entry for cdcr is added.
> >>>
> >>> This part of the code is messing up:
> >>> *UpdateLog.java.RecentUpdates::update()::*
> >>>
> >>> switch (oper) {
> >>>   case UpdateLog.ADD:
> >>>   case UpdateLog.UPDATE_INPLACE:
> >>>   case UpdateLog.DELETE:
> >>>   case UpdateLog.DELETE_BY_QUERY:
> >>> Update update = new Update();
> >>> update.log = oldLog;
> >>> update.pointer = reader.position();
> >>> update.version = version;
> >>>
> >>> if (oper == UpdateLog.UPDATE_INPLACE && entry.size() == 5) {
> >>>   update.previousVersion = (Long) entry.get(UpdateLog.PREV_VERSI
> >>> ON_IDX);
> >>> }
> >>> updatesForLog.add(update);
> >>> updates.put(version, update);
> >>>
> >>> if (oper == UpdateLog.DELETE_BY_QUERY) {
> >>>   deleteByQueryList.add(update);
> >>> } else if (oper == UpdateLog.DELETE) {
> >>>   deleteList.add(new DeleteUpdate(version,
> >>> (byte[])entry.get(entry.size()-1)));
> >>> }
> >>>
> >>> break;
> >>>
> >>>   case UpdateLog.COMMIT:
> >>> break;
> >>>   default:
> >>> throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
> >>> "Unknown Operation! " + oper);
> >>> }
> >>>
> >>> deleteList.add(new DeleteUpdate(version, (byte[])entry.get(entry.size()
> >>> -1)));
> >>>
> >>> is expecting the last entry to be the payload, but everywhere in the
> >>> project, *pos:[2] *is the index for the payload, while the last entry
> in
> >>> source code is *boolean* in / after Solr 7.2, denoting update is cdcr
> >>> forwarded or typical. UpdateLog.java.RecentUpdates is used to in cdcr
> >>> sync,
> >>> checkpoint operations and hence it is a legit bug, slipped the tests I
> >>> wrote.
> >>>
> >>> The immediate fix patch is uploaded and I am awaiting feedback on that.
> >>> Meanwhile if it is possible for you to apply the patch, build the jar
> and
> >>> try it out, please do and let us know.
> >>>
> >>> For, *SOLR-9394* <https://issues.apache.org/jira/browse/SOLR-9394>, if
> >>> you
> >>> can comment on the JIRA and post the sample docs, solr logs, relevant
> >>> information, I can give it a thorough look.
> >>>
> >>> Amrit Sarkar
> >>> Search Engineer
> >>> Lucidworks, Inc.
> >>> 415-589-9269
> >>> www.lucidworks.com
> >>> Twitter http://twitter.com/lucidworks
> >&g

Re: CDCR performance issues

2018-03-23 Thread Amrit Sarkar
Hey Tom,

I'm also having issue with replicas in the target data center. It will go
> from recovering to down. And when one of my replicas go to down in the
> target data center, CDCR will no longer send updates from the source to
> the target.


Were you able to figure out the issue? As long as the leader of each shard
in each collection is up and serving, CDCR shouldn't stop.

Sometimes we have to reindex a large chunk of our index (1M+ documents).
> What's the best way to handle this if the normal CDCR process won't be
> able to keep up? Manually trigger a bootstrap again? Or is there something
> else we can do?
>

That's one of the limitations of CDCR: it cannot handle bulk indexing. The
preferable way to do it is (a rough SolrJ sketch of this sequence follows
the list):
* stop cdcr
* bulk index
* issue manual BOOTSTRAP (it is independent of stop and start cdcr)
* start cdcr
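The sketch below shows that sequence via the CDCR API from SolrJ. The host
names, collection/core names and especially the manual BOOTSTRAP call are
assumptions to adapt: the BOOTSTRAP action is issued against the target
leader core, and the masterUrl parameter pointing at the source leader is a
hypothetical illustration here, so please verify it against the CDCR
code/docs for your Solr version before relying on it.

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.GenericSolrRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class CdcrBulkReindexSketch {

  // Issue /cdcr?action=... against a given Solr base URL (assumes the CDCR handler is at /cdcr).
  static void cdcrAction(String baseUrl, String action, String... kv) throws Exception {
    try (SolrClient client = new HttpSolrClient.Builder(baseUrl).build()) {
      ModifiableSolrParams params = new ModifiableSolrParams();
      params.set("action", action);
      for (int i = 0; i + 1 < kv.length; i += 2) {
        params.set(kv[i], kv[i + 1]);
      }
      client.request(new GenericSolrRequest(SolrRequest.METHOD.GET, "/cdcr", params));
    }
  }

  public static void main(String[] args) throws Exception {
    String source = "http://source-host:8983/solr/myCollection";
    String targetLeaderCore = "http://target-host:8983/solr/myCollection_shard1_replica1";

    cdcrAction(source, "STOP");            // 1. stop CDCR on the source

    // 2. ...run the bulk indexing job against the source here...

    // 3. manual bootstrap of the target from the source leader (hypothetical parameters,
    //    see the note above; bootstrap is otherwise triggered when CDCR is started).
    cdcrAction(targetLeaderCore, "BOOTSTRAP",
        "masterUrl", "http://source-host:8983/solr/myCollection_shard1_replica1");

    cdcrAction(source, "START");           // 4. start CDCR again on the source
  }
}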

1. Is it accurate that updates are not actually batched in transit from the
> source to the target and instead each document is posted separately?


The batchSize and schedule settings regulate how many docs are sent across
to the target. This page has more details:
https://lucene.apache.org/solr/guide/7_2/cdcr-config.html#the-replicator-element




Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Tue, Mar 13, 2018 at 12:21 AM, Tom Peters  wrote:

> I'm also having issue with replicas in the target data center. It will go
> from recovering to down. And when one of my replicas go to down in the
> target data center, CDCR will no longer send updates from the source to the
> target.
>
> > On Mar 12, 2018, at 9:24 AM, Tom Peters  wrote:
> >
> > Anyone have any thoughts on the questions I raised?
> >
> > I have another question related to CDCR:
> > Sometimes we have to reindex a large chunk of our index (1M+ documents).
> What's the best way to handle this if the normal CDCR process won't be able
> to keep up? Manually trigger a bootstrap again? Or is there something else
> we can do?
> >
> > Thanks.
> >
> >
> >
> >> On Mar 9, 2018, at 3:59 PM, Tom Peters  wrote:
> >>
> >> Thanks. This was helpful. I did some tcpdumps and I'm noticing that the
> requests to the target data center are not batched in any way. Each update
> comes in as an independent update. Some follow-up questions:
> >>
> >> 1. Is it accurate that updates are not actually batched in transit from
> the source to the target and instead each document is posted separately?
> >>
> >> 2. Are they done synchronously? I assume yes (since you wouldn't want
> operations applied out of order)
> >>
> >> 3. If they are done synchronously, and are not batched in any way, does
> that mean that the best performance I can expect would be roughly how long
> it takes to round-trip a single document? ie. If my average ping is 25ms,
> then I can expect a peak performance of roughly 40 ops/s.
> >>
> >> Thanks
> >>
> >>
> >>
> >>> On Mar 9, 2018, at 11:21 AM, Davis, Daniel (NIH/NLM) [C] <
> daniel.da...@nih.gov> wrote:
> >>>
> >>> These are general guidelines, I've done loads of networking, but may
> be less familiar with SolrCloud  and CDCR architecture.  However, I know
> it's all TCP sockets, so general guidelines do apply.
> >>>
> >>> Check the round-trip time between the data centers using ping or TCP
> ping.   Throughput tests may be high, but if Solr has to wait for a
> response to a request before sending the next action, then just like any
> network protocol that does that, it will get slow.
> >>>
> >>> I'm pretty sure CDCR uses HTTP/HTTPS rather than just TCP, so also
> check whether some proxy/load balancer between data centers is causing it
> to be a single connection per operation.   That will *kill* performance.
>  Some proxies default to HTTP/1.0 (open, send request, server send
> response, close), and that will hurt.
> >>>
> >>> Why you should listen to me even without SolrCloud knowledge -
> checkout paper "Latency performance of SOAP Implementations".   Same
> distribution of skills - I knew TCP well, but Apache Axis 1.1 not so well.
>  I still improved response time of Apache Axis 1.1 by 250ms per call with
> 1-line of code.
> >>>
> >>> -Original Message-
> >>> From: Tom Peters [mailto:tpet...@synacor.com]
> >>> Sent: Wednesday, March 7, 2018 6:19 PM
> >>> To: solr-user@lucene.apache.org
> >>> Subject: CDCR performance issues
> &

Re: CDCR performance issues

2018-03-23 Thread Amrit Sarkar
Susheel,

That is the correct behavior, "commit" operation is not propagated to
target and the documents will be visible in the target as per commit
strategy devised there.
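If you need forwarded documents to become searchable on the target right
away (for example while testing, as in the check described below), you can
always send an explicit commit to the target cluster yourself. A trivial
SolrJ sketch, with placeholder host and collection names:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class CommitTargetSketch {
  public static void main(String[] args) throws Exception {
    try (SolrClient target = new HttpSolrClient.Builder("http://target-host:8983/solr").build()) {
      // Open a new searcher on the target so documents forwarded by CDCR become visible now,
      // instead of waiting for the target's own autoCommit/autoSoftCommit to fire.
      target.commit("myCollection");
    }
  }
}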

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Fri, Mar 23, 2018 at 6:02 PM, Susheel Kumar 
wrote:

> Just a simple check, if you go to source solr and index single document
> from Documents tab, then keep querying target solr for the same document.
> How long does it take the document to appear in target data center.  In our
> case, I can see document show up in target within 30 sec which is our soft
> commit time.
>
> Thanks,
> Susheel
>
> On Fri, Mar 23, 2018 at 8:16 AM, Amrit Sarkar 
> wrote:
>
> > Hey Tom,
> >
> > I'm also having issue with replicas in the target data center. It will go
> > > from recovering to down. And when one of my replicas go to down in the
> > > target data center, CDCR will no longer send updates from the source to
> > > the target.
> >
> >
> > Are you able to figure out the issue? As long as the leaders of each
> shard
> > in each collection is up and serving, CDCR shouldn't stop.
> >
> > Sometimes we have to reindex a large chunk of our index (1M+ documents).
> > > What's the best way to handle this if the normal CDCR process won't be
> > > able to keep up? Manually trigger a bootstrap again? Or is there
> > something
> > > else we can do?
> > >
> >
> > That's one of the limitations of CDCR, it cannot handle bulk indexing,
> > preferable way to do is
> > * stop cdcr
> > * bulk index
> > * issue manual BOOTSTRAP (it is independent of stop and start cdcr)
> > * start cdcr
> >
> > 1. Is it accurate that updates are not actually batched in transit from
> the
> > > source to the target and instead each document is posted separately?
> >
> >
> > The batchsize and schedule regulate how many docs are sent across target.
> > This has more details:
> > https://lucene.apache.org/solr/guide/7_2/cdcr-config.
> > html#the-replicator-element
> >
> >
> >
> >
> > Amrit Sarkar
> > Search Engineer
> > Lucidworks, Inc.
> > 415-589-9269
> > www.lucidworks.com
> > Twitter http://twitter.com/lucidworks
> > LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> > Medium: https://medium.com/@sarkaramrit2
> >
> > On Tue, Mar 13, 2018 at 12:21 AM, Tom Peters 
> wrote:
> >
> > > I'm also having issue with replicas in the target data center. It will
> go
> > > from recovering to down. And when one of my replicas go to down in the
> > > target data center, CDCR will no longer send updates from the source to
> > the
> > > target.
> > >
> > > > On Mar 12, 2018, at 9:24 AM, Tom Peters  wrote:
> > > >
> > > > Anyone have any thoughts on the questions I raised?
> > > >
> > > > I have another question related to CDCR:
> > > > Sometimes we have to reindex a large chunk of our index (1M+
> > documents).
> > > What's the best way to handle this if the normal CDCR process won't be
> > able
> > > to keep up? Manually trigger a bootstrap again? Or is there something
> > else
> > > we can do?
> > > >
> > > > Thanks.
> > > >
> > > >
> > > >
> > > >> On Mar 9, 2018, at 3:59 PM, Tom Peters  wrote:
> > > >>
> > > >> Thanks. This was helpful. I did some tcpdumps and I'm noticing that
> > the
> > > requests to the target data center are not batched in any way. Each
> > update
> > > comes in as an independent update. Some follow-up questions:
> > > >>
> > > >> 1. Is it accurate that updates are not actually batched in transit
> > from
> > > the source to the target and instead each document is posted
> separately?
> > > >>
> > > >> 2. Are they done synchronously? I assume yes (since you wouldn't
> want
> > > operations applied out of order)
> > > >>
> > > >> 3. If they are done synchronously, and are not batched in any way,
> > does
> > > that mean that the best performance I can expect would be roughly how
> > long
> > > it takes to round-trip a single document? ie. If my average ping is
> 25ms,
> > > then I can expect a

Re: solrcloud Auto-commit doesn't seem reliable

2018-03-23 Thread Amrit Sarkar
Elaino,

When you say commits not working, the solr logs not printing "commit"
messages? or documents are not appearing when we search.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Thu, Mar 22, 2018 at 4:05 AM, Elaine Cario  wrote:

> I'm just catching up on reading solr emails, so forgive me for being late
> to this dance
>
> I've just gone through a project to enable CDCR on our Solr, and I also
> experienced a small period of time where the commits on the source server
> just seemed to stop.  This was during a period of intense experimentation
> where I was mucking around with configurations, turning CDCR on/off, etc.
> At some point the commits stopped occurring, and it drove me nuts for a
> couple of days - tried everything - restarting Solr, reloading, turned
> buffering on, turned buffering off, etc.  I finally threw up my hands and
> rebooted the server out of desperation (it was a physical Linux box).
> Commits worked fine after that.  I don't know what caused the commits to
> stop, and why re-booting (and not just restarting Solr) caused them to work
> fine.
>
> Wondering if you ever found a solution to your situation?
>
>
>
> On Fri, Feb 16, 2018 at 2:44 PM, Webster Homer 
> wrote:
>
> > I meant to get back to this sooner.
> >
> > When I say I issued a commit I do issue it as
> collection/update?commit=true
> >
> > The soft commit interval is set to 3000, but I don't have a problem with
> > soft commits ( I think). I was responding
> >
> > I am concerned that some hard commits don't seem to happen, but I think
> > many commits do occur. I'd like suggestions on how to diagnose this, and
> > perhaps an idea of where to look. Typically I believe that issues like
> this
> > are from our configuration.
> >
> > Our indexing job is pretty simple, we send blocks of JSON to
> > /update/json. We have either re-index the whole collection,
> or
> > just apply updates. Typically we reindex the data once a week and delete
> > any records that are older than the last full index. This does lead to a
> > fair number of deleted records in the index especially if commits fail.
> > Most of our collections are not large between 2 and 3 million records.
> >
> > The collections are hosted in google cloud
> >
> > On Mon, Feb 12, 2018 at 5:00 PM, Erick Erickson  >
> > wrote:
> >
> > > bq: But if 3 seconds is aggressive what would be a  good value for soft
> > > commit?
> > >
> > > The usual answer is "as long as you can stand". All top-level caches
> are
> > > invalidated, autowarming is done etc. on each soft commit. That can be
> a
> > > lot of
> > > work and if your users are comfortable with docs not showing up for,
> > > say, 10 minutes
> > > then use 10 minutes. As always "it depends" here, the point is not to
> > > do unnecessary
> > > work if possible.
> > >
> > > bq: If a commit doesn't happen how would there ever be an index merge
> > > that would remove the deleted documents.
> > >
> > > Right, it wouldn't. It's a little more subtle than that though.
> > > Segments on various
> > > replicas will contain different docs, thus the term/doc statistics can
> be
> > > a bit
> > > different between multiple replicas. None of the stats will change
> > > until the commit
> > > though. You might try turning no distributed doc/term stats though.
> > >
> > > Your comments about PULL or TLOG replicas are well taken. However, even
> > > those
> > > won't be absolutely in sync since they'll replicate from the master at
> > > slightly
> > > different times and _could_ get slightly different segments _if_
> > > there's indexing
> > > going on. But let's say you stop indexing. After the next poll
> > > interval all the replicas
> > > will have identical characteristics and will score the docs the same.
> > >
> > > I don't have any signifiant wisdom to offer here, except this is really
> > the
> > > first time I've heard of this behavior. About all I can imagine is
> > > that _somehow_
> > > the soft commit interval is -1. When you say you "issue a commit" I'm
> > > assuming
> > > it's via collection/update?commit=true or some such which issues a
> 

Re: Does CDCR Bootstrap sync leaves replica's out of sync

2018-04-16 Thread Amrit Sarkar
Hi Susheel,

Pretty sure you are talking about this:
https://issues.apache.org/jira/browse/SOLR-11724

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Mon, Apr 16, 2018 at 11:35 PM, Susheel Kumar 
wrote:

> Does anybody know about known issue where CDCR bootstrap sync leaves the
> replica's on target cluster non touched/out of sync.
>
> After I stopped and restart CDCR, it builds my target leaders index but
> replica's on target cluster still showing old index / not modified.
>
>
> Thnx
>


Re: Weird transaction log behavior with CDCR

2018-04-17 Thread Amrit Sarkar
Chris,

After disabling the buffer on the source, kindly shut down all the nodes of
the source cluster first and then start them again. The tlogs will be
removed accordingly. BTW, CDCR doesn't abide by the 100 numRecordsToKeep or
10 numTlogs limits.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Tue, Apr 17, 2018 at 8:58 PM, Susheel Kumar 
wrote:

> DISABLEBUFFER on source cluster would solve this problem.
>
> On Tue, Apr 17, 2018 at 9:29 AM, Chris Troullis 
> wrote:
>
> > Hi,
> >
> > We are attempting to use CDCR with solr 7.2.1 and are experiencing odd
> > behavior with transaction logs. My understanding is that by default, solr
> > will keep a maximum of 10 tlog files or 100 records in the tlogs. I
> assume
> > that with CDCR, the records will not be removed from the tlogs until it
> has
> > been confirmed that they have been replicated to the other cluster.
> > However, even when replication has finished and the CDCR queue sizes are
> 0,
> > we are still seeing large numbers (50+) and large sizes (over a GB) of
> > tlogs sitting on the nodes.
> >
> > We are hard committing once per minute.
> >
> > Doing a lot of reading on the mailing list, I see that a lot of people
> were
> > pointing to buffering being enabled as the cause for some of these
> > transaction log issues. However, we have disabled buffering on both the
> > source and target clusters, and are still seeing the issues.
> >
> > Also, while some of our indexes replicate very rapidly (millions of
> > documents in minutes), other smaller indexes are crawling. If we restart
> > CDCR on the nodes then it finishes almost instantly.
> >
> > Any thoughts on these behaviors?
> >
> > Thanks,
> >
> > Chris
> >
>


Re: CdcrReplicator Forwarder not working on some shards

2018-04-17 Thread Amrit Sarkar
Susheel,

At the time of core reload, the logs must be complaining or at least
pointing in some direction. Each shard leader is responsible for spawning a
thread pool for the CDCR replicator to get the data over.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Tue, Apr 17, 2018 at 9:04 PM, Susheel Kumar 
wrote:

> Hi,
>
> Has anyone gone through this issue where a few shard leaders are forwarding
> updates to their counterpart leaders in the target cluster while some of
> the shard leaders are not forwarding the updates?
>
> On Solr 6.6, in 4 of the shards' logs I see the below entries and their
> counterparts in the target are getting updated, but for the other 4 shards
> I don't see these entries and neither are they being replicated to the
> target.
>
> Any suggestion on how / what can be done to start cdcr-replicator threads
> on other shards?
>
> 2018-04-17 15:26:38.394 INFO
> (cdcr-replicator-24-thread-6-processing-n:dc2prsrcvap0049.
> whc.dc02.us.adp:8080_solr)
> [   ] o.a.s.h.CdcrReplicator Forwarded 0 updates to target COLL
> 2018-04-17 15:26:39.394 INFO
> (cdcr-replicator-24-thread-7-processing-n:dc2prsrcvap0049.
> whc.dc02.us.adp:8080_solr)
> [   ] o.a.s.h.CdcrReplicator Forwarded 0 updates to target COLL
>
> Thanks
> Susheel
>


Re: Weird transaction log behavior with CDCR

2018-04-17 Thread Amrit Sarkar
Chris,

Try to index a few dummy documents and analyse whether the tlogs are
getting cleared or not. Ideally, on restart it clears everything and keeps
at most 2 tlogs per data folder.
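A throwaway SolrJ loop for that test: index a handful of dummy documents
and hard commit, then watch the tlog directories. The zkHost string and
collection name are placeholders; adjust to your cluster.

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class TlogPurgeProbe {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient client =
             new CloudSolrClient.Builder().withZkHost("zk1:2181,zk2:2181,zk3:2181").build()) {
      client.setDefaultCollection("myCollection");
      for (int i = 0; i < 10; i++) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "tlog-probe-" + i);
        client.add(doc);
      }
      // Hard commit; old tlogs should start rolling off once buffering is disabled.
      client.commit();
    }
  }
}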

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Tue, Apr 17, 2018 at 11:52 PM, Chris Troullis 
wrote:

> Hi Amrit, thanks for the reply.
>
> I shut down all of the nodes on the source cluster after the buffer was
> disabled, and there was no change to the tlogs.
>
> On Tue, Apr 17, 2018 at 12:20 PM, Amrit Sarkar 
> wrote:
>
> > Chris,
> >
> > After disabling the buffer on source, kind shut down all the nodes of
> > source cluster first and then start them again. The tlogs will be removed
> > accordingly. BTW CDCR doesn't abide by 100 numRecordsToKeep or 10
> numTlogs.
> >
> > Amrit Sarkar
> > Search Engineer
> > Lucidworks, Inc.
> > 415-589-9269
> > www.lucidworks.com
> > Twitter http://twitter.com/lucidworks
> > LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> > Medium: https://medium.com/@sarkaramrit2
> >
> > On Tue, Apr 17, 2018 at 8:58 PM, Susheel Kumar 
> > wrote:
> >
> > > DISABLEBUFFER on source cluster would solve this problem.
> > >
> > > On Tue, Apr 17, 2018 at 9:29 AM, Chris Troullis 
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > We are attempting to use CDCR with solr 7.2.1 and are experiencing
> odd
> > > > behavior with transaction logs. My understanding is that by default,
> > solr
> > > > will keep a maximum of 10 tlog files or 100 records in the tlogs. I
> > > assume
> > > > that with CDCR, the records will not be removed from the tlogs until
> it
> > > has
> > > > been confirmed that they have been replicated to the other cluster.
> > > > However, even when replication has finished and the CDCR queue sizes
> > are
> > > 0,
> > > > we are still seeing large numbers (50+) and large sizes (over a GB)
> of
> > > > tlogs sitting on the nodes.
> > > >
> > > > We are hard committing once per minute.
> > > >
> > > > Doing a lot of reading on the mailing list, I see that a lot of
> people
> > > were
> > > > pointing to buffering being enabled as the cause for some of these
> > > > transaction log issues. However, we have disabled buffering on both
> the
> > > > source and target clusters, and are still seeing the issues.
> > > >
> > > > Also, while some of our indexes replicate very rapidly (millions of
> > > > documents in minutes), other smaller indexes are crawling. If we
> > restart
> > > > CDCR on the nodes then it finishes almost instantly.
> > > >
> > > > Any thoughts on these behaviors?
> > > >
> > > > Thanks,
> > > >
> > > > Chris
> > > >
> > >
> >
>


Re: CDCR broken for Mixed Replica Collections

2018-04-25 Thread Amrit Sarkar
Webster,

I have patches uploaded for both CDCR supporting TLOG replicas:
https://issues.apache.org/jira/browse/SOLR-12057 and for the core not
failing while initializing for PULL type replicas:
https://issues.apache.org/jira/browse/SOLR-12071, and am awaiting feedback
from the open source community. The solution for PULL type replicas can be
designed better; apart from that, if this is an urgent need for you, please
apply the patches to your packages and give them a shot. I will added
extensive tests for both the use-cases.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Thu, Apr 26, 2018 at 2:46 AM, Erick Erickson 
wrote:

> CDCR won't really ever make sense for PULL replicas since the PULL
> replicas have no tlog and don't do any indexing and can't ever become
> a leader seamlessly.
>
> As for plans to address TLOG replicas, patches are welcome if you have
> a need. That's really how open source works, people add functionality
> as they have use-cases they need to support and contribute them back.
> So far this isn't a high-demand topic.
>
> Best,
> Erick
>
> On Wed, Apr 25, 2018 at 8:03 AM, Webster Homer 
> wrote:
> > I was looking at SOLR-12057
> >
> > According to the comment on the ticket, CDCR can not work when a
> collection
> > has PULL Replicas. That seems like a MAJOR limitation to CDCR and PULL
> > Replicas. Is this likely to be addressed in the future?
> > CDCR currently is broken for TLOG replicas too.
> >
> > https://issues.apache.org/jira/browse/SOLR-12057?
> focusedCommentId=16391558&page=com.atlassian.jira.
> plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16391558
> >
> > Thanks
> >
>


Re: CDCR broken for Mixed Replica Collections

2018-04-25 Thread Amrit Sarkar
Pardon, * I have added extensive tests for both the use-cases.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Thu, Apr 26, 2018 at 3:50 AM, Amrit Sarkar 
wrote:

> Webster,
>
> I have patches uploaded for both CDCR supporting TLOG replicas:
> https://issues.apache.org/jira/browse/SOLR-12057 and for the core not
> failing while initializing for PULL type replicas:
> https://issues.apache.org/jira/browse/SOLR-12071, and am awaiting feedback
> from the open source community. The solution for PULL type replicas can be
> designed better; apart from that, if this is an urgent need for you, please
> apply the patches to your packages and give them a shot. I will added
> extensive tests for both the use-cases.
>
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> Medium: https://medium.com/@sarkaramrit2
>
> On Thu, Apr 26, 2018 at 2:46 AM, Erick Erickson 
> wrote:
>
>> CDCR won't really ever make sense for PULL replicas since the PULL
>> replicas have no tlog and don't do any indexing and can't ever become
>> a leader seamlessly.
>>
>> As for plans to address TLOG replicas, patches are welcome if you have
>> a need. That's really how open source works, people add functionality
>> as they have use-cases they need to support and contribute them back.
>> So far this isn't a high-demand topic.
>>
>> Best,
>> Erick
>>
>> On Wed, Apr 25, 2018 at 8:03 AM, Webster Homer 
>> wrote:
>> > I was looking at SOLR-12057
>> >
>> > According to the comment on the ticket, CDCR can not work when a
>> collection
>> > has PULL Replicas. That seems like a MAJOR limitation to CDCR and PULL
>> > Replicas. Is this likely to be addressed in the future?
>> > CDCR currently is broken for TLOG replicas too.
>> >
>> > https://issues.apache.org/jira/browse/SOLR-12057?focusedComm
>> entId=16391558&page=com.atlassian.jira.plugin.system.
>> issuetabpanels%3Acomment-tabpanel#comment-16391558
>> >
>> > Thanks
>> >
>> > --
>> >
>> >
>>
>
>


Re: CDCR traffic

2018-06-25 Thread Amrit Sarkar
Hi Rajeswari,

No, it is not. The source forwards the updates to the target in the classic manner.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Fri, Jun 22, 2018 at 11:38 PM, Natarajan, Rajeswari <
rajeswari.natara...@sap.com> wrote:

> Hi,
>
> Would like to know if the CDCR traffic is encrypted.
>
> Thanks
> Ra
>


Re: tlogs not deleting

2018-06-25 Thread Amrit Sarkar
Brian,

If you are still facing the issue after disabling the buffer, kindly shut down
all the nodes at the source and then start them again; the stale tlogs will
start purging themselves.
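Something along these lines can be used to check the buffer state and disable
it through the CDCR API (host and collection names are placeholders):

  # check the current CDCR state, including whether the buffer is enabled
  curl 'http://source-host:8983/solr/my_collection/cdcr?action=STATUS'

  # disable the update log buffer on the source (and on the target, if enabled there)
  curl 'http://source-host:8983/solr/my_collection/cdcr?action=DISABLEBUFFER'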

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Wed, Jun 20, 2018 at 8:15 PM, Susheel Kumar 
wrote:

> Not to my knowledge. Please double-check or wait for some time, but after
> DISABLEBUFFER on the source your tlogs should start rolling; it's the exact
> same issue I faced with 6.6, which was resolved by DISABLEBUFFER.
>
> On Tue, Jun 19, 2018 at 1:39 PM, Brian Yee  wrote:
>
> > Does anyone have any additional possible causes for this issue? I checked
> > the buffer status using "/cdcr?action=STATUS" and it says buffer disabled
> > at both target and source.
> >
> > -Original Message-
> > From: Erick Erickson [mailto:erickerick...@gmail.com]
> > Sent: Tuesday, June 19, 2018 11:55 AM
> > To: solr-user 
> > Subject: Re: tlogs not deleting
> >
> > bq. Do you recommend disabling the buffer on the source SolrCloud as
> well?
> >
> > Disable them all on both source and target IMO.
> >
> > On Tue, Jun 19, 2018 at 8:50 AM, Brian Yee  wrote:
> > > Thank you Erick. I am running Solr 6.6. From the documentation:
> > > "Replicas do not need to buffer updates, and it is recommended to
> > disable buffer on the target SolrCloud."
> > >
> > > Do you recommend disabling the buffer on the source SolrCloud as well?
> > It looks like I already have the buffer disabled at target locations but
> > not the source location. Would it even make sense at the source location?
> > >
> > > This is what I have at the target locations:
> > > 
> > >   
> > >   100
> > >   
> > >   
> > > disabled
> > >   
> > > 
> > >
> > >
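The XML tags in the snippet above were stripped by the archive; a sketch of the
target-side /cdcr handler it appears to correspond to, where only the values
"100" and "disabled" come from the original and the surrounding tag names are
assumed from the stock CDCR configuration:

  <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
    <lst name="updateLogSynchronizer">
      <str name="schedule">100</str>          <!-- assumed placement of the "100" -->
    </lst>
    <lst name="buffer">
      <str name="defaultState">disabled</str> <!-- buffer disabled on the target -->
    </lst>
  </requestHandler>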
> > > -Original Message-
> > > From: Erick Erickson [mailto:erickerick...@gmail.com]
> > > Sent: Tuesday, June 19, 2018 11:00 AM
> > > To: solr-user 
> > > Subject: Re: tlogs not deleting
> > >
> > > Take a look at the CDCR section of your reference guide, be sure you
> get
> > the version which you can download from here:
> > > https://archive.apache.org/dist/lucene/solr/ref-guide/
> > >
> > > There's the CDCR API call you can use for in-flight disabling, and
> > depending on the version of Solr you can set it in solrconfig.
> > >
> > > Basically, buffering was there in the original CDCR to allow a larger
> > maintenance window, you could enable buffering and all updates were saved
> > until you disabled it, during which period you could do whatever you
> needed
> > with your target cluster and not lose any updates.
> > >
> > > Later versions can do the full sync of the index and buffering is being
> > removed.
> > >
> > > Best,
> > > Erick
> > >
> > > On Tue, Jun 19, 2018 at 7:31 AM, Brian Yee  wrote:
> > >> Thanks for the suggestion. Can you please elaborate a little bit about
> > what DISABLEBUFFER does? The documentation is not very detailed. Is this
> > something that needs to be done manually whenever this problem happens or
> > is it something that we can do to fix it so it won't happen again?
> > >>
> > >> -Original Message-
> > >> From: Susheel Kumar [mailto:susheel2...@gmail.com]
> > >> Sent: Monday, June 18, 2018 9:12 PM
> > >> To: solr-user@lucene.apache.org
> > >> Subject: Re: tlogs not deleting
> > >>
> > >> You may have to DISABLEBUFFER in source to get rid of tlogs.
> > >>
> > >> On Mon, Jun 18, 2018 at 6:13 PM, Brian Yee  wrote:
> > >>
> > >>> So I've read a bunch of stuff on hard/soft commits and tlogs. As I
> > >>> understand, after a hard commit, solr is supposed to delete old
> > >>> tlogs depending on the numRecordsToKeep and maxNumLogsToKeep values
> > >>> in the autocommit settings in solrconfig.xml. I am occasionally
> > >>> seeing solr fail to do this and the tlogs just build up over time
> > >>> and eventually we run out of disk space on the VM and this causes
> > problems for us.
> > >>> This does not happen all the time, only sometimes. I currently have
> > >>> a tlog directory that has 123G worth of tlogs. The last hard commit
> > >>> on this node was 10 minutes ago but these tlogs date back to 3 days
> > ago.
> > >>>
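For reference, numRecordsToKeep and maxNumLogsToKeep are parameters of the
update log rather than of the autoCommit block; a sketch with illustrative
values (the stock defaults are 100 records and 10 logs):

  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
    <int name="numRecordsToKeep">500</int>   <!-- illustrative value -->
    <int name="maxNumLogsToKeep">20</int>    <!-- illustrative value -->
  </updateLog>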
> > >>> We have sometimes found that restarting solr on the node will get it
> > >>> to clean up the old tlogs, but we really want to find the root cause
> > >>> and fix it if possible so we don't keep getting disk space alerts
> > >>> and have to adhoc restart nodes. Has anyone seen an issue like this
> > before?
> > >>>
> > >>> My update handler settings look like this:
> > >>>   
> > >>>
> > >>>   
> > >>>
> > >>>   ${solr.ulog.dir:}
> > >>>   ${solr.ulog.numVersionBuckets:
> > >>> 65536}
> > >>> 
> > >>> 
> > >>> 60
> > >>> 25
> > >>> false
> > >>> 
> > >>> 
> > >>> 12
> > >>> 
> > >>>
> > >>>   
> > >>> 100
> > >>>   
> > >>>
> > >>>   
> > >>>
> >
>
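The update handler settings quoted above also lost their XML tags in the
archive; a sketch of the usual CDCR source-side updateHandler they appear to
correspond to. Tag names and the expanded commit values are assumptions based
on the stock solrconfig.xml, only the raw values shown above come from the
original, and the stray "100" near the end could not be placed with confidence:

  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- CDCR requires the CdcrUpdateLog implementation on the source cluster -->
    <updateLog class="solr.CdcrUpdateLog">
      <str name="dir">${solr.ulog.dir:}</str>
      <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
    </updateLog>
    <autoCommit>
      <maxTime>60000</maxTime>             <!-- assumed expansion of the "60" -->
      <maxDocs>25000</maxDocs>             <!-- assumed expansion of the "25" -->
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>120000</maxTime>            <!-- assumed expansion of the "12" -->
    </autoSoftCommit>
  </updateHandler>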


Re: CDCR Custom Document Routing

2018-07-02 Thread Amrit Sarkar
Jay,

Can you share a sample delete command you are firing at the source, so we can
understand the issue with CDCR?
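For instance, either of the following (host, collection and shard names are
placeholders) would help pin it down; on an implicitly routed collection a
delete-by-id is normally sent with a _route_ parameter, while a
delete-by-query is fanned out to every shard:

  # delete by id, routed to the shard that holds the document
  curl 'http://source-host:8983/solr/my_collection/update?commit=true&_route_=shard1' \
       -H 'Content-Type: application/json' \
       -d '{"delete": {"id": "doc-123"}}'

  # delete by query, broadcast to all shards
  curl 'http://source-host:8983/solr/my_collection/update?commit=true' \
       -H 'Content-Type: application/json' \
       -d '{"delete": {"query": "category_s:obsolete"}}'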

On Tue, 3 Jul 2018, 4:22 am Jay Potharaju,  wrote:

> Hi
> The current cdcr setup does not work if my collection uses implicit
> routing.
> In my testing I found that adding documents works without any problems. It
> doesn't seem to work correctly when deleting documents.
> Is there an alternative to cdcr that would work in cross data center
> scenario.
>
> Setup:
> 8 shards : 2 on each node
> Solr:6.6.4
>
> Thanks
> Jay Potharaju
>


Re: CDCR traffic

2018-07-10 Thread Amrit Sarkar
Hi,

In the case of CDCR, assuming both the source and target clusters are SSL
> enabled, can we say that the source clusters’ shard leaders act as clients
> to the target cluster and hence the data is encrypted while its transmitted
> between the clusters?


Yes, that is correct. SSL-enabled and Kerberized clusters will have the
payload/updates encrypted. Thank you for pointing it out.
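Traffic is only encrypted once SSL is actually configured on the nodes of both
clusters; a minimal solr.in.sh sketch with placeholder paths and passwords
(exact variable names can vary slightly between Solr versions, see the
"Enabling SSL" page of the reference guide):

  # bin/solr.in.sh on every node of the source and the target cluster
  SOLR_SSL_KEY_STORE=/etc/solr/ssl/solr-keystore.jks
  SOLR_SSL_KEY_STORE_PASSWORD=changeit
  SOLR_SSL_TRUST_STORE=/etc/solr/ssl/solr-truststore.jks
  SOLR_SSL_TRUST_STORE_PASSWORD=changeit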

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Mon, Jul 9, 2018 at 3:50 PM, Greenhorn Techie 
wrote:

> Amrit,
>
> Further to the below conversation:
>
> As I understand, Solr supports SSL encryption between nodes within a Solr
> cluster and as well communications to and from clients. In the case of
> CDCR, assuming both the source and target clusters are SSL enabled, can we
> say that the source clusters’ shard leaders act as clients to the target
> cluster and hence the data is encrypted while its transmitted between the
> clusters?
>
> Thanks
>
>
> On 25 June 2018 at 15:56:07, Amrit Sarkar (sarkaramr...@gmail.com) wrote:
>
> Hi Rajeswari,
>
> No it is not. Source forwards the update to the Target in classic manner.
>
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> Medium: https://medium.com/@sarkaramrit2
>
> On Fri, Jun 22, 2018 at 11:38 PM, Natarajan, Rajeswari <
> rajeswari.natara...@sap.com> wrote:
>
> > Hi,
> >
> > Would like to know if the CDCR traffic is encrypted.
> >
> > Thanks
> > Ra
> >
>
>


Anthill Inside and The Fifth Elephant Bengaluru India 2018 Edition

2018-07-23 Thread Amrit Sarkar
Anthill Inside and The Fifth Elephant -- HasGeek's marquee annual conferences
-- bring together business decision makers, data engineers, architects, data
scientists and product managers to understand the nuances of managing and
leveraging data. And what's more, Solr community members can avail a 10%
discount on the conference tickets by visiting these links!

Anthill Inside: https://anthillinside.in/2018/?code=SG65IC
The Fifth Elephant: https://fifthelephant.in/2018/?code=SG65IC

Both conferences have been produced by the community, for the community. They
cover the theoretical and practical applications of machine learning, deep
learning and artificial intelligence, and data collection and other
implementation steps towards building these systems.

Anthill Inside

Anthill Inside stitches the gap between research and industry, bringing in
speakers from both worlds in equal representation. Engage in nuanced, open
discussions on topics ranging from privacy and ethics in AI to breaking down
components of real-world systems into hubs-and-spokes. Hear about
organizational issues like what machine learning can and cannot do for your
organization, through to deeper technical issues like how to build
classification systems in the absence of large datasets.

Anthill Inside: 25 July
Registration link with 10% discount on conference:
https://anthillinside.in/2018/?code=SG65IC

The Fifth Elephant

Applications of techniques and uses of data to build product features is the
primary flavour of The Fifth Elephant 2018. The wide variety of topics
includes:

1. Designing systems for data (hint: it's not only about the algorithms)
   a. How poor design can lower data quality, which in turn will compromise
      the entire project: a case study on AADHAAR
   b. How any data, even as meek as electoral, can be weaponized against the
      users (voters in this case)
2. Privacy issues with data
   a. The right to be forgotten: problems with data systems
   b. The right to privacy vs the right to information: way forward
3. Handling super large scale data systems
4. Data visualization at scale, like at Uber for self-driving cars

Along with talks at the venue, there are open discussions on privacy and open
data, and workshops on Amazon SageMaker (26 July) and recommendations using
TensorFlow (27 July).

The Fifth Elephant: 26 and 27 July
Registration link with 10% discount on conference:
https://fifthelephant.in/2018/?code=SG65IC

For more details about any of these, write to i...@hasgeek.com or call
7676332020.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2


Re: SolrCloud CDCR issue

2018-08-10 Thread Amrit Sarkar
To the concerned,

WARN : [c:collection_name s:shard2 r:core_node11
> x:collection_name_shard2_replica_n8]
> org.apache.solr.handler.CdcrRequestHandler; The log reader for target
> collection collection_name is not initialised @ collection_name:shard2
>

This means the source cluster was started first and then the target. You need
to shut down all the nodes at both the source and the target. Bring the target
nodes up, all of them, before starting the source ones. The log readers will
then be initialized properly.
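Once both clusters are back up in that order, the state of the log readers can
be checked through the CDCR API (host and collection names are placeholders):

  # on a source node: per-target queue sizes and last processed timestamps
  curl 'http://source-host:8983/solr/collection_name/cdcr?action=QUEUES'

  # overall CDCR process and buffer state
  curl 'http://source-host:8983/solr/collection_name/cdcr?action=STATUS'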

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2


On Fri, Aug 3, 2018 at 11:33 PM cdatta  wrote:

> Any pointers?
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>

