Re: systemd definition for solr?

2020-10-16 Thread Joe Doupnik

    Close, but not quite there yet. The rules say use
        systemctl start (or stop or status) solr.service
That ".service" part ought to be there. I suspect that if we omit it
then we may be scolded on-screen and lose some grade points.
    On your error report below: best to ensure that Solr is started by
either /etc/init.d or systemd, but not both. To check the /etc/init.d
part, go to /etc/init.d and give the command "chkconfig -l solr". If the
result shows "On" for any run level then /etc/init.d is supposed to be
in charge rather than systemd. If that were the case then your systemctl
status output ought to indicate that solr is an "LSB" process. On the
other hand, if systemd is to be the controlling agent, then ensure that
the /etc/init.d part does not interfere by issuing the command
"chkconfig -d solr", which will unlink solr from its to-do list. Then say
"systemctl enable solr" to let systemd take charge.
    Thus there is some busywork to check on things, and then a choice to
make about which flavour will be in charge.
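
    For reference, a minimal native unit of the sort being asked about
might look like the sketch below. Treat it as a starting point, not a
definitive recipe: the /opt/solr install path and the solr user are
assumptions and must match the local install (with the /etc/init.d
script disabled first, as above).

        [Unit]
        Description=Apache Solr
        After=network.target

        [Service]
        Type=forking
        User=solr
        ExecStart=/opt/solr/bin/solr start
        ExecStop=/opt/solr/bin/solr stop
        Restart=on-failure

        [Install]
        WantedBy=multi-user.target

    Saved as /etc/systemd/system/solr.service, it is picked up with
"systemctl daemon-reload", then "systemctl enable solr.service" and
"systemctl start solr.service".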

    Thanks,
    Joe D.

On 15/10/2020 21:03, Ryan W wrote:

I didn't realize that to start a systemd service, I need to do...

systemctl start solr

...and not...

service solr start

Now the output from the status command looks a bit better, though still
with some problems...

[root@faspbsy0002 system]# systemctl status solr.service
● solr.service - LSB: A very fast and reliable search engine.
Loaded: loaded (/etc/rc.d/init.d/solr; bad; vendor preset: disabled)
Active: active (exited) since Thu 2020-10-15 15:58:23 EDT; 19s ago
  Docs: man:systemd-sysv-generator(8)
   Process: 34100 ExecStop=/etc/rc.d/init.d/solr stop (code=exited,
status=1/FAILURE)
   Process: 98871 ExecStart=/etc/rc.d/init.d/solr start (code=exited,
status=0/SUCCESS)



On Thu, Oct 15, 2020 at 3:24 PM Ryan W  wrote:


Does anyone have a simple systemd definition for a solr service?

The things I am finding on the internet don't work.  I am not sure if this
is the kind of thing where there might be some boilerplate that (usually)
works?  Or do situations vary so much that no boilerplate is possible?

Here is what I see when I try to use one of the definitions I found on the
internet:

[root@faspbsy0002 system]# systemctl status solr.service
● solr.service - LSB: A very fast and reliable search engine.
Loaded: loaded (/etc/rc.d/init.d/solr; bad; vendor preset: disabled)
Active: failed (Result: exit-code) since Thu 2020-10-15 09:32:02 EDT;
5h 50min ago
  Docs: man:systemd-sysv-generator(8)
   Process: 34100 ExecStop=/etc/rc.d/init.d/solr stop (code=exited,
status=1/FAILURE)
   Process: 1337 ExecStart=/etc/rc.d/init.d/solr start (code=exited,
status=0/SUCCESS)

Oct 15 09:32:01 faspbsy0002 systemd[1]: Stopping LSB: A very fast and
reliab
Oct 15 09:32:01 faspbsy0002 su[34102]: (to solr) root on none
Oct 15 09:32:02 faspbsy0002 solr[34100]: No process found for Solr node
runn...3
Oct 15 09:32:02 faspbsy0002 systemd[1]: solr.service: control process
exited...1
Oct 15 09:32:02 faspbsy0002 systemd[1]: Stopped LSB: A very fast and
reliabl
Oct 15 09:32:02 faspbsy0002 systemd[1]: Unit solr.service entered failed
state.
Oct 15 09:32:02 faspbsy0002 systemd[1]: solr.service failed.
Warning: Journal has been rotated since unit was started. Log output is
incomplete or unavailable.
Hint: Some lines were ellipsized, use -l to show in full.





Very high disk read rate with an idle solr

2020-10-16 Thread uyilmaz


What can cause a very high (1G/s, which is the max our disks can provide) disk 
read rate that goes on for hours, on a Solr instance that is not being indexed 
or queried?

In the last few days our SolrCloud cluster has stopped responding to queries, so 
today we tried stopping indexing and querying to find out what is going on. Two 
collections seem to be in recovery; can recovery cause this behavior?

Regards and have a nice day

-- 
uyilmaz 


Filtering Parent documents based on Child documents Facets selection

2020-10-16 Thread Abhay Kumar
I have a nested document which I am syncing to Solr:

{
   "id":"NCT04372953",
   "title":"Positive End-Expiratory Pressure (PEEP) Levels During Resuscitation 
of Preterm Infants at Birth (The POLAR Trial) ",
   "phase":"N/A",
   "status":"Not yet recruiting",
   "studytype":"Interventional",
   "SponsorName":[
  "Murdoch Childrens Research Institute|Children''s Hospital of 
Philadelphia|University of Amsterdam"
   ],
   "SponsorRole":[
  "lead|collaborator"
   ],
   "source":"Murdoch Childrens Research Institute",
   "sponsorrole":[
  "lead",
  "collaborator"
   ],
   "sponsorname":[
  "Murdoch Childrens Research Institute",
  "Children''s Hospital of Philadelphia",
  "University of Amsterdam"
   ],
   "investigatorsaffiliation":"",
   "investigatorname":[
  ""
   ],
   "therapeuticareaname":"",
   "text_suggest":[
  ""
   ],
   "investigatorrole":"",
   "_version_":1680437253090836480,
   "sites":{
  "id":"51002566",
  "facilitytype":"Hospital",
  "facilityname":"The Royal Women''s Hospital, Melbourne Australia",
  "facilitycountry":"Australia",
  "facilitystate":"Victoria",
  "facilitycity":"Parkville",
  "nodetype":"cnode",
  "facilityzip":"",
  "_nest_parent_":"NCT04372953",
  "phase":"",
  "studytype":"",
  "investigatorsaffiliation":"",
  "source":"",
  "title":"",
  "sponsorrole":[
 ""
  ],
  "investigatorname":[
 ""
  ],
  "therapeuticareaname":"",
  "text_suggest":[
 ""
  ],
  "investigatorrole":"",
  "sponsorname":[
 ""
  ],
  "status":"",
  "_version_":1680437253090836480
   },
   "investigators":[
  {
 "id":"6300662",
 "investigatorname":[
"Louise Owen"
 ],
 "nodetype":"cnode",
 "investigatorrole":"Principal Investigator",
 "investigatorsaffiliation":"The Royal Women''s Hospital, Melbourne 
Australia",
 "CongressScore":"",
 "TrialsScore":"Low",
 "PublicationScore":"",
 "_nest_parent_":"NCT04372953",
 "phase":"",
 "studytype":"",
 "source":"",
 "title":"",
 "sponsorrole":[
""
 ],
"therapeuticareaname":"",
 "text_suggest":[
""
 ],
 "sponsorname":[
""
 ],
 "status":"",
 "_version_":1680437253090836480
  },
  {
 "id":"6426782",
 "investigatorname":[
"David Tingay, MBBS FRACP"
 ],
 "nodetype":"cnode",
 "investigatorrole":"Study Chair",
 "investigatorsaffiliation":"Royal Children''s Hospital, Melbourne 
Australia",
 "CongressScore":"",
 "TrialsScore":"",
 "PublicationScore":"",
 "_nest_parent_":"NCT04372953",
 "phase":"",
 "studytype":"",
 "source":"",
 "title":"",
 "sponsorrole":[
""
 ],
 "therapeuticareaname":"",
 "text_suggest":[
""
 ],
 "sponsorname":[
""
 ],
 "status":"",
 "_version_":1680437253090836480
  },
  {
 "id":"7663364",
 "investigatorname":[
"Omar Kamlin"
 ],
 "nodetype":"cnode",
 "investigatorrole":"Principal Investigator",
 "investigatorsaffiliation":"The Royal Women''s Hospital, Melbourne 
Australia",
 "CongressScore":"",
 "TrialsScore":"Low",
 "PublicationScore":"",
 "_nest_parent_":"NCT04372953",
 "phase":"",
 "studytype":"",
 "source":"",
 "title":"",
 "sponsorrole":[
""
 ],
 "therapeuticareaname":"",
 "text_suggest":[
""
 ],
 "sponsorname":[
""
 ],
 "status":"",
 "_version_":1680437253090836480
  }
   ],
   "therapeuticareas":[
  {
 "id":"ta-0-NCT04372953",
 "therapeuticareaname":"Premature Birth",
 "text_prefixauto":"Premature Birth",
 "text_suggest":[
"Premature Birth"
 ],
 "diseaseareas":[
""
 ],
 "nodetype":"cnode",
 "_nest_parent_":"NCT04372953",
 "phase":"",
 "studytype":"",
 "investigatorsaffiliation":"",
 "source":"",
 "title":"",
 "sponsorrole":[
""
 ],
 "investigatorname":[
""
 ],
 "investigatorrole":"",
 "sponsorname":[
""
 ],
 "status":"",
 "_version_":1680437253090836480,
 "therapeuticareaname_facet":"Premature Birth",
 "diseaseareas_facet":[
""
 ]
  },
  {
 "id":"ta-1-NCT04372953",
 "therapeuticareaname":"Lung Injury",
 "text_prefixauto":"Lung Injury",
 "text_suggest":[

Backup fails despite allowPaths=* being set

2020-10-16 Thread Philipp Trulson
Hello everyone,

we are having problems with our backup script since we upgraded to Solr
8.6.2 on Kubernetes. To be more precise, the message is:

Path /data/backup/2020-10-16/collection must be relative to SOLR_HOME,
SOLR_DATA_HOME coreRootDirectory. Set system property 'solr.allowPaths' to
add other allowed paths.

I executed the backup by calling this endpoint:

curl 'http://solr.default.svc.cluster.local/solr/admin/collections?action=BACKUP&name=collection&collection=collection&location=/data/backup/2020-10-16&async=1114'

The strange thing is that all 5 nodes are started with -Dsolr.allowPaths=*,
so in theory it should work. The folder is an AWS EFS share; that's the
only cause I can imagine. Are there any other options I can check?
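
For reference, the property is normally injected through SOLR_OPTS in
solr.in.sh (or the matching environment variable in the container image).
Listing the backup mount explicitly rather than * can be a useful
cross-check that the value really reaches the JVM. A sketch, assuming
/data/backup is the mount point:

    # in /etc/default/solr.in.sh (path is an assumption; adjust for the image)
    SOLR_OPTS="$SOLR_OPTS -Dsolr.allowPaths=/data/backup"

The effective value can then be verified on each node via the Java
Properties page of the Admin UI.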

Thank you for your help!
Philipp



converting string to solr.TextField

2020-10-16 Thread yaswanth kumar
I am using solr 8.2

Can I change the schema fieldtype from string to solr.TextField
without reindexing?



The reason is that string has only a 32K char limit, whereas I am looking to
store more than 32K now.

The contents of this field don't require any analysis or tokenization, but I
need this field in queries as well as in the output fields.

-- 
Thanks & Regards,
Yaswanth Kumar Konathala.
yaswanth...@gmail.com


Re: converting string to solr.TextField

2020-10-16 Thread Walter Underwood
No. The data is already indexed as a StringField.

You need to make a new field and reindex. If you want to 
keep the same field name, you need to delete all of the 
documents in the index, change the schema, and reindex.
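
A hedged sketch of that route using the Schema API (collection and field
names here are placeholders):

    # add a new text field alongside the old string field
    curl -X POST -H 'Content-type:application/json' \
      'http://localhost:8983/solr/mycollection/schema' \
      -d '{"add-field": {"name":"body_txt", "type":"text_general", "indexed":true, "stored":true}}'

Then reindex so body_txt is populated, and point queries at the new field.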

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 16, 2020, at 11:01 AM, yaswanth kumar  wrote:
> 
> I am using solr 8.2
> 
> Can I change the schema fieldtype from string to solr.TextField
> without indexing?
> 
>
> 
> The reason is that string has only 32K char limit where as I am looking to
> store more than 32K now.
> 
> The contents on this field doesn't require any analysis or tokenized but I
> need this field in the queries and as well as output fields.
> 
> -- 
> Thanks & Regards,
> Yaswanth Kumar Konathala.
> yaswanth...@gmail.com



Custom replica placement possible during collection restore

2020-10-16 Thread Kommu, Vinodh K.
Hi,

Would it be possible to restore a collection with replicas placed on specific 
nodes/VMs in the cluster? I guess the default restore feature may not work in 
such a custom way, so is there any chance we can modify the details in the 
collection_state.json file in the backup directory to place replicas on 
specific nodes/VMs?

Perhaps it is not the recommended way, but I would like to know whether this 
approach will work at all.

Thanks & Regards,
Vinodh



Re: Info about legacyMode cluster property

2020-10-16 Thread yaswanth kumar
Can someone help on the above question?

On Thu, Oct 15, 2020 at 1:09 PM yaswanth kumar 
wrote:

> Can someone explain what are the implications when we change
> legacyMode=true on solr 8.2
>
> We have migrated from solr 5.5 to solr 8.2 everything worked great but
> when we are trying to add a core to existing collection with core api
> create it’s asking to pass the coreNodeName or switch legacyMode to true.
> When we switched it worked fine . But we need to understand on what are the
> cons because seems like this is false by default from solr 7
>
> Sent from my iPhone



-- 
Thanks & Regards,
Yaswanth Kumar Konathala.
yaswanth...@gmail.com


Re: converting string to solr.TextField

2020-10-16 Thread Alexandre Rafalovitch
Just as a side note,

> indexed="true"
If you are storing a 32K message, you probably are not searching it as a
whole string, so don't index it. You may also want to mark the field
as 'large' (and lazy):
https://lucene.apache.org/solr/guide/8_2/field-type-definitions-and-properties.html#field-default-properties
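
For illustration, such a definition might look like the line below (the
field name and type are placeholders; large="true" requires the field to
be stored and single-valued):

    <field name="body" type="text_general" indexed="false" stored="true" large="true"/>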

When you make it a text field, you will probably run into the same
issues as well.

And honestly, if you are not storing those fields to search, maybe you
need to consider the architecture. Maybe those fields do not need to
be in Solr at all, but in external systems. Solr (or any search
system) should not be your system of record since - as the other
reply showed - some of the answers are "reindex everything".

Regards,
   Alex.

On Fri, 16 Oct 2020 at 14:02, yaswanth kumar  wrote:
>
> I am using solr 8.2
>
> Can I change the schema fieldtype from string to solr.TextField
> without indexing?
>
> 
>
> The reason is that string has only 32K char limit where as I am looking to
> store more than 32K now.
>
> The contents on this field doesn't require any analysis or tokenized but I
> need this field in the queries and as well as output fields.
>
> --
> Thanks & Regards,
> Yaswanth Kumar Konathala.
> yaswanth...@gmail.com


Re: converting string to solr.TextField

2020-10-16 Thread David Hastings
"If you want to
keep the same field name, you need to delete all of the
documents in the index, change the schema, and reindex."

Actually, doesn't re-indexing a document just delete/replace anyway, assuming
the same id?

On Fri, Oct 16, 2020 at 3:07 PM Alexandre Rafalovitch 
wrote:

> Just as a side note,
>
> > indexed="true"
> If you are storing 32K message, you probably are not searching it as a
> whole string. So, don't index it. You may also want to mark the field
> as 'large' (and lazy):
>
> https://lucene.apache.org/solr/guide/8_2/field-type-definitions-and-properties.html#field-default-properties
>
> When you are going to make it a text field, you will probably be
> having the same issues as well.
>
> And honestly, if you are not storing those fields to search, maybe you
> need to consider the architecture. Maybe those fields do not need to
> be in Solr at all, but in external systems. Solr (or any search
> system) should not be your system of records since - as the other
> reply showed - some of the answers are "reindex everything".
>
> Regards,
>Alex.
>
> On Fri, 16 Oct 2020 at 14:02, yaswanth kumar 
> wrote:
> >
> > I am using solr 8.2
> >
> > Can I change the schema fieldtype from string to solr.TextField
> > without indexing?
> >
> >  stored="true"/>
> >
> > The reason is that string has only 32K char limit where as I am looking
> to
> > store more than 32K now.
> >
> > The contents on this field doesn't require any analysis or tokenized but
> I
> > need this field in the queries and as well as output fields.
> >
> > --
> > Thanks & Regards,
> > Yaswanth Kumar Konathala.
> > yaswanth...@gmail.com
>


Re: converting string to solr.TextField

2020-10-16 Thread Erick Erickson
Doesn’t re-indexing a document just delete/replace….

It’s complicated. For the individual document, yes. The problem
comes because the field is inconsistent _between_ documents, and
segment merging blows things up.

Consider. I have segment1 with documents indexed with the old
schema (String in this case). I  change my schema and index the same
field as a text type.

Eventually, a segment merge happens and these two segments get merged
into a single new segment. How should the field be handled? Should it
be defined as String or Text in the new segment? If you convert the docs
with a Text definition for the field to String,
you’d lose the ability to search for individual tokens. If you convert the
String to Text, you don’t have any guarantee that the information is even
available.

This is just the tip of the iceberg in terms of trying to change the 
definition of a field. Take the case of changing the analysis chain,
say you use a phonetic filter on a field then decide to remove it and
do not store the original. Erick might be encoded as “ENXY” so the 
original data is simply not there to convert. Ditto removing a 
stemmer, lowercasing, applying a regex, …...


From Mike McCandless:

"This really is the difference between an index and a database:
 we do not store, precisely, the original documents.  We store 
an efficient derived/computed index from them.  Yes, Solr/ES 
can add database-like behavior where they hold the true original 
source of the document and use that to rebuild Lucene indices 
over time.  But Lucene really is just a "search index" and we 
need to be free to make important improvements with time."

And all that aside, you have to re-index all the docs anyway or
your search results will be inconsistent. So leaving aside the 
impossible task of covering all the possibilities on the fly, it’s
better to plan on re-indexing….

Best,
Erick


> On Oct 16, 2020, at 3:16 PM, David Hastings  
> wrote:
> 
> "If you want to
> keep the same field name, you need to delete all of the
> documents in the index, change the schema, and reindex."
> 
> actually doesnt re-indexing a document just delete/replace anyways assuming
> the same id?
> 
> On Fri, Oct 16, 2020 at 3:07 PM Alexandre Rafalovitch 
> wrote:
> 
>> Just as a side note,
>> 
>>> indexed="true"
>> If you are storing 32K message, you probably are not searching it as a
>> whole string. So, don't index it. You may also want to mark the field
>> as 'large' (and lazy):
>> 
>> https://lucene.apache.org/solr/guide/8_2/field-type-definitions-and-properties.html#field-default-properties
>> 
>> When you are going to make it a text field, you will probably be
>> having the same issues as well.
>> 
>> And honestly, if you are not storing those fields to search, maybe you
>> need to consider the architecture. Maybe those fields do not need to
>> be in Solr at all, but in external systems. Solr (or any search
>> system) should not be your system of records since - as the other
>> reply showed - some of the answers are "reindex everything".
>> 
>> Regards,
>>   Alex.
>> 
>> On Fri, 16 Oct 2020 at 14:02, yaswanth kumar 
>> wrote:
>>> 
>>> I am using solr 8.2
>>> 
>>> Can I change the schema fieldtype from string to solr.TextField
>>> without indexing?
>>> 
>>>> stored="true"/>
>>> 
>>> The reason is that string has only 32K char limit where as I am looking
>> to
>>> store more than 32K now.
>>> 
>>> The contents on this field doesn't require any analysis or tokenized but
>> I
>>> need this field in the queries and as well as output fields.
>>> 
>>> --
>>> Thanks & Regards,
>>> Yaswanth Kumar Konathala.
>>> yaswanth...@gmail.com
>> 



Re: converting string to solr.TextField

2020-10-16 Thread David Hastings
Gotcha, thanks for the explanation. Another small question if you
don't mind: when deleting docs they aren't actually removed, just tagged as
deleted, and the old field/field type is still in the index until
merged/optimized as well. Wouldn't that cause almost the same conflicts
until then?

On Fri, Oct 16, 2020 at 3:51 PM Erick Erickson 
wrote:

> Doesn’t re-indexing a document just delete/replace….
>
> It’s complicated. For the individual document, yes. The problem
> comes because the field is inconsistent _between_ documents, and
> segment merging blows things up.
>
> Consider. I have segment1 with documents indexed with the old
> schema (String in this case). I  change my schema and index the same
> field as a text type.
>
> Eventually, a segment merge happens and these two segments get merged
> into a single new segment. How should the field be handled? Should it
> be defined as String or Text in the new segment? If you convert the docs
> with a Text definition for the field to String,
> you’d lose the ability to search for individual tokens. If you convert the
> String to Text, you don’t have any guarantee that the information is even
> available.
>
> This is just the tip of the iceberg in terms of trying to change the
> definition of a field. Take the case of changing the analysis chain,
> say you use a phonetic filter on a field then decide to remove it and
> do not store the original. Erick might be encoded as “ENXY” so the
> original data is simply not there to convert. Ditto removing a
> stemmer, lowercasing, applying a regex, …...
>
>
> From Mike McCandless:
>
> "This really is the difference between an index and a database:
>  we do not store, precisely, the original documents.  We store
> an efficient derived/computed index from them.  Yes, Solr/ES
> can add database-like behavior where they hold the true original
> source of the document and use that to rebuild Lucene indices
> over time.  But Lucene really is just a "search index" and we
> need to be free to make important improvements with time."
>
> And all that aside, you have to re-index all the docs anyway or
> your search results will be inconsistent. So leaving aside the
> impossible task of covering all the possibilities on the fly, it’s
> better to plan on re-indexing….
>
> Best,
> Erick
>
>
> > On Oct 16, 2020, at 3:16 PM, David Hastings <
> hastings.recurs...@gmail.com> wrote:
> >
> > "If you want to
> > keep the same field name, you need to delete all of the
> > documents in the index, change the schema, and reindex."
> >
> > actually doesnt re-indexing a document just delete/replace anyways
> assuming
> > the same id?
> >
> > On Fri, Oct 16, 2020 at 3:07 PM Alexandre Rafalovitch <
> arafa...@gmail.com>
> > wrote:
> >
> >> Just as a side note,
> >>
> >>> indexed="true"
> >> If you are storing 32K message, you probably are not searching it as a
> >> whole string. So, don't index it. You may also want to mark the field
> >> as 'large' (and lazy):
> >>
> >>
> https://lucene.apache.org/solr/guide/8_2/field-type-definitions-and-properties.html#field-default-properties
> >>
> >> When you are going to make it a text field, you will probably be
> >> having the same issues as well.
> >>
> >> And honestly, if you are not storing those fields to search, maybe you
> >> need to consider the architecture. Maybe those fields do not need to
> >> be in Solr at all, but in external systems. Solr (or any search
> >> system) should not be your system of records since - as the other
> >> reply showed - some of the answers are "reindex everything".
> >>
> >> Regards,
> >>   Alex.
> >>
> >> On Fri, 16 Oct 2020 at 14:02, yaswanth kumar 
> >> wrote:
> >>>
> >>> I am using solr 8.2
> >>>
> >>> Can I change the schema fieldtype from string to solr.TextField
> >>> without indexing?
> >>>
> >>> >> stored="true"/>
> >>>
> >>> The reason is that string has only 32K char limit where as I am looking
> >> to
> >>> store more than 32K now.
> >>>
> >>> The contents on this field doesn't require any analysis or tokenized
> but
> >> I
> >>> need this field in the queries and as well as output fields.
> >>>
> >>> --
> >>> Thanks & Regards,
> >>> Yaswanth Kumar Konathala.
> >>> yaswanth...@gmail.com
> >>
>
>


Re: Custom replica placement possible during collection restore

2020-10-16 Thread Sean Rasmussen
Hey Vinodh,

I’d have to check the backup/restore process, but I believe that the
state.json file does get exported. If that is the case, then the node sets
should be persisted.
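
If explicit placement is still wanted, it may also be worth checking
whether RESTORE in your version accepts the create-style parameters such
as createNodeSet. A hedged sketch (backup, collection, location and node
names are placeholders; please verify against the ref guide before
relying on it):

    curl 'http://localhost:8983/solr/admin/collections?action=RESTORE&name=mybackup&collection=mycollection&location=/backups&createNodeSet=host1:8983_solr,host2:8983_solr'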

Thanks,
Sean


On October 16, 2020 at 1:22:47 PM, Kommu, Vinodh K. (vko...@dtcc.com) wrote:

Hi,

Would it be possible to restore a collection with replica placement into
specific nodes/VMs in the cluster? I guess by default restore feature may
not work in such custom way so by any chance can we modify those code
details in collection_state.json file in backup directory to place replicas
on specific nodes/VMs?

Perhaps it may not be recommended way but would like to know whether this
approach will work at all?

Thanks & Regards,
Vinodh



Re: Info about legacyMode cluster property

2020-10-16 Thread Erick Erickson
You should not be using the core api to do anything with cores in SolrCloud.

True, under the covers the collections API uses the core API to do its tricks,
but you have to use it in a very precise manner.

As for legacyMode, don’t use it, please. It’s not supported any more and has
been completely removed in 9x.
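
For example, adding a replica to an existing collection goes through the
Collections API with something like the sketch below (collection, shard
and node names are placeholders):

    curl 'http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard1&node=host1:8983_solr'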

Best,
Erick

> On Oct 16, 2020, at 2:38 PM, yaswanth kumar  wrote:
> 
> Can someone help on the above question?
> 
> On Thu, Oct 15, 2020 at 1:09 PM yaswanth kumar 
> wrote:
> 
>> Can someone explain what are the implications when we change
>> legacyMode=true on solr 8.2
>> 
>> We have migrated from solr 5.5 to solr 8.2 everything worked great but
>> when we are trying to add a core to existing collection with core api
>> create it’s asking to pass the coreNodeName or switch legacyMode to true.
>> When we switched it worked fine . But we need to understand on what are the
>> cons because seems like this is false by default from solr 7
>> 
>> Sent from my iPhone
> 
> 
> 
> -- 
> Thanks & Regards,
> Yaswanth Kumar Konathala.
> yaswanth...@gmail.com



Re: converting string to solr.TextField

2020-10-16 Thread Erick Erickson
Not sure what you’re asking here. re-indexing, as I was
using the term, means completely removing the index and
starting over. Or indexing to a new collection. At any
rate, starting from a state where there are _no_ segments.

I’m guessing you’re still thinking that re-indexing without
doing the above will work; it won’t. The way merging works,
it chooses segments based on a number of things, including
the percentage deleted documents. But there are still _other_
live docs in the segment.

Segment S1 has docs 1, 2, 3, 4 (old definition)
Segment S2 has docs 5, 6, 7, 8 (new definition)

Doc 2 is deleted, and S1 and S2 are merged into S3. The whole
discussion about not being able to do the right thing kicks in.
Should S3 use the new or old definition? Whichever one
it uses is wrong for the other segment. And remember,
Lucene simply _cannot_ “do the right thing” if the data
isn’t there.

What you may be missing is that a segment is a “mini-index”.
The underlying assumption is that all documents in that
segment are produced with the same schema and can be
accessed the same way. My comments about merging
“doing the right thing” is really about transforming docs
so all the docs can be treated the same. Which they can’t
if they were produced with different schemas.

Robert Muir’s statement is interesting here, built
on Mike McCandless’ comment:

"I think the key issue here is Lucene is an index not a database.
Because it is a lossy index and does not retain all of the user’s
data, its not possible to safely migrate some things automagically.
…. The function is y = f(x) and if x is not available its not 
possible, so lucene can't do it."

Don’t try to get around this. Prepare to
re-index the entire corpus into a new collection whenever
you change the schema and then maybe use an alias to
seamlessly convert from the user’s perspective. If you
simply cannot re-index from the system-of-record, you have
two choices:

1> use new collections whenever you need to change the
   schema and “somehow” have the app do different things
   with the new and old collections

2> set stored=true for all your source fields (i.e. not
   copyField destinations). You can either roll your own
   program that pulls data from the old and sends it to
   the new, or use the Collections API REINDEXCOLLECTION
   call (see the sketch below). Note that it’s specifically
   called out in the docs that all fields must be stored to
   use that API; what happens under the covers is that the
   stored fields are read and sent to the target collection.

In both these cases, Robert’s comment doesn’t apply. Well,
it does apply but “if x is not available” is not the case,
the original _is_ available; it’s the stored data...
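
The REINDEXCOLLECTION route in option 2> might look like the hedged
sketch below (collection names are placeholders; check the Collections
API docs for your version for the full parameter list):

    curl 'http://localhost:8983/solr/admin/collections?action=REINDEXCOLLECTION&name=oldcollection&target=newcollection'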

I’m over-stating the case somewhat; there are a few changes
that you can get away with while re-indexing all the docs into an
existing index, things like changing from stored=true to 
stored=false, adding new fields, deleting fields (although the
meta-data for the field is still kept around), etc.

> On Oct 16, 2020, at 3:57 PM, David Hastings  
> wrote:
> 
> Gotcha, thanks for the explanation.  another small question if you
> dont mind, when deleting docs they arent actually removed, just tagged as
> deleted, and the old field/field type is still in the index until
> merged/optimized as well, wouldnt that cause almost the same conflicts
> until then?
> 
> On Fri, Oct 16, 2020 at 3:51 PM Erick Erickson 
> wrote:
> 
>> Doesn’t re-indexing a document just delete/replace….
>> 
>> It’s complicated. For the individual document, yes. The problem
>> comes because the field is inconsistent _between_ documents, and
>> segment merging blows things up.
>> 
>> Consider. I have segment1 with documents indexed with the old
>> schema (String in this case). I  change my schema and index the same
>> field as a text type.
>> 
>> Eventually, a segment merge happens and these two segments get merged
>> into a single new segment. How should the field be handled? Should it
>> be defined as String or Text in the new segment? If you convert the docs
>> with a Text definition for the field to String,
>> you’d lose the ability to search for individual tokens. If you convert the
>> String to Text, you don’t have any guarantee that the information is even
>> available.
>> 
>> This is just the tip of the iceberg in terms of trying to change the
>> definition of a field. Take the case of changing the analysis chain,
>> say you use a phonetic filter on a field then decide to remove it and
>> do not store the original. Erick might be encoded as “ENXY” so the
>> original data is simply not there to convert. Ditto removing a
>> stemmer, lowercasing, applying a regex, …...
>> 
>> 
>> From Mike McCandless:
>> 
>> "This really is the difference between an index and a database:
>> we do not store, precisely, the original documents.  We store
>> an efficient derived/computed index from them.  Yes, Solr/ES
>> can add database-like behavior where they hold the true original
>> source of t

Re: converting string to solr.TextField

2020-10-16 Thread David Hastings
Sorry, I was thinking that just using the delete-by-query *:* method
for clearing the index would leave them in there still.

On Fri, Oct 16, 2020 at 4:28 PM Erick Erickson 
wrote:

> Not sure what you’re asking here. re-indexing, as I was
> using the term, means completely removing the index and
> starting over. Or indexing to a new collection. At any
> rate, starting from a state where there are _no_ segments.
>
> I’m guessing you’re still thinking that re-indexing without
> doing the above will work; it won’t. The way merging works,
> it chooses segments based on a number of things, including
> the percentage deleted documents. But there are still _other_
> live docs in the segment.
>
> Segment S1 has docs 1, 2, 3, 4 (old definition)
> Segment S2 has docs 5, 6, 7, 8 (new definition)
>
> Doc 2 is deleted, and S1 and S2 are merged into S3. The whole
> discussion about not being able to do the right thing kicks in.
> Should S3 use the new or old definition? Whichever one
> it uses is wrong for the other segment. And remember,
> Lucene simply _cannot_ “do the right thing” if the data
> isn’t there.
>
> What you may be missing is that a segment is a “mini-index”.
> The underlying assumption is that all documents in that
> segment are produced with the same schema and can be
> accessed the same way. My comments about merging
> “doing the right thing” is really about transforming docs
> so all the docs can be treated the same. Which they can’t
> if they were produced with different schemas.
>
> Robert Muir’s statement is interesting here, built
> on Mike McCandless’ comment:
>
> "I think the key issue here is Lucene is an index not a database.
> Because it is a lossy index and does not retain all of the user’s
> data, its not possible to safely migrate some things automagically.
> …. The function is y = f(x) and if x is not available its not
> possible, so lucene can't do it."
>
> Don’t try to get around this. Prepare to
> re-index the entire corpus into a new collection whenever
> you change the schema and then maybe use an alias to
> seamlessly convert from the user’s perspective. If you
> simply cannot re-index from the system-of-record, you have
> two choices:
>
> 1> use new collections whenever you need to change the
>  schema and “somehow” have the app do different things
> with the new and old collections
>
> 2> set stored=true for all your source fields (i.e. not
>copyField destination). You can either roll your own
>program that pulls data from the old and sends
>it to the new or use the Collections API REINDEXCOLLECTION
>API call. But note that it’s specifically called out
>in the docs that all fields must be stored to use the
> API, what happens under the covers is that the
>  stored fields are read and sent to the target
>collection.
>
> In both these cases, Robert’s comment doesn’t apply. Well,
> it does apply but “if x is not available” is not the case,
> the original _is_ available; it’s the stored data...
>
> I’m over-stating the case somewhat, there are a few changes
> that you can get away with re-indexing all the docs into an
> existing index, things like changing from stored=true to
> stored=false, adding new fields, deleting fields (although the
> meta-data for the field is still kept around) etc.
>
> > On Oct 16, 2020, at 3:57 PM, David Hastings <
> hastings.recurs...@gmail.com> wrote:
> >
> > Gotcha, thanks for the explanation.  another small question if you
> > dont mind, when deleting docs they arent actually removed, just tagged as
> > deleted, and the old field/field type is still in the index until
> > merged/optimized as well, wouldnt that cause almost the same conflicts
> > until then?
> >
> > On Fri, Oct 16, 2020 at 3:51 PM Erick Erickson 
> > wrote:
> >
> >> Doesn’t re-indexing a document just delete/replace….
> >>
> >> It’s complicated. For the individual document, yes. The problem
> >> comes because the field is inconsistent _between_ documents, and
> >> segment merging blows things up.
> >>
> >> Consider. I have segment1 with documents indexed with the old
> >> schema (String in this case). I  change my schema and index the same
> >> field as a text type.
> >>
> >> Eventually, a segment merge happens and these two segments get merged
> >> into a single new segment. How should the field be handled? Should it
> >> be defined as String or Text in the new segment? If you convert the docs
> >> with a Text definition for the field to String,
> >> you’d lose the ability to search for individual tokens. If you convert
> the
> >> String to Text, you don’t have any guarantee that the information is
> even
> >> available.
> >>
> >> This is just the tip of the iceberg in terms of trying to change the
> >> definition of a field. Take the case of changing the analysis chain,
> >> say you use a phonetic filter on a field then decide to remove it and
> >> do not store the original. Erick might be encoded as “ENXY” so the
> >> original data is simply not there to convert. Ditto remov

Re: converting string to solr.TextField

2020-10-16 Thread Walter Underwood
In addition, what happens at query time when documents have
been indexed under a varying field type? Well, it doesn’t work well.

The full set of steps for uninterrupted searching is:

1. Add the new text field.
2. Reindex to populate that.
3. Switch querying to use the new text field.
4. Change the old string field to indexed=“false” stored=“false” and/or stop
including that field in index updates and/or populating it with copyField
(see the sketch after this list).
5. Reindex again to clean up all occurrences of the old field.
6. Remove the old field from the schema.
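
Steps 1, 4 and 6 map onto Schema API calls; a hedged sketch with
placeholder collection and field names (body_str is the old string field,
body_txt the new text field):

    # step 1: add the new text field
    curl -X POST -H 'Content-type:application/json' 'http://localhost:8983/solr/mycollection/schema' \
      -d '{"add-field": {"name":"body_txt", "type":"text_general", "stored":true}}'

    # step 4: stop indexing and storing the old string field
    curl -X POST -H 'Content-type:application/json' 'http://localhost:8983/solr/mycollection/schema' \
      -d '{"replace-field": {"name":"body_str", "type":"string", "indexed":false, "stored":false}}'

    # step 6: remove the old field entirely
    curl -X POST -H 'Content-type:application/json' 'http://localhost:8983/solr/mycollection/schema' \
      -d '{"delete-field": {"name":"body_str"}}'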

I just finished this process on two big clusters in prod. We had
created a bunch of extra fields for a series of A/B tests on 
relevance improvements. Those tests were finished, so we 
needed to remove those from the index. It was slightly simpler
because we had already stopped querying those fields.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 16, 2020, at 12:57 PM, David Hastings  
> wrote:
> 
> Gotcha, thanks for the explanation.  another small question if you
> dont mind, when deleting docs they arent actually removed, just tagged as
> deleted, and the old field/field type is still in the index until
> merged/optimized as well, wouldnt that cause almost the same conflicts
> until then?
> 
> On Fri, Oct 16, 2020 at 3:51 PM Erick Erickson 
> wrote:
> 
>> Doesn’t re-indexing a document just delete/replace….
>> 
>> It’s complicated. For the individual document, yes. The problem
>> comes because the field is inconsistent _between_ documents, and
>> segment merging blows things up.
>> 
>> Consider. I have segment1 with documents indexed with the old
>> schema (String in this case). I  change my schema and index the same
>> field as a text type.
>> 
>> Eventually, a segment merge happens and these two segments get merged
>> into a single new segment. How should the field be handled? Should it
>> be defined as String or Text in the new segment? If you convert the docs
>> with a Text definition for the field to String,
>> you’d lose the ability to search for individual tokens. If you convert the
>> String to Text, you don’t have any guarantee that the information is even
>> available.
>> 
>> This is just the tip of the iceberg in terms of trying to change the
>> definition of a field. Take the case of changing the analysis chain,
>> say you use a phonetic filter on a field then decide to remove it and
>> do not store the original. Erick might be encoded as “ENXY” so the
>> original data is simply not there to convert. Ditto removing a
>> stemmer, lowercasing, applying a regex, …...
>> 
>> 
>> From Mike McCandless:
>> 
>> "This really is the difference between an index and a database:
>> we do not store, precisely, the original documents.  We store
>> an efficient derived/computed index from them.  Yes, Solr/ES
>> can add database-like behavior where they hold the true original
>> source of the document and use that to rebuild Lucene indices
>> over time.  But Lucene really is just a "search index" and we
>> need to be free to make important improvements with time."
>> 
>> And all that aside, you have to re-index all the docs anyway or
>> your search results will be inconsistent. So leaving aside the
>> impossible task of covering all the possibilities on the fly, it’s
>> better to plan on re-indexing….
>> 
>> Best,
>> Erick
>> 
>> 
>>> On Oct 16, 2020, at 3:16 PM, David Hastings <
>> hastings.recurs...@gmail.com> wrote:
>>> 
>>> "If you want to
>>> keep the same field name, you need to delete all of the
>>> documents in the index, change the schema, and reindex."
>>> 
>>> actually doesnt re-indexing a document just delete/replace anyways
>> assuming
>>> the same id?
>>> 
>>> On Fri, Oct 16, 2020 at 3:07 PM Alexandre Rafalovitch <
>> arafa...@gmail.com>
>>> wrote:
>>> 
 Just as a side note,
 
> indexed="true"
 If you are storing 32K message, you probably are not searching it as a
 whole string. So, don't index it. You may also want to mark the field
 as 'large' (and lazy):
 
 
>> https://lucene.apache.org/solr/guide/8_2/field-type-definitions-and-properties.html#field-default-properties
 
 When you are going to make it a text field, you will probably be
 having the same issues as well.
 
 And honestly, if you are not storing those fields to search, maybe you
 need to consider the architecture. Maybe those fields do not need to
 be in Solr at all, but in external systems. Solr (or any search
 system) should not be your system of records since - as the other
 reply showed - some of the answers are "reindex everything".
 
 Regards,
  Alex.
 
 On Fri, 16 Oct 2020 at 14:02, yaswanth kumar 
 wrote:
> 
> I am using solr 8.2
> 
> Can I change the schema fieldtype from string to solr.TextField
> without indexing?
> 
>   >>> stored="true"/>
> 
> The reason is that string has only 32K char limit where as I am l

Re: converting string to solr.TextField

2020-10-16 Thread Shawn Heisey

On 10/16/2020 2:36 PM, David Hastings wrote:

sorry, i was thinking just using the
*:*
method for clearing the index would leave them still


In theory, if you delete all documents at the Solr level, Lucene will 
delete all the segment files on the next commit, because they are empty. 
 I have not confirmed with testing whether this actually happens.


It is far safer to use a new index as Erick has said, or to delete the 
index directories completely and restart Solr ... so you KNOW the index 
has nothing in it.
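
A sketch of the latter for a single node (install path, data path and
core name are assumptions; adjust to the local layout):

    /opt/solr/bin/solr stop -all
    rm -rf /var/solr/data/mycore/data/index /var/solr/data/mycore/data/tlog
    /opt/solr/bin/solr start

That way there is literally nothing left on disk before reindexing.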


Thanks,
Shawn


Re: Need urgent help -- High cpu on solr

2020-10-16 Thread Rahul Goswami
In addition to the insightful pointers by Zisis and Erick, I would like to
mention an approach in the link below that I generally use to pinpoint
exactly which threads are causing the CPU spike. Knowing this, you can
understand which aspect of Solr (search thread, GC, update thread, etc.) is
taking more CPU and develop a mitigation strategy accordingly (e.g. if it's
a GC thread, maybe try tuning the params or switching to G1 GC). It just
helps to take the guesswork out of the many possible causes. Of course, the
suggestions received earlier are best practices and should be taken into
consideration nevertheless.

https://backstage.forgerock.com/knowledge/kb/article/a39551500

The hex number the author talks about in the link above is the native
thread id.
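
The usual recipe behind that approach looks something like this (a
sketch; 12345 and 67890 stand in for the Solr process id and a hot
thread id):

    top -H -p 12345                            # list Solr's threads and their CPU usage
    printf '%x\n' 67890                        # convert a hot thread id to hex -> 10932
    jstack 12345 | grep -A 20 'nid=0x10932'    # find that native thread id in a thread dump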

Best,
Rahul


On Wed, Oct 14, 2020 at 8:00 AM Erick Erickson 
wrote:

> Zisis makes good points. One other thing is I’d look to
> see if the CPU spikes coincide with commits. But GC
> is where I’d look first.
>
> Continuing on with the theme of caches, yours are far too large
> at first glance. The default is, indeed, size=512. Every time
> you open a new searcher, you’ll be executing 128 queries
> for autowarming the filterCache and another 128 for the queryResultCache.
> autowarming alone might be accounting for it. I’d reduce
> the size back to 512 and an autowarm count nearer 16
> and monitor the cache hit ratio. There’s little or no benefit
> in squeezing the last few percent from the hit ratio. If your
> hit ratio is small even with the settings you have, then your caches
> don’t do you much good anyway so I’d make them much smaller.
>
> You haven’t told us how often your indexes are
> updated, which will be significant CPU hit due to
> your autowarming.
>
> Once you’re done with that, I’d then try reducing the heap. Most
> of the actual searching is done in Lucene via MMapDirectory,
> which resides in the OS memory space. See:
>
> https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
> Finally, if it is GC, consider G1GC if you’re not using that
> already.
>
> Best,
> Erick
>
>
> > On Oct 14, 2020, at 7:37 AM, Zisis T.  wrote:
> >
> > The values you have for the caches and the maxwarmingsearchers do not
> look
> > like the default. Cache sizes are 512 for the most part and
> > maxwarmingsearchers are 2 (if not limit them to 2)
> >
> > Sudden CPU spikes probably indicate GC issues. The #  of documents you
> have
> > is small, are they huge documents? The # of collections is OK in general
> but
> > since they are crammed in 5 Solr nodes the memory requirements might be
> > bigger. Especially if filter and the other caches get populated with 50K
> > entries.
> >
> > I'd first go through the GC activity to make sure that this is not
> causing
> > the issue. The fact that you lose some Solr servers is also an indicator
> of
> > large GC pauses that might create a problem when Solr communicates with
> > Zookeeper.
> >
> >
> >
> > --
> > Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>
>


Improve results/relevance

2020-10-16 Thread Jayadevan Maymala
Hi all,

We have a catalogue of many products, including smartphones. We use the
edismax query parser. If someone types in "iPhone 11", we are getting the
correct results, but "iPhone 11 Pro" is coming before "iPhone 11". What
options can be used to improve this?

Regards,
Jayadevan