Re: support Rich Document

2021-02-10 Thread Jörn Franke
You can store them on the filesystem and a link to them in Solr. Your search application could fetch them from the filesystem and serve them to the users. Alternatively serve them as WebDAV, SharePoint or whatever your organization sets as standard. It does not make sense to store them in Solr

Re: SSL using CloudSolrClient

2021-02-03 Thread Jörn Franke
PKI authentication works only between Solr nodes. > On 03.02.2021 at 21:27, Jörn Franke wrote: > > SSL is transport security. For authentication you have to use basic or > Kerberos or Hadoop. You may also need to configure authorisation > >> On 03.02.2021 at 21:22

Re: SSL using CloudSolrClient

2021-02-03 Thread Jörn Franke
SSL is transport security. For authentication you have to use basic or Kerberos or Hadoop. You may also need to configure authorisation > On 03.02.2021 at 21:22, ChienHuaWang wrote: > > Hi, > > I am implementing SSL between Solr and Client communication. The clients > connect to Solr via Clo

Re: SOLR 8.6.0 date Indexing Issues.

2020-11-20 Thread Jörn Franke
You should format the date according to the ISO standard: https://lucene.apache.org/solr/guide/6_6/working-with-dates.html E.g. 2018-07-12T00:00:00Z You can either transform the date that you have in Solr or in your client pushing the doc to Solr. All major programming languages have date util
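As a rough illustration of the client-side option, here is a minimal Python sketch that turns a date into the UTC ISO 8601 form shown above. The helper names and the `%d/%m/%Y` source format are assumptions for the example only, and naive datetimes are assumed to already be in UTC.

```python
from datetime import datetime, timezone


def to_solr_date(value: datetime) -> str:
    """Format a datetime as a UTC ISO 8601 string such as 2018-07-12T00:00:00Z."""
    if value.tzinfo is None:
        # Assumption: a datetime without timezone info is already in UTC.
        value = value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def parse_source_date(text: str, fmt: str = "%d/%m/%Y") -> str:
    """Parse a date in a hypothetical source format and return the Solr form."""
    return to_solr_date(datetime.strptime(text, fmt))


print(to_solr_date(datetime(2018, 7, 12)))  # 2018-07-12T00:00:00Z
print(parse_source_date("12/07/2018"))      # 2018-07-12T00:00:00Z
```

The client would then put the returned string into the date field of the document before pushing it to Solr.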

Re: Solr8.7 How to increase JVM-Memory ?

2020-11-18 Thread Jörn Franke
--Original Message- > From: Jörn Franke [mailto:jornfra...@gmail.com] > Sent: Wednesday, 18 November 2020 16:41 > To: solr-user@lucene.apache.org > Subject: Re: Solr8.7 How to increase JVM-Memory ? > > Did you make solr.in.sh executable ? Eg chmod a+x solr.in.sh ?

Re: Solr8.7 How to increase JVM-Memory ?

2020-11-18 Thread Jörn Franke
Did you make solr.in.sh executable ? Eg chmod a+x solr.in.sh ? > Am 18.11.2020 um 16:33 schrieb Matheo Software : > >  > Hi All, > > Since several years I work with a old version of Solr on Ubuntu, version 5.4. > Today I test the 8.7 version. > But I’m not able to change the JVM-Memory like in

Re: Unable to upload configuration with upconfig (Unable to read additional data from server)

2020-11-16 Thread Jörn Franke
I recommend using the Configsets API: https://lucene.apache.org/solr/guide/8_6/configsets-api.html Especially if you need to secure ZK access with authentication / authorization, which is recommended > On 16.11.2020 at 11:18, Maehr, Bernhard wrote: > > Hello guys, > > I have set up a ku

Re: Solr endpoint on the public internet

2020-10-08 Thread Jörn Franke
It is like opening a database to the Internet: you simply don’t do it, and I don’t recommend it. If you want to do it despite the anti-pattern, use the latest Solr version and put a reverse proxy in front. Always use authentication and authorization. Allow only a minimal set of API endpoints and

Re: Fetched but not Added Solr 8.6.2

2020-09-17 Thread Jörn Franke
Log file will tell you the issue. > Am 17.09.2020 um 10:54 schrieb Anuj Bhargava : > > We just installed Solr 8.6.2 > It is fetching the data but not adding > > Indexing completed. *Added/Updated: 0 *documents. Deleted 0 documents. > (Duration: 06s) > Requests: 1 ,* Fetched: 100* 17/s, Skipped:

Re: Solr Cloud 8.5.1 - HDFS and Erasure Coding

2020-09-16 Thread Jörn Franke
I am not aware of a test. However, keep in mind that HDFS support will be deprecated. Additionally, you can configure erasure coding in HDFS on a per folder / file basis, so in the worst case you could just create the folder for Solr with the standard HDFS mode. Erasure coding has several li

Re: Updating configset

2020-09-11 Thread Jörn Franke
I would go for the Solr rest api ... especially if you have a secured zk (eg with Kerberos). Then you need to manage access for humans only in Solr and not also in ZK. > Am 11.09.2020 um 19:41 schrieb Erick Erickson : > > Bin/solr zk upconfig... > Bin/solr zk cp... For individual files. > > N

Re: Solr Schema API seems broken to me after 8.2.0

2020-09-08 Thread Jörn Franke
Can you check the logfiles of Solr? It could be that after the upgrade some filesystem permissions do not work anymore > Am 08.09.2020 um 09:27 schrieb "jeanc...@gmail.com" : > > Hey guys, good morning. > > As I didn't get any reply for this one, is it ok then that I create the > Jira ticket

Re: Ranking issue when combining sorting and re-ranking on SolrCloud (multiple shards)

2020-08-28 Thread Jörn Franke
Maybe this can help you? https://lucene.apache.org/solr/guide/7_5/distributed-requests.html#configuring-statscache-distributed-idf On Mon, May 11, 2020 at 9:24 AM Spyros Kapnissis wrote: > HI all, > > On our current master/slave setup (no cloud), we use a a custom sorting > function to get the f

Re: Solr collections gets wiped on restart

2020-08-27 Thread Jörn Franke
Any logfiles after restart? Which Solr version? I would activate autopurge in Zookeeper > Am 27.08.2020 um 10:49 schrieb "antonio.di...@bnpparibasfortis.com" > : > > Good morning, > > > I would like to get some help if possible. > > > > We have a 3 node Solr cluster (ensemble) with apach

Re: Real time index data

2020-08-26 Thread Jörn Franke
. > Am 26.08.2020 um 11:36 schrieb Jörn Franke : > > You do not provide many details, but a queuing mechanism seems to be > appropriate for this use case. > >> Am 26.08.2020 um 11:30 schrieb Tushar Arora : >> >> Hi, >> >> One of our use cases requir

Re: Real time index data

2020-08-26 Thread Jörn Franke
You do not provide many details, but a queuing mechanism seems to be appropriate for this use case. > Am 26.08.2020 um 11:30 schrieb Tushar Arora : > > Hi, > > One of our use cases requires real time indexing of data in solr from DB. > Approximately, 30 rows are updated in a second in DB. And

Re: SOLR Compatibility with Oracle Enterprise Linux 7

2020-08-24 Thread Jörn Franke
Yes, there should be no issues upgrading to RHEL7. I assume you mean Solr 8.4.0. You can also use the latest Solr version. Why not RHEL8? > On 24.08.2020 at 09:02, Wang, Ke wrote: >

Re: Kerberos on windows device

2020-08-21 Thread Jörn Franke
Hi, You can use ktpass, if you are AD Administrator. The security json does not change from Linux. Please note that there are a lot of things to consider with Kerberos that can go wrong which is not a Solr issue but Kerberos complexity (eg correct DNS names, correct encryption type selected in

Re: Trailing space issue with indexed data.

2020-08-18 Thread Jörn Franke
During indexing. Do they matter for search, ie would the search be different with/without them? > Am 18.08.2020 um 19:57 schrieb Fiz N : > > Hell SOLR Experts, > > I am using SOLR 8.6 and indexing data from MSSQL DB. > > after indexing is done I am seeing > > “Page_number”:”1

Re: SOLR indexing takes longer time

2020-08-17 Thread Jörn Franke
The DIH is single-threaded and deprecated. Your best bet is to have a script/program extract the data from MongoDB and write it to Solr in batches using multiple threads. You will see significantly higher performance for your data. > On 17.08.2020 at 20:23, Abhijit Pawar wrote: > > Hello,
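A minimal Python sketch of the "batches using multiple threads" idea, under stated assumptions: `send_batch` is a hypothetical callable supplied by the caller (for example a function that posts one batch to the Solr update handler and returns how many documents it sent), and the batch size and worker count are arbitrary. The MongoDB extraction itself is not shown.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batched(docs: Iterable, size: int) -> Iterator[List]:
    """Yield consecutive lists of at most `size` items from `docs`."""
    it = iter(docs)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def index_in_batches(
    docs: Iterable,
    send_batch: Callable[[List], int],
    batch_size: int = 1000,
    workers: int = 4,
) -> int:
    """Send batches from several threads and return the total count reported.

    Note: Executor.map submits all batches up front, so every batch is held
    in memory; a bounded queue would be needed for very large exports.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(send_batch, batched(docs, batch_size)))


# Stand-in sender for demonstration: it only counts the documents in a batch.
total = index_in_batches(({"id": str(i)} for i in range(2500)), len)
print(total)  # 2500
```

Passing the sender in as a parameter keeps the batching and threading logic independent of whichever Solr client library is actually used.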

Re: DIH on SolrCloud

2020-08-13 Thread Jörn Franke
DIH is deprecated in current Solr versions. The general recommendation is to do processing outside the Solr server and use the update handler (the normal one, not Cell) to add documents to the index. So you should avoid using it, as it is not future-proof. If you need more time to migrate to a

Re: Replicas in Recovery During Atomic Updates

2020-08-10 Thread Jörn Franke
How exactly do you ingest it with atomic updates? Is there an update processor in between? What are your settings for hard/soft commit? For the shards going into recovery: do you have a log entry or something? What is the Solr version? How did you set up ZK? > On 10.08.2020 at 16:24,

Re: Solr + Parquets

2020-08-07 Thread Jörn Franke
DIH is deprecated and will be removed from Solr. You may still be able to install it as a plug-in, but AFAIK nobody maintains it. Do not use it anymore. You can write a custom Spark data source that writes to Solr, or do it in a Spark map step using SolrJ. In both cases do not c

Re: Solrj client 8.6.0 issue special characters in query

2020-08-07 Thread Jörn Franke
Hmm, setting -Dfile.encoding=UTF-8 solves the problem. I have to now check which component of the application screws it up, but at the moment I do NOT believe it is related to Solrj. On Fri, Aug 7, 2020 at 11:53 AM Jörn Franke wrote: > Dear all, > > I have the following issues. I hav

Solrj client 8.6.0 issue special characters in query

2020-08-07 Thread Jörn Franke
Dear all, I have the following issues. I have a Solrj Client 8.6 (but it happens also in previous versions), where I execute, for example, the following query: Jörn If I look into Solr Admin UI it finds all the right results. If I use Solrj client then it does not find anything. Further, investi

Re: Searching for credit card numbers

2020-07-28 Thread Jörn Franke
A regex search at query time would leave room for attacks (e.g. a regex can easily be designed to block the Solr server forever). If the field is stored, you can also use a cursor to go through all entries and reindex the doc based on the field: https://lucene.apache.org/solr/

Re: Meow attacks

2020-07-28 Thread Jörn Franke
In addition to what has been said before (use private networks/firewall rules): activate Kerberos authentication so that only Solr hosts can write to ZK (the Solr client needs no write access) and use encryption where possible. Upgrade Solr to the latest version, use SSL, enable Kerberos, have cl

Re: Cannot read ZK Kerberos conf when enabling java security manager on 8.6

2020-07-21 Thread Jörn Franke
Jira created > Am 21.07.2020 um 10:28 schrieb Ishan Chattopadhyaya > : > > I think this warrants a JIRA. To work around this issue for now, you can > use an environment variable SOLR_SECURITY_MANAGER_ENABLED=false before > starting Solr. > >> On Thu, Jul 16, 2

Re: CDCR stress-test issues

2020-07-17 Thread Jörn Franke
Instead of CDCR you may simply duplicate the pipeline across both data centers. Then there is no need at each step of the pipeline to replicate (storage to storage, index to index etc.). Instead both pipelines run in different data centers in parallel. > Am 24.06.2020 um 15:46 schrieb Oakley, Cr

Re: AtomicUpdate on SolrCloud is not working

2020-07-17 Thread Jörn Franke
What does “not work correctly” mean? Have you checked that all fields are stored or have docValues? > On 17.07.2020 at 11:26, yo tomi wrote: > > Hi All > > Sorry, above settings are contrary with each other. > Actually, following setting does not work properly. > --- > > > > > > > --- > An

Re: Cannot read ZK Kerberos conf when enabling java security manager on 8.6

2020-07-16 Thread Jörn Franke
The solution would be probably a policy file shipped with Solr that allows the ZK jar to create a logincontext. I suggest that Solr ships it otherwise one would need to adapt it for every Solr update manually to include the version of the ZK jar. On Thu, Jul 16, 2020 at 8:15 PM Jörn Franke wrote

Re: Cannot read ZK Kerberos conf when enabling java security manager on 8.6

2020-07-16 Thread Jörn Franke
firm and I will create a JIRA issue for Solr On Thu, Jul 16, 2020 at 8:06 PM Jörn Franke wrote: > Hallo, > > I am using Solr 8.6.0. > When activating the Java security manager then Solr cannot use anymore the > jaas-client conf specified via java.security.auth.login.conf with > Z

Cannot read ZK Kerberos conf when enabling java security manager on 8.6

2020-07-16 Thread Jörn Franke
Hallo, I am using Solr 8.6.0. When activating the Java security manager then Solr cannot use anymore the jaas-client conf specified via java.security.auth.login.conf with Zookeeper. We have configured Kerberos authentication for Zookeeper. When disabling java security manager it works perfectly

Re: Supporting multiple indexes in one collection

2020-06-30 Thread Jörn Franke
t; >> On Tue, Jun 30, 2020 at 10:06 PM Jörn Franke wrote: >> >> What did you test? Which queries? What were the exact results in terms of >> time ? >> >>>> Am 30.06.2020 um 22:47 schrieb Raji N : >>> >>> Hi , >>> >>>

Re: Supporting multiple indexes in one collection

2020-06-30 Thread Jörn Franke
What did you test? Which queries? What were the exact results in terms of time ? > Am 30.06.2020 um 22:47 schrieb Raji N : > > Hi , > > > Trying to place multiple smaller indexes in one collection (as we read > solrcloud performance degrades as number of collections increase). We are > explori

Re: How to determine why solr stops running?

2020-06-29 Thread Jörn Franke
Maybe you can identify in the logfiles some critical queries? What is the total size of the index? What client are you using on the web app side? Are you reusing clients or create one new for every query. > Am 29.06.2020 um 21:14 schrieb Ryan W : > > On Mon, Jun 29, 2020 at 1:49 PM David Hast

Re: [EXTERNAL] Getting rid of Master/Slave nomenclature in Solr

2020-06-24 Thread Jörn Franke
I agree with Bernd. I believe also that change is natural so eventually one needs to evolve the terminology or create a complete new product. To evolve the terminology one can write a page in the ref guide for translating it and over time adapt it in Solr etc. > Am 24.06.2020 um 13:30 schrieb

Re: solr fq with contains not returning any results

2020-06-24 Thread Jörn Franke
I don’t know your data, but could it be that you tokenize differently? Why do you do the wildcard search at all? Maybe a different tokenizing strategy can get you results more efficiently? Depends on what you need to achieve of course ... > On 24.06.2020 at 05:37, yaswanth kumar wrote: >

Re: [EXTERNAL] Getting rid of Master/Slave nomenclature in Solr

2020-06-19 Thread Jörn Franke
Might be confusing with the nested doc terminology > Am 19.06.2020 um 20:14 schrieb Atita Arora : > > I see so many topics being discussed in this thread and I literary got lost > somewhere , but was just thinking can we call it Parent -Child > architecture, m sure no one will raise an objectio

Re: Proxy Error when cluster went down

2020-06-16 Thread Jörn Franke
Do you have another host with replica alive or are all replicas on the host that is down? Are all SolrCloud hosts in the same ZooKeeper? > Am 16.06.2020 um 19:29 schrieb Vishal Vaibhav : > > Hi thanks . My solr is running in kubernetes. So host name goes away with > the pod going search-rules-

Re: Solr cloud backup/restore not working

2020-06-16 Thread Jörn Franke
Have you looked in the Solr logfiles? > Am 16.06.2020 um 05:46 schrieb yaswanth kumar : > > Can anyone here help on the posted question pls?? > >> On Fri, Jun 12, 2020 at 10:38 AM yaswanth kumar >> wrote: >> >> Using solr 8.2.0 and setup a cloud with 2 nodes. (2 replica's for each >> collecti

Re: How to determine why solr stops running?

2020-06-15 Thread Jörn Franke
What is the Service definition of Solr in Redhat? > Am 15.06.2020 um 19:46 schrieb Ryan W : > > It happened again today. Again, no other apparent problems on the server. > Nothing else is stopping. Nothing in the logs that strikes me as useful. > I'm using Red Hat Linux 7.8 and Solr 7.7.2. >

Re: Script to check if solr is running

2020-06-08 Thread Jörn Franke
Use the solution described by Walter. This allows you to automatically restart in case of failure and is also cleaner than defining a cronjob. Otherwise this would be another dependency one needs to keep in mind, meaning that if there is an issue and someone does not know the system, the person has to

Re: Indexing PDF on SOLR 8.5

2020-06-07 Thread Jörn Franke
You have to write an external application that creates multiple threads, parses the PDFs and indexes them in Solr. Ideally you parse the PDFs once, store the resulting text on some file system and then index it. The reason is that if you upgrade across two major versions of Solr you might need to reind

Re: Solr takes time to warm up core with huge data

2020-06-05 Thread Jörn Franke
I think DIH is the wrong solution for this. If you do an external custom load you will probably be much faster. You have too much JVM memory from my point of view. Reduce it to eight or similar. It seems you are just exporting data, so you are better off working with the export handler. Add docvalue

Re: solr 8.4.1 with ssl tls1.2 creating an issue with non-leader node

2020-06-04 Thread Jörn Franke
, but the issue is still happening when I try to push data to a >> non-leader node. >> >> Do you still think if its something to do with the configurations ?? >> >> Thanks, >> >>> On Wed, Jun 3, 2020 at 12:29 AM Jörn Franke wrote: >>> >>

Re: solr 8.4.1 with ssl tls1.2 creating an issue with non-leader node

2020-06-04 Thread Jörn Franke
I am still seeing > the same issue. > > Thanks, > >> On Thu, Jun 4, 2020 at 12:23 PM Jörn Franke wrote: >> >> I think you should not do it in the Jetty xml >> Follow the official reference guide. >> It should be in solr.in.sh >&g

Re: solr 8.4.1 with ssl tls1.2 creating an issue with non-leader node

2020-06-04 Thread Jörn Franke
when I try to push data to a >> non-leader node. >> >> Do you still think if its something to do with the configurations ?? >> >> Thanks, >> >>> On Wed, Jun 3, 2020 at 12:29 AM Jörn Franke wrote: >>> >>> Why in the jetty-ssl.xml? >&

Re: Insert documents to a particular shard

2020-06-02 Thread Jörn Franke
Hint: you can easily try out streaming expressions in the admin UI > On 03.06.2020 at 07:32, Jörn Franke wrote: > > You are trying to achieve data locality by having parents and children in the > same shard? > Does document routing address it? > > https://lucene.a

Re: Insert documents to a particular shard

2020-06-02 Thread Jörn Franke
You are trying to achieve data locality by having parents and children in the same shard? Does document routing address it? https://lucene.apache.org/solr/guide/8_5/shards-and-indexing-data-in-solrcloud.html#document-routing On a side node, I don’t know your complete use case, but have you expl

Re: solr 8.4.1 with ssl tls1.2 creating an issue with non-leader node

2020-06-02 Thread Jörn Franke
he leader node > its working without any error, and now immediately if I hit non-leader its > working fine (only once or twice), but if I keep on trying to hit this node > again and again its then throwing the above error and once the error > started happening , its consistent again. >

Re: solr 8.4.1 with ssl tls1.2 creating an issue with non-leader node

2020-06-02 Thread Jörn Franke
Have you looked in the logfiles? Is the keystore type correctly defined on all nodes? Have you configured the truststore on all nodes correctly? Have you set the cluster property urlScheme to https in ZK? https://lucene.apache.org/solr/guide/7_5/enabling-ssl.html#configure-zookeeper > On 02.06.2020 at 18:

Re: SOLR cache tuning

2020-06-01 Thread Jörn Franke
You should not have other processes/containers running on the same node. They can disrupt your OS cache and make things slow: e.g. if the other processes also read files, they can evict Solr's data from the OS cache, which then needs to be filled again. What performance d

Re: Solr Admin UI with restricted authorization

2020-05-29 Thread Jörn Franke
You can restrict the admin UI by limiting access using the authorization plugin. I would though not give access to end users for the admin UI. A good practice is to create your own web application running on a dedicated server that manages all the authentication / authorization and provides a U

Re: highlighting a whole html document using Unified highlighter

2020-05-24 Thread Jörn Franke
give a whole html document as a parameter to the Unified > highlighter so that output is also a highlighted html document? > > Or > > Do you have a better idea to highlight the keywords of the whole html > document? > > Thanks, > > Serkan > > -Or

Re: highlighting a whole html document using Unified highlighter

2020-05-24 Thread Jörn Franke
hl.fragsize=0 https://lucene.apache.org/solr/guide/8_5/highlighting.html > Am 24.05.2020 um 11:49 schrieb Serkan KAZANCI : > > Hi, > > > > I use solr to search over a million html documents, when a document is > searched and displayed, I want to highlight the keywords that are used to > fi
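To make the hint concrete, here is a small Python sketch that builds the query string for such a request. The field name `content` and the search term are assumptions for illustration; `hl`, `hl.method`, `hl.fl` and `hl.fragsize` are the highlighting parameters described in the linked reference guide, where a fragment size of 0 means the whole field value is used without fragmenting.

```python
from urllib.parse import urlencode


def build_highlight_query(field: str, keywords: str) -> str:
    """Return URL-encoded parameters asking for whole-field highlighting."""
    params = {
        "q": f"{field}:({keywords})",
        "hl": "true",
        "hl.method": "unified",  # use the Unified highlighter
        "hl.fl": field,          # field to highlight (hypothetical name)
        "hl.fragsize": "0",      # 0 = do not fragment, use the whole field
    }
    return urlencode(params)


print(build_highlight_query("content", "solr"))
```

The resulting string would be appended to the select handler URL of the collection in question.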

Re: Query takes more time in Solr 8.5.1 compare to 6.1.0 version

2020-05-21 Thread Jörn Franke
Did you create Solrconfig.xml for the collection from scratch after upgrading and reindexing? Was it based on the latest template? If not then please try this. Maybe also you need to increase the corresponding caches in the config. What happens if you reexecute the query? Are there other proces

Re: Index using CSV file

2020-04-18 Thread Jörn Franke
On 18.04.2020 at 17:43, Jörn Franke wrote: > > This you don’t do via the Solr UI. You have many choices amongst others > 1) write a client yourself that parses the csv and post it to the standard > Update handler > https://lucene.apache.org/solr/guide/8_4/uploading-data-with-in

Re: Index using CSV file

2020-04-18 Thread Jörn Franke
This you don’t do via the Solr UI. You have many choices amongst others 1) write a client yourself that parses the csv and post it to the standard Update handler https://lucene.apache.org/solr/guide/8_4/uploading-data-with-index-handlers.html 2) use the Solr post tool https://lucene.apache.org/
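For option 1, a minimal Python sketch of the parsing half: it reads CSV text and produces a JSON array of documents, which is a form the standard update handler accepts. The column names are made up for the example, every value is kept as a string, and the actual HTTP post to Solr is deliberately left out.

```python
import csv
import io
import json


def csv_to_solr_json(csv_text: str) -> str:
    """Convert CSV text with a header row into a JSON array of documents."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Each CSV row becomes one document; header names become field names.
    docs = [dict(row) for row in reader]
    return json.dumps(docs)


sample = "id,title\n1,Hello\n2,World\n"
print(csv_to_solr_json(sample))
# [{"id": "1", "title": "Hello"}, {"id": "2", "title": "World"}]
```

A real client would post this payload with a JSON content type and commit according to its own policy.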

Re: Indexing data from multiple data sources

2020-04-17 Thread Jörn Franke
What does your Solr.log say? Any error ? > Am 17.04.2020 um 20:22 schrieb RaviKiran Moola > : > >  > Hi, > > Greetings!!! > > We are working on indexing data from multiple data sources (MySQL & MSSQL) in > a single collection. We specified data source details like connection details > along

Re: ZooKeeper 3.4 end of life

2020-04-15 Thread Jörn Franke
The problem with Solr using TLS with ZK is the following: * 3.5.5 seems to support only TLS certificate authentication together with TLS. Solr supports only digest and Kerberos authentication. However, I have to check in the ZK Jiras whether this has changed with higher ZK versions * quorum

Re: solrQuery exception : org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: https://solrIP:8983/solr

2020-04-11 Thread Jörn Franke
Is it a self signed certificate ? Enterprise CAs? Then you need to add the certificates to tomcat, because Java needs to validate them (to avoid man-in-the-middle attacks). Curl may not validate the certificate?! > Am 11.04.2020 um 13:42 schrieb six23 <23sixconsult...@gmail.com>: > > Hi, > I

Re: entity in DIH for partial update?

2020-04-10 Thread Jörn Franke
You could use atomic updates in DIH. However, there is a bug in current/potentially also old Solr version that this leaks a searcher (which means the index data is infinitely growing until you restart the server). You can also export from the database to Jsonline, post it to the json update han

Re: Proper way to manage managed-schema file

2020-04-06 Thread Jörn Franke
You can use the Solr rest services to do all those operations. https://lucene.apache.org/solr/guide/8_3/schema-api.html Normally in a productive environment you don’t use the UI but do all changes in a controlled automated fashion using the REST APIs. > Am 06.04.2020 um 20:11 schrieb TK Solr

Re: Unable to delete zookeeper queue

2020-04-01 Thread Jörn Franke
Maybe you need to increase jute.maxbuffer on the ZK server and the ZK client to execute this. It is better to ask on the ZK mailing list > On 01.04.2020 at 14:53, Kommu, Vinodh K. wrote: > > Hi, > > Does anyone know a working solution to delete zookeeper queue data? Please > help!! > > > Regards, > Vin

Re: How do *you* restrict access to Solr?

2020-03-16 Thread Jörn Franke
Solr should not be accessible to end users directly - only through a dedicated application in between. Then in an enterprise setting it is mostly Kerberos auth. and https (do not forget about zookeeper when using Solr cloud here you can also have Kerberos auth and in recent version also SSL). I

Re: JSP support not configured in Solr 8.3.0

2020-03-11 Thread Jörn Franke
It is better to have a dedicated frontend for Solr on a dedicated server. For security reasons, Solr becomes more and more locked up and it is also discouraged to put own web applications on it. > Am 11.03.2020 um 10:03 schrieb vishal patel : > >  > I put the JSP in \server\solr-webapp\webapp

Re: Atomic Update and Optimization and segments

2020-03-10 Thread Jörn Franke
How do you do the atomic updates? I discovered a bug when doing them via DIH or Scriptupdateprocessor (only this one! The atomic one is fine) that leads to infinite index growth when doing atomic updates > Am 10.03.2020 um 13:28 schrieb Kayak28 : > > Hello, Community: > > Currently, my index

Re: multivalue faceting term optimization

2020-03-09 Thread Jörn Franke
hll stands for HyperLogLog: https://en.wikipedia.org/wiki/HyperLogLog You will not get the exact distinct count, but a distinct count very close to the real number. It is very fast and memory efficient for large numbers of distinct values. > On 10.03.2020 at 00:25, Nicolas Paris wrote: > > Erick Erick
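To show why the count is approximate but cheap, here is a simplified textbook HyperLogLog in Python. It is only a sketch of the general technique, not Solr's implementation: the SHA-1 hash, the precision of 12 bits (4096 registers) and the class name are choices made for this example.

```python
import hashlib
import math


class HyperLogLog:
    """Simplified HyperLogLog estimator for the number of distinct strings."""

    def __init__(self, p: int = 12) -> None:
        self.p = p                      # bits used to pick a register
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant, m >= 128

    def add(self, value: str) -> None:
        digest = hashlib.sha1(value.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")        # 64-bit hash
        idx = h >> (64 - self.p)                     # first p bits: register
        rest = h & ((1 << (64 - self.p)) - 1)        # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self) -> float:
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


hll = HyperLogLog()
for i in range(100000):
    hll.add(f"value-{i}")
print(round(hll.count()))  # close to 100000, not exactly 100000
```

Memory use is fixed by the number of registers regardless of how many values are added, which is where the speed and memory efficiency come from; the price is a relative error of roughly 1.04 divided by the square root of the register count.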

Re: Problem with Solr 7.7.2 after OOM

2020-03-05 Thread Jörn Franke
Just keep in mind that the total memory should be much more than the heap to leverage Solr file caches. If you have an 8 GB heap, it probably makes sense to have at least 16 GB of total memory available on the machine. > On 05.03.2020 at 16:58, Walter Underwood wrote: > > >> >> On Mar 5, 2020, at 4:29

Re: Upgrading from 6.5.0 to 8.4.1

2020-02-27 Thread Jörn Franke
You did a reload and not a reindex? Probably the best is to delete the collection fully, create it new and index then . > Am 27.02.2020 um 14:02 schrieb Pavel Polivka : > > Hello, > > I am doing upgrade of SolrCloud cluster from 6.5.0 to 8.4.1. > > My process is: > > Upgrade to 7.7.2. > Rec

Re: Solr 8.2.0 - Schema issue

2020-02-26 Thread Jörn Franke
Not sure i understood the whole scenario. However did you try to reload (not reindex) the collection > Am 26.02.2020 um 15:02 schrieb Joe Obernberger : > > Hi All - I have several solr collections all with the same schema. If I add > a field to the schema and index it into the collection on

Re: Solr Upgrade socketTimeout issue in 8.2

2020-02-19 Thread Jörn Franke
Yes you need to reindex. Update solrconfig.xml and the schema to leverage the latest datatypes, features etc. of the version (some datatypes are now more optimal, others are deprecated). Create a new collection based on the newest config. Use your

Re: Solr Relevancy problem

2020-02-19 Thread Jörn Franke
The best way to address this problem is to collect queries and examples why they are wrong and to document this. This is especially important when working with another vendor. Otherwise no one can give you proper help. > Am 19.02.2020 um 09:17 schrieb Pradeep Tambade > : > > Hello, > > We ha

Re: Best Practises around relevance tuning per query

2020-02-18 Thread Jörn Franke
You are focusing too much on the solution. If you described the business case in more detail without including the solution itself, more people could help. E.g. it is not clear why you have a scoring model and why this can address business needs. > On 18.02.2020 at 01:50, Ashwin Ramesh wrote:

Re: StatelessScriptUpdateProcessorFactory causing OOM errors?

2020-02-13 Thread Jörn Franke
I also had issues with this factory when creating atomic updates inside it. They worked, but searchers were never closed, and new ones were opened and stayed open, with all the issues related to that. Maybe one needs to look into that in more detail. However - it is a script in the end so

Re: Support Tesseract in Apache Solr

2020-02-11 Thread Jörn Franke
Honestly i would not run tesseract on the same server as Solr. It takes a lot of resources and may negatively impact Solr. Just write a small program using Tika+Tesseract that runs on a different server / container and posts the results to Solr. About your question: Probably Tika (a dependency

Re: solr-injection

2020-02-11 Thread Jörn Franke
Do not have users accessing Solr directly. Have your own secure web frontend/ own APIs for it. In this way you can control secure access. Secure Solr with https and Kerberos. Have for your web frontend only access rights needed and for your admins only the access rights they need. Automate de

Re: Index growing and growing until restart

2020-01-21 Thread Jörn Franke
:58 PM Jörn Franke wrote: > After testing the update?commit=true i now face an error: "Maximum lock > count exceeded". strange this is the first time i see this in the lockfiles > and when doing commit=true > ava.lang.Error: Maximum lock count exceede

Re: Index growing and growing until restart

2020-01-21 Thread Jörn Franke
81) at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:917) at java.base/java.lang.Thread.run(Thread.java:834) On Tue, Jan 21, 2020 at 10:51 PM Jörn Franke wrote: > The only weird thing is I see that for instance I have > ${solr.autoCommit.maxTime:150

Re: Index growing and growing until restart

2020-01-21 Thread Jörn Franke
The only weird thing I see is that, for instance, I have ${solr.autoCommit.maxTime:15000} and similar entries. It looks like a template gone wrong, but this was not caused by internal development. It must have come from a Solr version. On Tue, Jan 21, 2020 at 10:49 PM Jörn Franke

Re: Index growing and growing until restart

2020-01-21 Thread Jörn Franke
, 2020 at 4:04 PM Jörn Franke wrote: > thanks for the answer I will look into it - it is a possible explanation. > > > Am 20.01.2020 um 14:30 schrieb Erick Erickson : > > > > Jörn: > > > > The only thing I can think of that _might_ cause this (I’m not all that

Re: Index growing and growing until restart

2020-01-21 Thread Jörn Franke
he autocommit and cache > portions? > > Best, > Erick > >> On Jan 20, 2020, at 5:40 AM, Jörn Franke wrote: >> >> From what is see it basically duplicates the index files, but does not >> delete the old ones. >> It uses caffeine cache. >> >>

Re: Index growing and growing until restart

2020-01-20 Thread Jörn Franke
on CacheDir. > Am 20.01.2020 um 11:26 schrieb Jörn Franke : > > Sorry I missed a line - not tlog is growing but the /data/index folder is > growing - until restart when it seems to be purged. > >> Am 20.01.2020 um 10:47 schrieb Jörn Franke : >> >> Hi, >

Re: Index growing and growing until restart

2020-01-20 Thread Jörn Franke
Sorry I missed a line - not tlog is growing but the /data/index folder is growing - until restart when it seems to be purged. > Am 20.01.2020 um 10:47 schrieb Jörn Franke : > > Hi, > > I have a test system here with Solr 8.4 (but this is also reproducible in > older Solr ve

Index growing and growing until restart

2020-01-20 Thread Jörn Franke
Hi, I have a test system here with Solr 8.4 (but this is also reproducible in older Solr versions), which has an index that keeps growing until the SolrCloud instance is restarted; then it is reduced to the expected normal size. The collection is configured to do auto commit afte

Re: Solr cloud production set up

2020-01-18 Thread Jörn Franke
I think you should do your own measurements. This is very document and processing specific. You can run a test with a simple setup for, let’s say, 1 million documents and extrapolate from this. It could also be that your ETL is the bottleneck and not Solr. At the same time you can simulate user queries

Re: regarding Extracting text from Images

2020-01-17 Thread Jörn Franke
Have you checked this? https://cwiki.apache.org/confluence/display/TIKA/TikaOCR > Am 17.01.2020 um 10:54 schrieb Retro : > > Hello, can you please advise me, how to configure Solr so that embedded Tika > is able to use Tesseract to do the ocr of images? I have installed the > following softwar

Re: Solr 8.4.0 Cloud Graph is not shown due to CSP

2020-01-08 Thread Jörn Franke
I have to admit it was the cache. Sorry, I believed I had deleted it. Thanks for the effort and testing! I will update the Jira. > On 07.01.2020 at 22:14, Jörn Franke wrote: > > > here you go: > https://issues.apache.org/jira/browse/SOLR-14176 > > a detailed screenshot

Re: Solr 8.4.0 Cloud Graph is not shown due to CSP

2020-01-07 Thread Jörn Franke
force refresh the UI to make sure > nothing is cached. Idk if that is in play here but doesn't hurt. > > [1] https://issues.apache.org/jira/browse/SOLR-13982 > [2] https://issues.apache.org/jira/browse/SOLR-13987 > > Kevin Risden > > On Tue, Jan 7, 2020, 11:15 Jörn Fran

Solr 8.4.0 Cloud Graph is not shown due to CSP

2020-01-07 Thread Jörn Franke
Dear all, I noticed that in Solr Cloud 8.4.0 the graph is not shown due to Content-Security-Policy. Apparently it violates unsafe-eval. It is a minor UI thing, but should I create an issue for it? Maybe it is rather easy to avoid in the source code of the admin page? Thank you. Best regards

Re: Question about the max num of solr node

2020-01-03 Thread Jörn Franke
Why do you want to set up so many? What is your design in terms of volumes / number of documents etc.? > On 03.01.2020 at 10:32, Hongxu Ma wrote: > > Hi community > I plan to set up a 128 host cluster: 2 solr nodes on each host. > But I have a little concern about whether solr can support so m

Re: Solr 7.5 seed up, accuracy details

2019-12-28 Thread Jörn Franke
This highly depends on how you designed your collections etc. - there is no general answer. You have to do a performance test based on your configuration and documents. I also recommend checking the Solr documentation on how to design a collection for 7.x and maybe start even from scratch defin

Re: Help for importing large data (approx. 8GB) from old solr version to new solr version

2019-12-24 Thread Jörn Franke
It seems that you got this handed over with little documentation. You have to explore what the import handler does. This is a custom configuration, and you need to check how it works. Then, as already said, you can simply install another version of Solr if you are within a Solr major version 8.x

Re: MoreLikeThis does not work

2019-12-22 Thread Jörn Franke
It looks like you are trying to run MoreLikeThis on all documents in the collection. I am not sure if this makes sense. Maybe you should use a query that returns fewer results, e.g. one that returns a specific document. > On 22.12.2019 at 13:40, Nehemia Litterat wrote: > > Hi, > Any help

Re: number of files indexed (re-formatted)

2019-12-18 Thread Jörn Franke
This depends on your ingestion process. Usually, unique ids that are not filenames either do not come from a file, or your ingestion process does not tell the file name. In this case the collection seems to be configured to generate a unique identifier. Maybe you can describe in more detail how y

Re: Atomic solrj update

2019-12-12 Thread Jörn Franke
One needs to see the code or get more insight into your design. Do you reuse the HTTPClient or do you create a new one for every request? How often do you commit? Do you do parallel updates from the client (multiple threads)? > On 13.12.2019 at 06:56, Prem wrote: > > I am trying to partially

Re: user solr created by install not working with default password

2019-12-11 Thread Jörn Franke
Even for in-house use without any outside access you should have authentication and HTTPS. There can be a tiny misconfiguration somewhere else, not controlled by you, and you suddenly face a big open leak. Never do this - not even for development environments (here another important aspect is if there

Re: Is it possible to have different Stop words depending on the value of a field?

2019-12-02 Thread Jörn Franke
You can have different fields by country. I am not sure about your stop words, but if they do not occur in the other languages then you do not have a problem. On the other hand: if you need more than stop words (e.g. lemmatizing, a specialized way of tokenization etc.) then you need a different fie

Re: problem using Http2SolrClient with solr 8.3.0

2019-11-27 Thread Jörn Franke
Which JDK version? In this setting I would recommend JDK 11. > On 27.11.2019 at 22:00, Odysci wrote: > > Hi, > I have a solr cloud setup using solr 8.3 and SolrJ, which works fine using > the HttpSolrClient as well as the CloudSolrClient. I use 2 solr nodes with > 3 Zookeeper nodes. > Recently
