Can we change the structure of the "add" command?
Hi friends, I am new to Solr. I want to do searching over XML documents. In the examples in the Solr source distribution, all of the XML documents to be posted have the same structure, i.e. ... Can we change this structure of the posted XML document? I.e., can we do it like --> I actually have student.xml in the above format, and I want to do indexing and searching on a student database which is stored in XML files. With best regards, Vikas R. Khengare
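(For context: the stock XML update handler expects Solr's own "add" envelope, with each value wrapped in a <field> element; field names below are illustrative, not from student.xml:

    <add>
      <doc>
        <field name="id">1</field>
        <field name="name">Vikas</field>
        <field name="course">Solr 101</field>
      </doc>
    </add>

Arbitrary XML such as student.xml is not accepted as-is; it first has to be mapped to this shape, for example with an XSLT stylesheet passed via the update handler's tr parameter, or by parsing it with the DataImportHandler's XPathEntityProcessor.)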
Solr resource usage/Clustering
Hi, we have a single Solr instance serving queries to clients throughout the day, and it is indexed twice a day by scheduled jobs. These jobs sync databases from the data collection machines to the master database, and during a sync they can make many indexing calls: usually about 50k-100k records per iteration, sent to Solr in batches of 1,000 documents. During the sync process, Solr quite frequently throws 503 (Service Unavailable) responses, and in fact it becomes very slow to index documents. I have checked CPU and memory usage during the sync, and it never consumed more than 40-50% of CPU or 10-20% of RAM. My question is how to improve indexing performance so as to speed up the sync process. -- Regards, Vikas Agarwal, InfoObjects, Inc., http://www.infoobjects.com
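(For context: commit behaviour during bulk loads like this is governed by the autocommit settings in solrconfig.xml; if each batch also issues an explicit commit, that alone can cause slow responses. A minimal sketch, with purely illustrative intervals:

    <autoCommit>
      <maxTime>60000</maxTime>           <!-- hard commit every 60s for durability -->
      <openSearcher>false</openSearcher> <!-- do not open a new searcher on hard commit -->
    </autoCommit>
    <autoSoftCommit>
      <maxTime>300000</maxTime>          <!-- new searcher at most every 5 minutes -->
    </autoSoftCommit>

With settings along these lines, the indexing client can post batches without committing at all and let Solr handle visibility and durability on its own schedule.)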
Problem with live Solr cloud (6.6) backup using collection API
The cluster has 1 ZooKeeper node and 3 Solr nodes. There is only one collection, with 3 shards. Data is continuously indexed using the SolrJ API. The system runs on AWS and I am taking backups on EFS (Elastic File System).

Observed behavior: if indexing is not in progress and I take a backup of the cluster using the collection API, the backup succeeds and restore works as expected. snapshotscli.sh also works as expected if I first take a snapshot of the index while indexing is in progress and then take the backup; there is no error during restore. However, I get an error most of the time when I try to restore the collection from a backup taken with the collection API while indexing was still in progress. The error is always a missing segment, and I can see that the segment it is trying to read during restore does not exist in the backup shard directory.

Also, is there a way to take a snapshot of a Solr cloud collection using the collection API? The user guide only documents taking a snapshot of a core.

2017-09-08 19:47:22.592 WARN (parallelCoreAdminExecutor-5-thread-8-processing-n:ec2-34-201-149-27.compute-1.amazonaws.com:8983_solr t1cloudbackuponefs-r2187461299681393 RESTORECORE) [ ] o.a.s.h.RestoreCore Could not switch to restored index. Rolling back to the current index
org.apache.lucene.index.CorruptIndexException: Unexpected file read error while reading index. (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/var/solr/data/t1cloud3_shard2_replica0/data/restore.20170908194722131/segments_y")))
    at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:290)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:930)
    at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:118)
    at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:93)
    at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:248)
    at org.apache.solr.update.DefaultSolrCoreState.changeWriter(DefaultSolrCoreState.java:211)
    at org.apache.solr.update.DefaultSolrCoreState.newIndexWriter(DefaultSolrCoreState.java:220)
    at org.apache.solr.update.DirectUpdateHandler2.newIndexWriter(DirectUpdateHandler2.java:726)
    at org.apache.solr.handler.RestoreCore.doRestore(RestoreCore.java:108)
    at org.apache.solr.handler.admin.RestoreCoreOp.execute(RestoreCoreOp.java:65)
    at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:384)
    at org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:388)
    at org.apache.solr.handler.admin.CoreAdminHandler.lambda$handleRequestBody$0(CoreAdminHandler.java:182)
    at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.file.NoSuchFileException: /var/solr/data/t1cloud3_shard2_replica0/data/restore.20170908194722131/_4m.si
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
    at java.nio.channels.FileChannel.open(FileChannel.java:287)
    at java.nio.channels.FileChannel.open(FileChannel.java:335)
    at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:238)
    at org.apache.lucene.store.NRTCachingDirectory.openInput(NRTCachingDirectory.java:192)
    at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:137)
    at org.apache.lucene.codecs.lucene62.Lucene62SegmentInfoFormat.read(Lucene62SegmentInfoFormat.java:89)
    at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:357)
    at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:288)
    ... 17 more
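(For reference, the collection API calls in question are the BACKUP and RESTORE actions; backup name, collection name, and the EFS mount path below are placeholders:

    http://<host>:8983/solr/admin/collections?action=BACKUP&name=mybackup&collection=t1cloud3&location=/mnt/efs/backups
    http://<host>:8983/solr/admin/collections?action=RESTORE&name=mybackup&collection=t1cloud3_restored&location=/mnt/efs/backups

The location must be a path visible to every Solr node, which is why a shared filesystem such as EFS is used here.)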
Single core for extracted text from PDF/other doc types and metadata fields about that doc from the database
Can I create a core where one subset of fields comes from a database source, using the DataImportHandler's database entity, and another subset of fields comes from documents on the filesystem, with text extracted using the Apache Tika processor?

For example, suppose I want the following fields to come from the database source:

1. Id
2. DocFilePath (nullable)
3. Subject
4. KeyWords
5. Description
6. Text

and one more field to come from documents on the filesystem, with text extracted using the Apache Tika processor:

7. DocText

so that the final indexed document has the following fields, where DocText is the text of the document whose path is stored in the DocFilePath column:

1. Id
2. DocFilePath (nullable)
3. Subject
4. KeyWords
5. Description
6. Text
7. DocText

Thanks, Vikas

Vikas Sharma | Senior Software Engineer | MedAssets
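(For what it's worth, DIH does support nesting a TikaEntityProcessor entity under a SQL entity, so something along these lines should be possible; the JDBC driver, connection details, and the docs table/query are placeholder assumptions:

    <dataConfig>
      <!-- JDBC source for the metadata fields; connection details are placeholders -->
      <dataSource name="db" driver="org.postgresql.Driver"
                  url="jdbc:postgresql://localhost/docs" user="solr" password="..."/>
      <!-- binary source that Tika reads the files from -->
      <dataSource name="bin" type="BinFileDataSource"/>
      <document>
        <entity name="doc" dataSource="db"
                query="SELECT Id, DocFilePath, Subject, KeyWords, Description, Text FROM docs">
          <!-- nested entity: runs once per row, extracting text from the file at DocFilePath -->
          <entity name="tika" processor="TikaEntityProcessor" dataSource="bin"
                  url="${doc.DocFilePath}" format="text" onError="skip">
            <field column="text" name="DocText"/>
          </entity>
        </entity>
      </document>
    </dataConfig>

Since DocFilePath is nullable, onError="skip" keeps rows without a file from aborting the import; those documents simply get no DocText.)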
Re: Stopwords in shingles suggester
Is this what you are looking for? https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.StopFilterFactory

Basically, you can use analyzers for this purpose. You can even write your own analyzer.

On Mon, Oct 27, 2014 at 6:26 PM, O. Klein wrote:
> Is there a way in Solr to filter out stopwords in shingles like ES does?
> http://www.elasticsearch.org/blog/searching-with-shingles/

-- Regards, Vikas Agarwal
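(As a rough sketch of that idea, a shingle field type that drops stopwords before shingling could look like the following; the tokenizer choice and shingle sizes are illustrative assumptions:

    <fieldType name="text_shingles" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- remove stopwords before shingles are built -->
        <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
        <!-- fillerToken="" hides the position gaps the stop filter leaves behind -->
        <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="3"
                outputUnigrams="true" fillerToken=""/>
      </analyzer>
    </fieldType>

Setting fillerToken="" is the Solr counterpart of the filler_token trick the linked ES article uses, so shingles do not contain "_" placeholders where stopwords were removed.)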
Re: High system cpu usage while starting solr
One quick improvement is to add -Xms6144m alongside -Xmx6144m. This makes the JVM acquire all of its heap memory up front, so it does not waste time later growing the heap by requesting more memory from the kernel. As for the restart itself: I am not sure, but I believe Solr does some syncing of indexes on startup, so it might be slow to respond for that duration.

On Fri, Nov 7, 2014 at 2:58 PM, mizayah wrote:
> Hello,
> I'm running a few Solr cores on one pretty good server. After some time I discovered that restarting Solr makes queries last longer.
> What I see is that after a restart, JVM memory usage is really low and rises slowly, while system CPU usage is high. My select queries are really slow during that time. After a few days, when the JVM has grabbed some MORE memory, system CPU usage drops back down.
> Java settings: -Xmx6144m -XX:+UseConcMarkSweepGC -XX:+PrintGC -XX:+PrintGCDetails
> I have 8GB RAM. HELP!

-- Regards, Vikas Agarwal
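(Concretely, the full option set would then look something like this; the heap size and GC flags are taken from the original mail, while the start command itself is an assumption about the setup:

    java -Xms6144m -Xmx6144m -XX:+UseConcMarkSweepGC -XX:+PrintGC -XX:+PrintGCDetails -jar start.jar

Matching -Xms to -Xmx pre-allocates the whole 6GB heap at startup instead of growing it under load.)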
Weird issues when using synonyms and stopwords together
I have a field "title" in my Solr schema whose type is text_en. I'm encountering strange behaviour when using multi-word synonyms which contain stopwords. If the stopwords appear in the middle, it works fine. For example, if I have the following in my synonyms file (where "i" is a stopword):

    iphone, apple i phone

and I query:

    /select?q=iphone&qf=title&defType=edismax

the parsed query is:

    +DisjunctionMaxQuery(((+title:appl +title:phone) title:iphon))

Same for the query:

    /select?q=apple i phone&qf=title&defType=edismax

But if stopwords appear at the start or end, the behaviour is unpredictable. In most cases, the entire synonym is dropped. For example, if I change my synonyms file to:

    iphone, i phone

and run the same query again (with iphone), I get:

    +DisjunctionMaxQuery(((title:iphon)))

I was expecting iphon and phone (as "i" would be dropped) in my dismax query.

In some cases the behaviour is even weirder. For example, if my synonyms file is:

    between two ferns,netflix comedy,zach galifianakis show,netflix 2019 best

and I have "ferns" and "best" as stopwords, then the query:

    /select?q=netflix comedy&qf=title&defType=edismax

gives me:

    +DisjunctionMaxQuery(((+title:between +title:two +title:galifianaki +title:show) (+title:netflix +title:2019 +title:comedi)))

which is a very weird combination. I'm not able to understand this behaviour and have not found anything related to it in the documentation or on the internet. Maybe I'm missing something; any help/pointers are highly appreciated. Solr version: 8.4.1
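(The actual text_en definition did not survive in this message; a minimal query-time analyzer consistent with the parsed output above -- synonym expansion, then stop filtering, then stemming -- might look like the following sketch, where every attribute value is an assumption rather than the poster's real config:

    <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- expand=true replaces each term with all of its synonyms -->
        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
                ignoreCase="true" expand="true"/>
        <!-- stopwords.txt is assumed to contain: i, ferns, best -->
        <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
    </fieldType>

With this ordering, the stop filter punches holes into the token graph the synonym filter produced, which is the kind of interaction the question is about.)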
Minimum Tomcat version that supports latest Solr version
Dear Solr team, which is the latest Tomcat version that supports the latest Solr version, 8.2.0? Please also provide details about previous Solr versions and their compatible Tomcat versions. Thanks & Regards, Vikas Shinde.