Can we change the structure of the "add" command?

2007-06-13 Thread Vikas
Hi Friends,

I am new to Solr. I want to search XML documents. In the example documents in
the Solr source distribution, all the XML files have the same structure, i.e.:

<add>
  <doc>
    <field name="id">...</field>
    <field name="name">...</field>
    ...
  </doc>
</add>
Can we change this structure of the posted XML document? That is, can we post
something like the following instead, where the element names themselves carry
the field names:

<students>
  <student>
    <id>...</id>
    <name>...</name>
    <address>...</address>
    ...
  </student>
  <student>
    ...
  </student>
</students>
Actually, I have student.xml in the above format. I want to index and search a
student database that is stored in XML files.

With best regards

From
Vikas R. Khengare


Solr resource usage/Clustering

2015-02-25 Thread Vikas Agarwal
Hi,

We have a single Solr instance serving queries to the client throughout the
day, and it is indexed twice a day by scheduled jobs. These jobs, which sync
databases from the data collection machines to the master database, can make
many indexing calls. Usually about 50k-100k records are synced on each
iteration, and we send them to Solr in batches of 1000 documents.

During the sync process, Solr returns 503 (Service Unavailable) quite
frequently, and in fact it is very slow to index the documents. I have checked
CPU and memory usage during the sync process, and it never consumes more than
40-50% of CPU or 10-20% of RAM.

My question is: how can I improve indexing performance so that the sync
process runs faster?
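For reference, the indexing side of the sync job looks roughly like this (a
simplified sketch, not our exact code; the Solr URL is a placeholder, the
field names come from the record map, and I am assuming a recent SolrJ
HttpSolrClient):

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class SyncIndexer {
    private static final int BATCH_SIZE = 1000;

    // Indexes the synced records in batches of 1000, with one explicit
    // commit at the end of the sync instead of a commit per batch.
    public static void indexRecords(List<Map<String, Object>> records) throws Exception {
        try (SolrClient solr =
                new HttpSolrClient.Builder("http://localhost:8983/solr/collection1").build()) {
            List<SolrInputDocument> batch = new ArrayList<>(BATCH_SIZE);
            for (Map<String, Object> record : records) {
                SolrInputDocument doc = new SolrInputDocument();
                for (Map.Entry<String, Object> field : record.entrySet()) {
                    doc.addField(field.getKey(), field.getValue());
                }
                batch.add(doc);
                if (batch.size() == BATCH_SIZE) {
                    solr.add(batch);  // one update request per 1000 documents
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) {
                solr.add(batch);
            }
            solr.commit();
        }
    }
}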

-- 
Regards,
Vikas Agarwal
91 – 9928301411

InfoObjects, Inc.
Execution Matters
http://www.infoobjects.com
2041 Mission College Boulevard, #280
Santa Clara, CA 95054
+1 (408) 988-2000 Work
+1 (408) 716-2726 Fax


Problem with live Solr cloud (6.6) backup using collection API

2017-09-25 Thread Vikas Mehra
The cluster has 1 ZooKeeper node and 3 Solr nodes. There is only one
collection, with 3 shards. Data is continuously indexed using the SolrJ API.
The system runs on AWS, and I am taking backups on EFS (Elastic File System).

Observed behavior:
If indexing is not in progress and I take a backup of the cluster using the
collections API, the backup succeeds and restore works as expected.
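For reference, the backup and restore calls look roughly like this (the host,
backup name, and EFS location are placeholders for our actual values):

curl 'http://localhost:8983/solr/admin/collections?action=BACKUP&name=mybackup&collection=t1cloud3&location=/mnt/efs/backups'

curl 'http://localhost:8983/solr/admin/collections?action=RESTORE&name=mybackup&collection=t1cloud3_restored&location=/mnt/efs/backups'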

snapshotscli.sh also works as expected: if I first take a snapshot of the
index while indexing is in progress and then take the backup, there is no
error during restore.

However, I get an error most of the time if I try to restore the collection
from a backup taken with the collections API while indexing was still in
progress. The error is always a missing segment, and I can see that the
segment it is trying to read during restore does not exist in the backup
shard directory.

Also, is there a way to take a snapshot of a SolrCloud collection using the
collections API? The user guide only documents taking a snapshot of a core.

2017-09-08 19:47:22.592 WARN (parallelCoreAdminExecutor-5-thread-8-processing-n:ec2-34-201-149-27.compute-1.amazonaws.com:8983_solr t1cloudbackuponefs-r2187461299681393 RESTORECORE) [   ] o.a.s.h.RestoreCore Could not switch to restored index. Rolling back to the current index
org.apache.lucene.index.CorruptIndexException: Unexpected file read error while reading index. (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/var/solr/data/t1cloud3_shard2_replica0/data/restore.20170908194722131/segments_y")))
  at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:290)
  at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:930)
  at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:118)
  at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:93)
  at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:248)
  at org.apache.solr.update.DefaultSolrCoreState.changeWriter(DefaultSolrCoreState.java:211)
  at org.apache.solr.update.DefaultSolrCoreState.newIndexWriter(DefaultSolrCoreState.java:220)
  at org.apache.solr.update.DirectUpdateHandler2.newIndexWriter(DirectUpdateHandler2.java:726)
  at org.apache.solr.handler.RestoreCore.doRestore(RestoreCore.java:108)
  at org.apache.solr.handler.admin.RestoreCoreOp.execute(RestoreCoreOp.java:65)
  at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:384)
  at org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:388)
  at org.apache.solr.handler.admin.CoreAdminHandler.lambda$handleRequestBody$0(CoreAdminHandler.java:182)
  at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
  at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.file.NoSuchFileException: /var/solr/data/t1cloud3_shard2_replica0/data/restore.20170908194722131/_4m.si
  at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
  at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
  at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
  at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
  at java.nio.channels.FileChannel.open(FileChannel.java:287)
  at java.nio.channels.FileChannel.open(FileChannel.java:335)
  at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:238)
  at org.apache.lucene.store.NRTCachingDirectory.openInput(NRTCachingDirectory.java:192)
  at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:137)
  at org.apache.lucene.codecs.lucene62.Lucene62SegmentInfoFormat.read(Lucene62SegmentInfoFormat.java:89)
  at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:357)
  at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:288)
  ... 17 more


single core for extracted text from pdf/other doc types and metadata fields about that doc from the database

2013-10-23 Thread Sharma, Vikas

Can I create a core where one subset of fields comes from the database source,
using the DataImportHandler for the database, and another subset of fields
using the Apache Tika data import handler?

For example, in the indexed doc I want the following fields to come from the
database source:

1  Id
2  DocFilePath (nullable)
3  Subject
4  KeyWords
5  Description
6  Text

and another set of field(s) to come from documents on the filesystem, with the
text extracted using the Apache Tika processor:

7  DocText


so that the final doc fields are as follows, where DocText is the text of the
document whose path is given in the DocFilePath column:

1  Id
2  DocFilePath (nullable)
3  Subject
4  KeyWords
5  Description
6  Text
7  DocText
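
Something like the following data-config.xml is what I have in mind (a sketch
only; the JDBC settings and the query are placeholders, and I am assuming the
TikaEntityProcessor reading files through a BinFileDataSource):

<dataConfig>
  <dataSource name="db" type="JdbcDataSource" driver="..." url="..." user="..." password="..."/>
  <dataSource name="bin" type="BinFileDataSource"/>
  <document>
    <entity name="doc" dataSource="db"
            query="SELECT Id, DocFilePath, Subject, KeyWords, Description, Text FROM ...">
      <!-- Nested entity extracts DocText from the file at DocFilePath.
           onError="skip" because DocFilePath is nullable. -->
      <entity name="docFile" processor="TikaEntityProcessor" dataSource="bin"
              url="${doc.DocFilePath}" format="text" onError="skip">
        <field column="text" name="DocText"/>
      </entity>
    </entity>
  </document>
</dataConfig>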


Thanks,
Vikas

Vikas Sharma | Senior Software Engineer | MedAssets
14405 SE 36th Street, Suite 206 | Bellevue, WA, 98006 | Work: 425.519.1305
vsha...@medassets.com
Visit us at www.medassets.com
Follow us on LinkedIn (http://www.linkedin.com/company/medassets),
YouTube (https://www.youtube.com/user/MedAssetsInc),
Twitter (https://twitter.com/MedAssets), and
Facebook (https://www.facebook.com/MedAssets)



Re: Stopwords in shingles suggester

2014-10-27 Thread Vikas Agarwal
Is this what you are looking for?
https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.StopFilterFactory

Basically, you can use analyzers for this purpose. You can even write your own
analyzer.
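
For example, a field type that removes stopwords before building shingles
might look like this (the type name, stopword file, and shingle sizes are just
illustrative; fillerToken="" keeps the stop filter from leaving "_" filler
tokens in the shingles):

<fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- drop stopwords before the shingle filter sees them -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="3"
            outputUnigrams="true" fillerToken=""/>
  </analyzer>
</fieldType>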

On Mon, Oct 27, 2014 at 6:26 PM, O. Klein  wrote:

> Is there a way in Solr to filter out stopwords in shingles like ES does?
>
> http://www.elasticsearch.org/blog/searching-with-shingles/
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Stopwords-in-shingles-suggester-tp4166057.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Regards,
Vikas Agarwal
91 – 9928301411

InfoObjects, Inc.
Execution Matters
http://www.infoobjects.com
2041 Mission College Boulevard, #280
Santa Clara, CA 95054
+1 (408) 988-2000 Work
+1 (408) 716-2726 Fax


Re: High system cpu usage while starting solr

2014-11-07 Thread Vikas Agarwal
One quick improvement is to add -Xms6144m alongside -Xmx6144m. This makes the
JVM acquire the whole heap up front, so it does not waste time requesting more
memory from the kernel later.
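
For example, the full invocation would look something like this (a sketch; the
"-jar start.jar" part is just how the example Jetty setup is launched, so
adjust it to however you start Solr):

java -Xms6144m -Xmx6144m -XX:+UseConcMarkSweepGC -XX:+PrintGC -XX:+PrintGCDetails -jar start.jar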

On restart, I am not sure, but I guess Solr does some syncing of indexes, so
it might be slow to respond during that period.

On Fri, Nov 7, 2014 at 2:58 PM, mizayah  wrote:

> Hello,
>
> I'm running a few Solr cores on one pretty good server. After some time I
> discovered that restarting Solr makes queries last longer.
>
> What I see is that after a restart, JVM memory usage is really low and
> rises slowly, while system CPU usage is high. My select queries are really
> slow during that time. After a few days, when the JVM has grabbed some more
> memory, system CPU usage drops down.
>
>
> java settings
> -Xmx6144m -XX:+UseConcMarkSweepGC -XX:+PrintGC -XX:+PrintGCDetails
>
> I have 8GB ram
>
> HELP!
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/High-system-cpu-usage-while-starting-solr-tp4168124.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Regards,
Vikas Agarwal
91 – 9928301411

InfoObjects, Inc.
Execution Matters
http://www.infoobjects.com
2041 Mission College Boulevard, #280
Santa Clara, CA 95054
+1 (408) 988-2000 Work
+1 (408) 716-2726 Fax


Weird issues when using synonyms and stopwords together

2020-03-20 Thread Vikas Kumar
I have a field "title" in my Solr schema:

<field name="title" type="text_en" indexed="true" stored="true"/>

text_en is defined as follows (essentially the stock text_en type):

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
I'm encountering strange behaviour when using multi-word synonyms which
contain stopwords.

If the stopwords appear in the middle, it works fine. For example, if I have
the following in my synonyms file (where "i" is a stopword):

iphone, apple i phone

And if I query: /select?q=iphone&qf=title&defType=edismax

The parsed query is: +DisjunctionMaxQuery((((+title:appl +title:phone)
title:iphon)))

Same for query: /select?q=apple i phone&qf=title&defType=edismax

But if the stopwords appear at the start or at the end, the behaviour is
unpredictable.

In most cases, the entire synonym is dropped. For example, if I change my
synonyms file to:

iphone, i phone

and do the same query again (with iphone), I get:

+DisjunctionMaxQuery(((title:iphon)))

I was expecting iphon and phone (as "i" would be dropped) in my dismax query.

In some cases, the behaviour is even weirder.

For example, if my synonyms file is:

between two ferns,netflix comedy,zach galifianakis show,netflix 2019 best

and I have "ferns" and "best" as my stopwords. If I do the following query:

/select?q=netflix comedy&qf=title&defType=edismax

I get this:

+DisjunctionMaxQuery((((+title:between +title:two +title:galifianaki
+title:show) (+title:netflix +title:2019 +title:comedi))))

which is a very weird combination.

I'm not able to understand this behaviour and have not found anything related
to it in the documentation or on the internet. Maybe I'm missing something.
Any help/pointers are highly appreciated.

Solr version: 8.4.1


Minimum Tomcat version that supports latest Solr version

2019-10-15 Thread vikas shinde
Dear Solr team,

Which is the latest Tomcat version that supports the latest Solr version,
8.2.0?

Could you also provide details about previous Solr versions and their
compatible Tomcat versions?


Thanks & Regards.
Vikas Shinde.