Indexing multiple cores simultaneously

2015-10-25 Thread Peri Subrahmanya
Hi,

I wanted to check if the following would work:

1. Spawn n threads
2. Create n cores
3. Index records into the n cores simultaneously
4. Merge all core indexes into a single master core

I have been able to do this successfully for 5 threads (5 cores) with 1,000 
documents each. However, are there any performance parameters that can be 
tweaked to make Solr handle 100 cores with 10K records each? It just seems to 
churn, and I am not sure it will ever finish.

Any ideas on how things might be done differently?

Thanks
-Peri
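
For step 4, Solr's CoreAdmin API has a MERGEINDEXES action that merges one or more source cores into a target core. As a sketch only (the host, port, and core names below are placeholders, and the exact parameters should be checked against the CoreAdmin docs for your Solr version), the request URL can be assembled like this:

```java
import java.util.List;
import java.util.stream.Collectors;

public class MergeUrl {
    /** Builds a CoreAdmin MERGEINDEXES request URL for merging srcCores into target. */
    static String mergeIndexesUrl(String solrBase, String target, List<String> srcCores) {
        String srcParams = srcCores.stream()
                .map(c -> "&srcCore=" + c)
                .collect(Collectors.joining());
        return solrBase + "/admin/cores?action=mergeindexes&core=" + target + srcParams;
    }

    public static void main(String[] args) {
        // Merge 5 worker cores into a master core (core names are hypothetical).
        List<String> workers = List.of("core1", "core2", "core3", "core4", "core5");
        System.out.println(mergeIndexesUrl("http://localhost:8983/solr", "master", workers));
    }
}
```

Note that the source cores should not be receiving updates while the merge runs, and the master core needs a commit afterwards before the merged documents become searchable.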

*** DISCLAIMER *** This is a PRIVATE message. If you are not the intended 
recipient, please delete without copying and kindly advise us by e-mail of the 
mistake in delivery.
NOTE: Regardless of content, this e-mail shall not operate to bind HTC Global 
Services to any order or other contract unless pursuant to explicit written 
agreement or government initiative expressly permitting the use of e-mail for 
such purpose.




Sorting on multi-valued field

2015-02-24 Thread Peri Subrahmanya
All,

Is there a way sorting can work on a multi-valued field, or does multiValued 
always have to be "false" for sorting to work?

Thanks
-Peri
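
For the archive: Lucene needs exactly one sort value per document, so a common workaround is to derive a single-valued sort field (say, the minimum of the values) at index time, either in client code or in an update processor. A minimal client-side sketch, with made-up field names; the doc is modeled as a plain map rather than a SolrInputDocument to keep it self-contained:

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SortFieldDerivation {
    /**
     * Derives a single value (the minimum) from a multi-valued field so that a
     * separate single-valued field, e.g. "price_min", can be added for sorting.
     */
    static Integer deriveMin(Map<String, Collection<Integer>> doc, String multiField) {
        Collection<Integer> values = doc.get(multiField);
        if (values == null || values.isEmpty()) {
            return null; // sortMissingFirst/sortMissingLast can handle absent values
        }
        return Collections.min(values);
    }

    public static void main(String[] args) {
        Map<String, Collection<Integer>> doc = new HashMap<>();
        doc.put("prices", List.of(30, 10, 20));
        // Add this derived value as a single-valued field before indexing the doc.
        System.out.println(deriveMin(doc, "prices")); // prints 10
    }
}
```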





DataImport-Solr6: Nested Entities

2016-08-18 Thread Peri Subrahmanya
Hi,

I have a simple one-to-many relationship setup in the data-import.xml and when 
I try to index it using the dataImportHandler, Solr complains of “no unique id 
found”. 

managed-schema.xml:
  id  (the uniqueKey declaration; the surrounding XML was stripped from the archive)

solrconfig.xml:
  id  (request handler configuration; the surrounding XML was stripped from the archive)

data-import.xml:
  (the entity definitions were stripped from the archive)

Could someone please advise?

Thanks
-Peri

SolrInputDocument

2016-09-28 Thread Peri Subrahmanya
Hi All,

I have a simple case of indexing a SolrInputDocument with a few 
ChildSolrInputDocuments. For some reason, the child documents aren't getting 
indexed. Is there any setting in the schema or config.xml that needs to be 
updated?

Thanks
-Peri Subrahmanya

Re: SolrInputDocument

2016-09-28 Thread Peri Subrahmanya
Sorry Erick,

There wasn’t much to provide but here is the code snippet:

SolrInputDocument sid = new SolrInputDocument();
sid.addField("someField", "someValue");

SolrInputDocument childSid = new SolrInputDocument();
childSid.addField("someField", "someValue");

sid.addChildDocument(childSid);

sendToSolr: this is just a Solr HTTP call with the input document.

Result: I see the parent doc get indexed, but there is no child document when I 
look it up in the Solr Admin interface.

Does this help? 

Thanks
-Peri


> On Sep 28, 2016, at 1:39 PM, Erick Erickson  wrote:
> 
> There is close to zero information here to help diagnose your issue.
> You might review:
> 
> http://wiki.apache.org/solr/UsingMailingLists
> 
> Best,
> Erick
> 
> On Wed, Sep 28, 2016 at 10:32 AM, Peri Subrahmanya
>  wrote:
>> Hi All,
>> 
>> I have a simple case of indexing a SolrInputDocument with few 
>> ChildSolrInputDocumnets. For some reason, the child documents aren’t getting 
>> indexed. Is there any setting in the schema or config.xml that needs to be 
>> updated?
>> 
>> Thanks
>> -Peri Subrahmanya



Re: SolrInputDocument

2016-09-28 Thread Peri Subrahmanya
Thanks Alex, 

I have "id" as the unique key, so when the parent and child are getting 
indexed, shouldn't the child get an id automatically, just like the parent? Is 
there a configuration setting that would allow for that?

Currently, when I add the child document, I also have to set the "id" on the 
child; otherwise Solr complains "missing required field: id". When I do set it, 
indexing goes through, but I am unable to find the child by its indexed field 
as per your suggestion.


Thanks
-Peri


> On Sep 28, 2016, at 1:53 PM, Alexandre Rafalovitch  wrote:
> 
> The default document list returns in the flat form. Try looking up a child
> document by ID.
> 
> If that works, look for the child document transformer.
> 
> Regards,
>   Alex
> 
> On 29 Sep 2016 12:46 AM, "Peri Subrahmanya" 
> wrote:
> 
> Sorry Erick,
> 
> There wasn’t much to provide but here is the code snippet:
> 
> SolrInputDocument sid = new SolrInputDocument();
> sid.addField(“someField”, “someValue”);
> 
> SolrInputDocument childSid = new SolrInputDocument();
> childSid.add(“someField”, “someValue”);
> 
> sid.addChildDocument(childSid);
> 
> sendToSolr - this is just a solr http call with the input document.
> 
> Result: I see the parent doc get indexed. But there is no child document
> when i look it up in the SolrAdmin interface.
> 
> Does this help?
> 
> Thanks
> -Peri
> 
> 
>> On Sep 28, 2016, at 1:39 PM, Erick Erickson 
> wrote:
>> 
>> There is close to zero information here to help diagnose your issue.
>> You might review:
>> 
>> http://wiki.apache.org/solr/UsingMailingLists
>> 
>> Best,
>> Erick
>> 
>> On Wed, Sep 28, 2016 at 10:32 AM, Peri Subrahmanya
>>  wrote:
>>> Hi All,
>>> 
>>> I have a simple case of indexing a SolrInputDocument with few
> ChildSolrInputDocumnets. For some reason, the child documents aren’t
> getting indexed. Is there any setting in the schema or config.xml that
> needs to be updated?
>>> 
>>> Thanks
>>> -Peri Subrahmanya
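
Following up for the archive on Alexandre's pointer above: the flat result list is expected, and the [child] doc transformer re-attaches children to each matching parent in the response. A sketch of assembling such a query string; the doc_type parent filter is a hypothetical field, and the transformer's availability and syntax (it appeared around Solr 4.9) should be checked against the docs for your version:

```java
public class ChildDocQuery {
    /**
     * Builds the query-string portion of a /select request that returns each
     * matching parent with its child documents nested back under it.
     */
    static String withChildren(String query, String parentFilter) {
        return "q=" + query + "&fl=*,[child parentFilter=" + parentFilter + "]";
    }

    public static void main(String[] args) {
        // e.g. /select?q=id:parent-1&fl=*,[child parentFilter=doc_type:parent]
        System.out.println(withChildren("id:parent-1", "doc_type:parent"));
    }
}
```

In a real request the fl value needs URL-encoding; SolrJ's SolrQuery.setFields handles that for you.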



Parallel Indexing

2014-12-22 Thread Peri Subrahmanya
Hi,

We have millions of records in our db that we completely re-index every 
fortnight or so. It takes around 11 hours, and I was wondering if there is a 
way to fetch the records in parallel batches and issue the Solr HTTP command 
with the Solr docs in parallel. Please let me know.

Thanks
-Peri.S
http://www.kuali.org/ole 
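
In case it helps anyone searching the archive, the batching-plus-parallelism idea can be sketched as below. fetchBatch and sendToSolr are stand-ins for the real DB query and Solr HTTP call (stubbed here with generated IDs and a counter so the skeleton is self-contained), and the batch size and thread count are arbitrary:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelIndexer {
    static final AtomicInteger indexed = new AtomicInteger();

    /** Stand-in for "SELECT ... LIMIT batchSize OFFSET offset" against the DB. */
    static List<Integer> fetchBatch(int offset, int batchSize, int total) {
        List<Integer> batch = new ArrayList<>();
        for (int id = offset; id < Math.min(offset + batchSize, total); id++) batch.add(id);
        return batch;
    }

    /** Stand-in for one HTTP POST of a whole batch of docs to /update. */
    static void sendToSolr(List<Integer> batch) {
        indexed.addAndGet(batch.size());
    }

    static int reindex(int totalRecords, int batchSize, int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int offset = 0; offset < totalRecords; offset += batchSize) {
            final int off = offset;
            pool.submit(() -> sendToSolr(fetchBatch(off, batchSize, totalRecords)));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return indexed.get(); // commit once at the end, not per batch
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(reindex(10_000, 1_000, 4)); // prints 10000
    }
}
```

As Erick notes in the replies, it is worth timing fetchBatch and sendToSolr separately; very often the DB fetch, not Solr, is the bottleneck.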





Re: Parallel Indexing

2014-12-22 Thread Peri Subrahmanya
Thanks, guys, for the quick responses. I need to take the suggestions, 
incorporate them, figure out exactly how we are doing the fetching, and reply 
back on this post. The suggestions have been very helpful in taking this 
forward for us.

Thanks
-Peri.S

> On Dec 22, 2014, at 10:32 AM, Erick Erickson  wrote:
> 
> Just to pile on
> 
> _very_ frequently in my experience the problem
> is not Solr at all, but acquiring the data in the
> first place, i.e. often executing the DB query.
> 
> A very simple test is (in the SolrJ world) just comment
> out the server.add(doclist).
> 
> Assuming you're using SolrJ, you _are_ indexing in
> batches, right? And you are _not_ committing from
> the  program, right? And As Hossman often says,
> details matter.
> 
> Also, take a look at your Solr server CPU utilization. You
> can get a crude idea of how much work it's doing,
> unless you have it running at 100% your bottleneck is
> on the acquisition side.
> 
> For a benchmark (admittedly not directly comparable),
> I can index 11M Wikipedia docs on my laptop in < 1
> hour without tuning anything. They're in XML format
> so data acquisition is very fast...
> 
> Best,
> Erick
> 
> On Mon, Dec 22, 2014 at 7:21 AM, Mikhail Khludnev
> mailto:mkhlud...@griddynamics.com>> wrote:
>> What your indexer is build on? Do you use SolrJ, just REST, or
>> DataImportHandler? What's you DB schema is briefly?
>> Frankly speaking, there are few approaches to handle indexing concurrently,
>> details depends on the details mentioned above.
>> 
>> On Mon, Dec 22, 2014 at 5:54 PM, Peri Subrahmanya <
>> peri.subrahma...@htcinc.com> wrote:
>>> 
>>> Hi,
>>> 
>>> We have millions of records in our db that we do a complete re-index of
>>> every fortnight or so. It takes around 11 hours or so and I was wondering
>>> if there was a way to fetch the records in batches parallel and issue the
>>> solr http command with the solr docs in parallel. Please let me know.
>>> 
>>> Thanks
>>> -Peri.S
>>> http://www.kuali.org/ole <http://www.kuali.org/ole>
>>> 
>>> 
>>> 
>>> *** DISCLAIMER *** This is a PRIVATE message. If you are not the intended
>>> recipient, please delete without copying and kindly advise us by e-mail of
>>> the mistake in delivery.
>>> NOTE: Regardless of content, this e-mail shall not operate to bind HTC
>>> Global Services to any order or other contract unless pursuant to explicit
>>> written agreement or government initiative expressly permitting the use of
>>> e-mail for such purpose.
>>> 
>> 
>> 
>> --
>> Sincerely yours
>> Mikhail Khludnev
>> Principal Engineer,
>> Grid Dynamics
>> 
>> <http://www.griddynamics.com <http://www.griddynamics.com/>>
>> mailto:mkhlud...@griddynamics.com>>
> 





SOLR Install

2013-04-24 Thread Peri Subrahmanya
I'm trying to use Solr as part of another Maven-based web application. I'm not 
sure how to wire the two war files. Any help, please? I found this 
documentation in Solr but am unsure how to go about it.




Thank you,
Peri Subrahmanya




On 4/24/13 12:52 PM, "Michael Della Bitta"
 wrote:

>"solrservice.php" and the text of that error both sound like parts of
>Typo3... they're definitely not part of Solr. You should ask on a list
>devoted to Typo3 to figure out what to do in this situation. It likely
>won't involve reconfiguring Solr.
>
>Michael Della Bitta
>
>
>Appinions
>18 East 41st Street, 2nd Floor
>New York, NY 10017-6271
>
>www.appinions.com
>
>Where Influence Isn't a Game
>
>
>On Wed, Apr 24, 2013 at 11:53 AM, vishal gupta 
>wrote:
>> Hi i am using Solr 4.2.0 and extension 2.8.2  with Typo3. Whever I try
>>to do
>> indexing pages and news pages It gets only 3.29% indexed. I checked a
>> developer log and found error in solrservice.php. And in solr admin it
>>is
>> giving "Dups is not defined please add it". What should i do in this
>>case?
>> If possible please send me the settings of schema.xml and
>>solrconfig.xml .i
>> am new to typo3 and solr both.
>>
>>
>>
>> --
>> View this message in context:
>>http://lucene.472066.n3.nabble.com/Solr-indeing-Partially-working-tp4058623.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
>
>








DataImportHandler - Indexing xml content

2013-04-26 Thread Peri Subrahmanya
I have a column in my database that is of type long text and holds XML
content. I was wondering, when I define the entity record, is there a way to
provide a custom extractor that takes in the XML and returns rows with the
appropriate fields to be indexed?

Thank you,
Peri Subrahmanya
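
For the archive: DIH can do this without a custom extractor. A nested entity using XPathEntityProcessor can read the XML straight out of the parent row's column via a FieldReaderDataSource. A hedged sketch, where the table, column, field, and XPath names are placeholders and the attribute spellings should be checked against the DIH wiki for your Solr version:

```xml
<dataConfig>
  <dataSource name="db" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://host/db"/>
  <!-- Reads XML from a column of the parent row instead of from a URL or file -->
  <dataSource name="xmlcol" type="FieldReaderDataSource"/>
  <document>
    <entity name="rec" dataSource="db" query="SELECT id, payload_xml FROM records">
      <entity name="payload" dataSource="xmlcol" dataField="rec.payload_xml"
              processor="XPathEntityProcessor" forEach="/record/item">
        <field column="title" xpath="/record/item/title"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```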





On 4/26/13 12:24 PM, "Shawn Heisey"  wrote:

>On 4/25/2013 9:00 AM, xiaoqi wrote:
>> i using DIH to build index is slow , when it fetch 2 million rows , it
>>will
>> spend 20 minutes , very slow.
>
>If it takes 20 minutes for two million records, I'd say it's working
>very well.  I do six simultaneous MySQL imports of 13 million records
>each.  It takes a little over 3 hours on Solr 3.5.0, a little over four
>hours on Solr 4.2.1 (due to compression and the transaction log).  If I
>do them one at a time instead of all at once, it will go *slightly*
>faster for each one, but the overall process would take a whole day.
>For comparison purposes, that's about 20 minutes each time it does 1
>million rows.  Yours is going twice as fast as mine.
>
>Looking at your config file, I don't see a batchSize parameter.  This is
>a change that is specific to MySQL.  You can greatly reduce the memory
>usage by including this attribute in the dataSource tag along with the
>user and password:
>
>batchSize="-1"
>
>With two million records and no batchSize parameter, I'm surprised you
>aren't hitting an Out Of Memory error.  By default JDBC will pull down
>all the results and store them in memory, then DIH will begin indexing.
> A batchSize of -1 makes DIH tell the MySQL JDBC driver to stream the
>results instead of storing them.  Reducing the memory usage in this way
>might make it go faster.
>
>Thanks,
>Shawn
>
>
>
>
>








Indexing DB

2013-04-27 Thread Peri Subrahmanya
I hooked up the DataImportHandler to my db, which has around 15M records.
When I run the full-import, it errors out with a "java heap space"
message. Is there something I need to configure?

Thank you,
Peri Subrahmanya



On 4/27/13 3:53 AM, "Mohsen Saboorian"  wrote:

>I have a Solr 4.2 server setup on a CentOS 6.4 x86 and Java:
>OpenJDK Runtime Environment (rhel-2.3.8.0.el6_4-i386)
>OpenJDK Server VM (build 23.7-b01, mixed mode)
>
>Currently it has a 4.6GB index with ~400k records. When I search for
>certain keywords, Solr fails with the following message.
>Any idea why this happens? Can I repair indices?
>
>1. I didn't specify any SolrDeletionPolicy. It's commented in
>solrconfig.xml as default.
>2.  is now LUCENE_42 (but it was LUCNE_41 before I
>upgrade to solr 4.2)
>
>Thanks,
>Mohsen
>
>SEVERE: null:java.io.IOException: Input/output error: NIOFSIndexInput(path="/app/solr/tomcat/solr/core1/data/index/_2xmx.tvd")
>at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:191)
>at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:272)
>at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:51)
>at org.apache.lucene.util.packed.BlockPackedReaderIterator.skip(BlockPackedReaderIterator.java:127)
>at org.apache.lucene.codecs.compressing.CompressingTermVectorsReader.readPositions(CompressingTermVectorsReader.java:586)
>at org.apache.lucene.codecs.compressing.CompressingTermVectorsReader.get(CompressingTermVectorsReader.java:381)
>at org.apache.lucene.index.SegmentReader.getTermVectors(SegmentReader.java:175)
>at org.apache.lucene.index.BaseCompositeReader.getTermVectors(BaseCompositeReader.java:97)
>at org.apache.lucene.search.highlight.TokenSources.getTokenStreamWithOffsets(TokenSources.java:280)
>at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:453)
>at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:391)
>at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:139)
>at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
>at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)
>at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
>at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
>at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
>at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
>at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
>at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
>at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
>at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
>at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
>at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:947)
>at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
>at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
>at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1009)
>at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
>at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
>at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>at java.lang.Thread.run(Thread.java:722)
>Caused by: java.io.IOException: Input/output error
>at sun.nio.ch.FileDispatcherImpl.pread0(Native Method)
>at sun.nio.ch.FileDispatcherImpl.pread(FileDispatcherImpl.java:51)
>at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:222)
>at sun.nio.ch.IOUtil.read(IOUtil.java:198)
>at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.

Re: Indexing DB

2013-04-27 Thread Peri Subrahmanya
I fixed it by setting the batchSize="-1" in the db-config.xml. Apparently
the default fetch size of 500 rows is not honored by some of the db
drivers. I was using MySQL server.

Thank you,
Peri Subrahmanya
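
For anyone hitting the same heap error, the fix amounts to one attribute on the DIH dataSource. With the MySQL Connector/J driver, batchSize="-1" makes DIH set the fetch size so that results are streamed instead of buffered entirely in memory. A sketch, with the connection details as placeholders:

```xml
<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/mydb"
            user="user"
            password="pass"
            batchSize="-1"/>
```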




On 4/27/13 1:20 PM, "Peri Subrahmanya"  wrote:

>I hooked up the dataimorthandler to my db which has around 15M records.
>When I run the full-imoport, its erroring out with java heap space
>message. Is there something I need to configure?
>
>Thank you,
>Peri Subrahmanya
>








EmbeddedSolrServer

2013-05-01 Thread Peri Subrahmanya
I'm trying to use the EmbeddedSolrServer; here is my sample code:

CoreContainer.Initializer initializer = new CoreContainer.Initializer();
CoreContainer coreContainer = initializer.initialize();
EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, "");

Upon running, I get the following exception: java.lang.NoClassDefFoundError:
org/apache/solr/common/cloud/ZooKeeperException.

I'm not sure why it's complaining about ZooKeeper. Any ideas, please?

Thank you,
Peri Subrahmanya






Re: EmbeddedSolrServer

2013-05-02 Thread Peri Subrahmanya
I actually have a Maven project with a declared solrj dependency (4.2.1). Do I 
need anything extra to get rid of the ZooKeeper exception? I didn't see jars 
specific to ZooKeeper in the list below that I would need. Any more ideas, 
please?

Thank you,
Peri Subrahmanya
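
For the archive: as Shawn notes below, solrj alone is not enough for EmbeddedSolrServer, because the embedded server runs Solr itself inside your application; adding solr-core pulls in the Lucene jars and ZooKeeper transitively. A hedged Maven sketch matching the 4.2.1 version mentioned above:

```xml
<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-solrj</artifactId>
  <version>4.2.1</version>
</dependency>
<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-core</artifactId>
  <version>4.2.1</version>
</dependency>
```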


On May 2, 2013, at 4:48 PM, Shawn Heisey  wrote:

> On 5/2/2013 1:43 PM, Alexandre Rafalovitch wrote:
>> Actually, I found it very hard to figure out the exact Jar
>> requirements for SolrJ. I ended up basically pointing at expanded
>> webapp's lib directory, which is a total overkill.
>> 
>> Would be nice to have some specific guidance on this issue.
> 
> I have a SolrJ app that uses HttpSolrServer.  Here is the list of jars
> in my lib directory relevant to SolrJ.  There are other jars related to
> the other functionality in my app that I didn't list here.  I take a
> very minimalistic approach for what I add to my lib directory.  I work
> out the minimum jars required to get it to compile, then I try the
> program out and determine which additional jars it needs one by one.
> 
> commons-io-2.4.jar
> httpclient-4.2.4.jar
> httpcore-4.2.4.jar
> httpmime-4.2.4.jar
> jcl-over-slf4j-1.7.5.jar
> log4j-1.2.17.jar
> slf4j-api-1.7.5.jar
> slf4j-log4j12-1.7.5.jar
> solr-solrj-4.2.1.jar
> 
> You might notice that my component versions are newer than what is
> included in dist/solrj-lib.  I have tested all of the functionality of
> my application, and I do not require the other jars found in
> dist/solrj-lib, including zookeeper.  When I add functionality in the
> future, if I run into a class not found exception, I will add the
> appropriate jar.
> 
> If I were using CloudSolrServer, zookeeper would be required.  With
> EmbeddedSolrServer, more Lucene and Solr jars are required, because that
> starts the Solr server itself within your application.
> 
> Thanks,
> Shawn
> 
> 
> 
> 
> 


