Search Correlated Data between Multivalued Fields

2011-11-08 Thread David T. Webb
I have a normalized database schema that I have flattened out to create
a Solr schema.  My question is about searching the multivalued
fields that are correlated from the sub-entity in the DataImportHandler.

 

Example

I have 2 tables, CUSTOMER and NOTE. A customer can have one to many notes.

My data-config would look similar to this (not exact, just setting up
the question):

[data-config XML not preserved in the archive]

My schema would be something like this:

[schema XML not preserved in the archive]

All is well, indexed and searchable. 

 

So, if there are 100 notes per customer at varying dates, how would I
query to essentially ask:

 

Give me all the Customers where note_text has "sales" AND the note_date
is between Date1 and Date2?

 

The multivalued data is stored as arrays, and the array positions line
up properly (i.e., note_id[x], note_date[x], and note_text[x] represent
an actual row that was loaded from the database).
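
To make the problem concrete, here is a minimal Python sketch (plain Python, not Solr code) of why ANDing two clauses over parallel multivalued fields can return the wrong customers. The customer IDs, note contents, and date range are all made up for illustration; only the field names note_date and note_text come from the message above.

```python
from datetime import date

# Hypothetical flattened documents: one per customer, with parallel note arrays.
customers = [
    {
        "customer_id": 1,
        "note_date": [date(2011, 1, 5), date(2011, 6, 1)],
        "note_text": ["sales call scheduled", "billing question"],
    },
    {
        "customer_id": 2,
        "note_date": [date(2010, 3, 1), date(2011, 1, 10)],
        "note_text": ["sales follow-up", "support ticket"],
    },
]

start, end = date(2011, 1, 1), date(2011, 1, 31)


def field_level_match(doc):
    """Mimics ANDing two independent multivalued-field clauses:
    any text value may match, and any date value may match."""
    has_text = any("sales" in t for t in doc["note_text"])
    has_date = any(start <= d <= end for d in doc["note_date"])
    return has_text and has_date


def row_level_match(doc):
    """The intended semantics: the same note must satisfy both conditions."""
    return any(
        "sales" in t and start <= d <= end
        for t, d in zip(doc["note_text"], doc["note_date"])
    )


field_hits = [c["customer_id"] for c in customers if field_level_match(c)]
row_hits = [c["customer_id"] for c in customers if row_level_match(c)]
print(field_hits)  # [1, 2]
print(row_hits)    # [1]
```

Customer 2 is a false positive at the field level: one note mentions "sales" and a different note falls in the date range.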

 

Any suggestions on how to accomplish this?

 

Thank you!

 

--

Sincerely,

David Webb

 



Re: Search Correlated Data between Multivalued Fields

2011-11-09 Thread David T. Webb
Can you point me to the docs on how to create the additional flat index of 
note?  Thx for the quick reply. Dave. 

Sent from my iPhone

On Nov 9, 2011, at 6:03 AM, "Andre Bois-Crettez"  wrote:

> I do not think this is possible directly out of the box in Solr.
> 
> A quick workaround would be to fully denormalize the data, i.e., instead of 
> multivalued notes for a customer, have a completely flat index of 
> customer_note.
> Or maybe a custom request handler plugin could actually check that matches 
> are for note_id[x], note_date[x], and note_text[x]? Not sure if this is 
> doable.
> 
> Andre
> 
> 
> -- 
> André Bois-Crettez
> 
> Search technology, Kelkoo
> http://www.kelkoo.com/
> 
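
The denormalization workaround suggested in this reply can be sketched as follows. This is a minimal Python illustration, not DIH configuration: the join rows, the customer_name field, and the composite id format are assumptions made up for the example. The idea is to emit one flat document per note, so that the text and date conditions are always tested against the same note.

```python
from datetime import date

# Hypothetical rows, as a CUSTOMER-to-NOTE join might return them.
rows = [
    (1, "Acme", 10, date(2011, 1, 5), "sales call scheduled"),
    (1, "Acme", 11, date(2011, 6, 1), "billing question"),
    (2, "Globex", 20, date(2010, 3, 1), "sales follow-up"),
    (2, "Globex", 21, date(2011, 1, 10), "support ticket"),
]


def to_flat_docs(rows):
    """Build one flat document per note, repeating the customer fields."""
    return [
        {
            "id": f"{customer_id}_{note_id}",  # unique key per customer/note pair
            "customer_id": customer_id,
            "customer_name": name,
            "note_id": note_id,
            "note_date": note_date,
            "note_text": text,
        }
        for customer_id, name, note_id, note_date, text in rows
    ]


def customers_matching(docs, term, start, end):
    """Both conditions are tested on the same document, i.e. the same note."""
    return sorted(
        {
            d["customer_id"]
            for d in docs
            if term in d["note_text"] and start <= d["note_date"] <= end
        }
    )


docs = to_flat_docs(rows)
matches = customers_matching(docs, "sales", date(2011, 1, 1), date(2011, 1, 31))
print(matches)  # [1]
```

The trade-off is a larger index and duplicated customer fields, and results have to be collapsed back to unique customers, since a customer with several matching notes appears once per note.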


TikaEntityProcessor Exception Handling

2011-11-12 Thread David T. Webb
When indexing over 2 million documents with Solr and the TikaEntityProcessor,
the indexing fails if Tika encounters an exception with one of the
documents.  How can I tell Solr to keep going and just ignore the
documents that fail in the Tika processor?

 

Thanks.

 

--

Sincerely,

David Webb



RE: TikaEntityProcessor Exception Handling

2011-11-12 Thread David T. Webb
I found the answer: onError="skip" on the entity.  However,
after adding that attribute to the data-config.xml, the index processing
still stops when the TikaEntityProcessor throws an exception.

Nov 12, 2011 10:22:16 AM org.apache.solr.common.SolrException log
SEVERE: Full Import failed:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 562
    at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
    at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:130)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:596)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:622)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:622)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@8a799a
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:199)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:137)
    at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:128)
    ... 9 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 29
    at org.apache.poi.hwpf.model.StyleSheet.getCharacterStyle(StyleSheet.java:315)
    at org.apache.poi.hwpf.model.CHPX.getCharacterProperties(CHPX.java:60)
    at org.apache.poi.hwpf.usermodel.CharacterRun.<init>(CharacterRun.java:98)
    at org.apache.poi.hwpf.usermodel.Range.getCharacterRun(Range.java:797)
    at org.apache.poi.hwpf.model.PicturesTable.getAllPictures(PicturesTable.java:191)
    at org.apache.tika.parser.microsoft.WordExtractor$PicturesSource.<init>(WordExtractor.java:429)
    at org.apache.tika.parser.microsoft.WordExtractor$PicturesSource.<init>(WordExtractor.java:419)
    at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:75)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:187)
    at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:91)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
    ... 11 more

Nov 12, 2011 10:22:16 AM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: start rollback
Nov 12, 2011 10:22:16 AM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: end_rollback
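
For reference, the DIH documentation describes three values for an entity's onError attribute: abort (the default), skip, and continue. The Python sketch below only illustrates the intended semantics of those three modes as documented; it is not how DIH is implemented and says nothing about why the setting had no effect here. The names `parse`, `run_import`, and the `corrupt` flag are hypothetical.

```python
def parse(doc):
    """Stand-in for a Tika parse; raises on documents flagged as corrupt."""
    if doc.get("corrupt"):
        raise ValueError(f"Unable to read content of document {doc['id']}")
    return f"text of {doc['id']}"


def run_import(docs, on_error="abort"):
    """Index docs one at a time, applying the chosen error policy."""
    indexed, failed = [], []
    for doc in docs:
        try:
            record = {"id": doc["id"], "text": parse(doc)}
        except ValueError as exc:
            failed.append((doc["id"], str(exc)))
            if on_error == "abort":
                raise  # the whole import fails
            if on_error == "skip":
                continue  # drop this document, move on to the next
            # "continue": keep the document, minus the field that failed
            record = {"id": doc["id"]}
        indexed.append(record)
    return indexed, failed


docs = [{"id": 1}, {"id": 2, "corrupt": True}, {"id": 3}]
skipped, skip_errors = run_import(docs, on_error="skip")
kept, _ = run_import(docs, on_error="continue")
print([d["id"] for d in skipped])  # [1, 3]
print([d["id"] for d in kept])     # [1, 2, 3]
```

Collecting the failures in a list, rather than only logging them, makes it possible to retry or inspect the bad documents after the run.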
--
Sincerely,
David Webb






RE: TikaEntityProcessor Exception Handling

2011-11-12 Thread David T. Webb
Same result with onError="continue".

Any help is appreciated. Thank you.

--
Sincerely,
David Webb






Index Update Strategy

2011-11-18 Thread David T. Webb
What are the general schools of thought on how to update an index?

 

I have a medium volume OLTP SaaS system.  I think my options are:

 

1)  Run the DIH delta-query every minute to pull in changes

2)  Use "Update" events on the app to asynchronously create a bean
that represents my Solr doc, then use SolrJ to add/update the doc.
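
As a rough illustration of option 2, the Python sketch below shows one common shape for event-driven indexing: the application pushes update events onto a queue, and a background worker coalesces them into batches. It is a sketch under stated assumptions, not a recommendation from this thread: `send_batch` is a hypothetical callable standing in for whatever client call adds documents to Solr (SolrJ in a Java application), and `STOP`, `batch_size`, and `idle_timeout` are illustrative names and values.

```python
import queue
import threading

STOP = object()  # sentinel telling the worker to flush and exit


def index_worker(events, send_batch, batch_size=100, idle_timeout=0.5):
    """Drain update events and hand them to send_batch in batches.

    Later events for the same id replace earlier ones within a batch,
    so a record updated several times is sent only once.
    """
    pending = {}
    while True:
        try:
            event = events.get(timeout=idle_timeout)
        except queue.Empty:
            event = None  # queue went idle: flush what has accumulated
        if event is not None and event is not STOP:
            pending[event["id"]] = event
        flush_now = event is None or event is STOP or len(pending) >= batch_size
        if pending and flush_now:
            send_batch(list(pending.values()))
            pending = {}
        if event is STOP:
            return


events = queue.Queue()
events.put({"id": 1, "name": "Acme"})
events.put({"id": 2, "name": "Globex"})
events.put({"id": 1, "name": "Acme Corp"})  # supersedes the first event
events.put(STOP)

sent = []  # stands in for the search index in this example
worker = threading.Thread(target=index_worker, args=(events, sent.append))
worker.start()
worker.join()
print(sent)
```

Compared with polling a delta-query every minute, this keeps the index closer to real time, at the cost of having to handle failed sends (for example by re-queueing the batch) in the application.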

 

Are there any other options?  Any advice from the veterans who have been
down this road before?

 

Thank you.

 

--

Sincerely,

David Webb

 



Delta Query Exception

2011-11-19 Thread David T. Webb
I'm sure that my deltaQueries are causing this issue, but even with
logging turned up to FINEST I cannot see which one.  It would be great if
this exception were handled properly and the failing PK test were also
displayed.  I will open a Jira for this request, but does anyone have any
pointers on how to determine which deltaQuery may be causing this to fail?

 

java.lang.NullPointerException
    at org.apache.solr.handler.dataimport.DocBuilder.findMatchingPkColumn(DocBuilder.java:839)
    at org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:900)
    at org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:879)
    at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:285)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:179)
    at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:390)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:429)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
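
One way to narrow this down is to check the configuration rather than the logs. The trace fails in findMatchingPkColumn, so one possible cause, which is an assumption and not confirmed anywhere in this thread, is an entity that declares a deltaQuery but has no pk attribute. The Python sketch below scans a data-config for such entities; the sample configuration, table names, and column names are entirely made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical data-config.xml, made up for this example.
SAMPLE_CONFIG = """
<dataConfig>
  <document>
    <entity name="customer" pk="id"
            query="SELECT * FROM customer"
            deltaQuery="SELECT id FROM customer WHERE updated &gt; '${dataimporter.last_index_time}'">
      <entity name="note"
              query="SELECT * FROM note WHERE customer_id = '${customer.id}'"
              deltaQuery="SELECT note_id FROM note WHERE updated &gt; '${dataimporter.last_index_time}'"/>
    </entity>
  </document>
</dataConfig>
"""


def entities_missing_pk(config_xml):
    """Return names of entities that declare a deltaQuery but no pk attribute."""
    root = ET.fromstring(config_xml)
    return [
        e.get("name")
        for e in root.iter("entity")
        if e.get("deltaQuery") and not e.get("pk")
    ]


suspects = entities_missing_pk(SAMPLE_CONFIG)
print(suspects)  # ['note']
```

If every entity does declare a pk, a related thing to check by hand is whether each deltaQuery actually returns a column with that name.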

 

--

Sincerely,

David Webb, President

BrightMove, Inc.

http://www.brightmove.com  

320 High Tide Dr, Suite 201

Saint Augustine Beach, FL 32080

(904) 861-2396

(866) 895-6299 (Fax)