Re: Schema Design/Data Import

2011-07-25 Thread Stefan Matheis
On 25.07.2011 16:58, Erick Erickson wrote: "Well, the attachment_1, attachment_2 idea would be awkward to form queries (i.e. there would be 100 clauses if there were 100 docs?) Dynamic fields have this same problem." Oh, yes, correct. I overlooked that part, sorry.
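To make the clause-count concern concrete, here is a minimal sketch that only builds query strings and does not talk to Solr. The field names attachment_N and bigtextfield come from the thread; the helper names and the search term "invoice" are made up for illustration, and the `field:term` / `OR` syntax is the standard Lucene query form.

```python
def per_field_query(term, attachment_count):
    """Build a query that has to name every numbered attachment field."""
    clauses = [f"attachment_{i}:{term}" for i in range(1, attachment_count + 1)]
    return " OR ".join(clauses)


def single_field_query(term, field="bigtextfield"):
    """Build a query against one multivalued field holding all attachment text."""
    return f"{field}:{term}"


# Hypothetical search term; 100 attachments as in the quoted example.
numbered = per_field_query("invoice", 100)
combined = single_field_query("invoice")

print(numbered.count(" OR ") + 1)  # number of clauses in the per-field query
print(combined)
```

With numbered (or dynamic) fields the query grows by one clause per attachment, while a single multivalued field keeps it at one clause no matter how many attachments a record has.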

Re: Schema Design/Data Import

2011-07-25 Thread Erick Erickson
Well, the attachment_1, attachment_2 idea would make queries awkward to form (i.e., there would be 100 clauses if there were 100 docs?). Dynamic fields have this same problem. You could certainly index them all into one big field: just make it multivalued and do a SolrDocument.add("bigtextfield", docCon

Re: Schema Design/Data Import

2011-07-25 Thread Stefan Matheis
Travis, that sounds like a perfect use case for dynamic fields: attachment_* and there you go. It works for no attachment, as well as for one, three or 50. For the user interface, you could iterate over them and show them as a list, or something else that would fit your need. Also, maybe, you woul

Re: Schema Design/Data Import

2011-07-25 Thread Travis Low
Thanks so much Erick (and Stefan). Yes, I did some reading on SolrJ and Tika and you are spot-on. We will write our own importer using SolrJ and then we can grab the DB records and parse any attachments along the way. Now it comes down to a schema design question. The issue I'm struggling with

Re: Schema Design/Data Import

2011-07-25 Thread Erick Erickson
I'd seriously consider going with SolrJ as your indexing strategy; it allows you to do anything you need to do in Java code. You can call the Tika library yourself on the files pointed to by your rows as you see fit, indexing them as you choose, perhaps one Solr doc per attachment, perhaps one per

Re: Schema design/data import

2011-07-21 Thread Stefan Matheis
Hey Travis, after reading your mail and thinking a bit about it, I'm not sure if I would go with Nutch. Nutch is [from my understanding] more of a crawler, meant to crawl external/unknown sites. But, if I got this correct, you have a complete knowledge of your data and could solr exactly t

Schema Design/Data Import

2011-07-20 Thread travis
[Apologies if this is a duplicate -- I have sent several messages from my work email and they just vanish, so I subscribed with my personal email] Greetings. I am struggling to design a schema and a data import/update strategy for some semi-complicated data. I would appreciate any input. W

Schema design/data import

2011-07-20 Thread Travis Low
Greetings. I am struggling to design a schema and a data import/update strategy for some semi-complicated data. I would appreciate any input. What we have is a bunch of database records that may or may not have files attached. Sometimes no files, sometimes 50. The requirement is to index the d