On 25.07.2011 16:58, Erick Erickson wrote:
Well, the attachment_1, attachment_2 idea would be awkward
to form queries (i.e. there would be 100 clauses if there were 100 docs?)
Dynamic fields have this same problem.
Oh, yes .. correct .. overlooked that part :/ sorry.
Well, the attachment_1, attachment_2 idea would be awkward
to form queries (i.e. there would be 100 clauses if there were 100 docs?)
Dynamic fields have this same problem.
You could certainly index them all into a big field, just make it
multivalued and do a SolrInputDocument.addField("bigtextfield", docCon
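
To illustrate the point about clause counts, here is a minimal plain-Java sketch that only builds query strings. It does not use any Solr classes. The class and method names are hypothetical, and the field names attachment_N and bigtextfield are simply taken from the messages above.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: builds query strings, does not talk to Solr.
class AttachmentQuerySketch {

    // Numbered fields (attachment_1 ... attachment_N): one clause per field,
    // so the query grows with the number of attachments.
    static String numberedFieldsQuery(String term, int attachmentCount) {
        List<String> clauses = new ArrayList<>();
        for (int i = 1; i <= attachmentCount; i++) {
            clauses.add("attachment_" + i + ":" + term);
        }
        return String.join(" OR ", clauses);
    }

    // Single multivalued field: one clause, however many attachments exist.
    static String multivaluedFieldQuery(String term) {
        return "bigtextfield:" + term;
    }

    public static void main(String[] args) {
        System.out.println(numberedFieldsQuery("invoice", 3));
        System.out.println(multivaluedFieldQuery("invoice"));
    }
}
```

With 100 attachments the first method would emit 100 clauses, while the second stays at one.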
Travis,
that sounds like a perfect use case for dynamic fields .. attachment_*
and there you go. Works for no attachment, as well as one, three or 50.
For the user interface, you could iterate over them and show them as
a list - or something else that would fit your needs.
also, maybe, you woul
Thanks so much Erick (and Stefan). Yes, I did some reading on SolrJ and
Tika and you are spot-on. We will write our own importer using SolrJ and
then we can grab the DB records and parse any attachments along the way.
Now it comes down to a schema design question. The issue I'm struggling
with
I'd seriously consider going with SolrJ as your indexing strategy; it allows
you to do anything you need to do in Java code. You can call the Tika
library yourself on the files pointed to by your rows as you see fit, indexing
them as you choose, perhaps one Solr doc per attachment, perhaps one
per
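
The two layouts mentioned here (one document per attachment, or one per record) can be sketched in plain Java. This is only a rough sketch under stated assumptions: documents are modelled as maps rather than SolrJ objects, the attachment text is assumed to have been extracted already (for example by Tika), and the field names id, record_id and attachment_text are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: maps stand in for Solr documents.
class DocLayoutSketch {

    // Layout A: one document per attachment, linked back via record_id.
    static List<Map<String, Object>> onePerAttachment(String recordId, List<String> texts) {
        List<Map<String, Object>> docs = new ArrayList<>();
        for (int i = 0; i < texts.size(); i++) {
            Map<String, Object> doc = new LinkedHashMap<>();
            doc.put("id", recordId + "_" + (i + 1)); // hypothetical id scheme
            doc.put("record_id", recordId);
            doc.put("attachment_text", texts.get(i));
            docs.add(doc);
        }
        return docs;
    }

    // Layout B: one document per record, all attachment texts in one
    // multivalued field.
    static Map<String, Object> onePerRecord(String recordId, List<String> texts) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", recordId);
        doc.put("attachment_text", new ArrayList<>(texts));
        return doc;
    }

    public static void main(String[] args) {
        List<String> texts = Arrays.asList("first file text", "second file text");
        System.out.println(onePerAttachment("42", texts));
        System.out.println(onePerRecord("42", texts));
    }
}
```

Note that in this sketch a record without attachments yields no documents under layout A, so the record's own fields would need a document of their own in that design.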
Hey Travis,
after reading your mail .. and thinking about it a bit, I'm not sure if I
would go with Nutch. Nutch is [from my understanding] more a crawler ..
meant to crawl external / unknown sites.
But, if I got this correct, you have complete knowledge of your data
and could solr exactly t
[Apologies if this is a duplicate -- I have sent several messages from my work
email and they just vanish, so I subscribed with my personal email]
Greetings. I am struggling to design a schema and a data import/update
strategy for some semi-complicated data. I would appreciate any input.
W
Greetings. I am struggling to design a schema and a data import/update
strategy for some semi-complicated data. I would appreciate any input.
What we have is a bunch of database records that may or may not have files
attached. Sometimes no files, sometimes 50.
The requirement is to index the d