Alfresco has spent more than ten years building a content management system
that follows this basic design:
1) Original bytes (PDF, Word doc, image file) are stored in a filesystem-based
content store.
2) Meta-data is stored in a relational database, normalized.
3) Content is transformed to text and meta-d
I would possibly extend this a bit further. There is the source, then the
'normalized' version of the data, then the indexed version.
Sometimes you realize you miss something in the normalized view and you
have to go back to the actual source.
This becomes more likely as the number of sources grows
Reindexing is exactly why you want the Single Source of Truth to be in a
repository outside of Solr.
For our slowly-changing data sets, we have an intermediate JSONL batch. That is
created from the source repositories and saved in Amazon S3. Then we load it
into Solr nightly. That allows us to
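
As a rough illustration of the nightly JSONL load described above, here is a minimal Python sketch. The function names, the batch size, and the update URL are assumptions for illustration, not details from this thread; the only Solr-specific part is that the update handler accepts a JSON array of documents over HTTP POST.

```python
import json
import urllib.request
from typing import Iterable, Iterator, List


def read_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one document per non-blank JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def batches(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Group documents into lists of at most `size` items."""
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def post_batch(update_url: str, batch: List[dict]) -> None:
    """POST one batch as a JSON array to a Solr update handler.

    `update_url` is hypothetical, for example
    http://localhost:8983/solr/cars/update?commit=true
    """
    request = urllib.request.Request(
        update_url,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        response.read()


def reindex(lines: Iterable[str], update_url: str, size: int = 500) -> int:
    """Send every document in the JSONL input to Solr; return the count."""
    total = 0
    for batch in batches(read_jsonl(lines), size):
        post_batch(update_url, batch)
        total += len(batch)
    return total
```

Because the JSONL file is the thing being replayed, a schema change only means rerunning `reindex` against the saved batch rather than going back to every source.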
Dave:
Oh, I agree that a DB is a perfectly valid place to store the data and
you're absolutely right that it allows better interaction than flat
files; you can ask questions of an RDBMS that you can't easily ask the
disk ;). Storing to disk is an alternative if you're unwilling to deal
with a DB i
Ha, I think I went to one of your training seminars in NYC maybe 4 years ago,
Eric. I'm going to have to respectfully disagree about the RDBMS. It's such a
well-known data format that you could hire a high school programmer to help with
the DB end if you knew how to flatten it to Solr. Besides it'
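
To make "flatten it to Solr" concrete, here is a small Python sketch using an in-memory SQLite database. The `dealer` and `listing` tables, their columns, and the sample rows are all hypothetical, made up for the used-car example; the point is only that a join turns normalized rows into one flat document per listing.

```python
import sqlite3
from typing import Dict, List


def demo_connection() -> sqlite3.Connection:
    """Build a tiny normalized schema with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dealer (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
        CREATE TABLE listing (
            id INTEGER PRIMARY KEY,
            dealer_id INTEGER REFERENCES dealer(id),
            make TEXT, model TEXT, price INTEGER
        );
        INSERT INTO dealer VALUES (1, 'Example Motors', 'Springfield');
        INSERT INTO listing VALUES (10, 1, 'Honda', 'Civic', 8500);
        INSERT INTO listing VALUES (11, 1, 'Ford', 'Focus', 6200);
        """
    )
    return conn


def flatten_listings(conn: sqlite3.Connection) -> List[Dict[str, object]]:
    """Join each listing with its dealer and return flat, Solr-style dicts."""
    rows = conn.execute(
        "SELECT l.id, l.make, l.model, l.price, d.name, d.city "
        "FROM listing l JOIN dealer d ON d.id = l.dealer_id "
        "ORDER BY l.id"
    )
    return [
        {
            "id": str(row[0]),
            "make": row[1],
            "model": row[2],
            "price": row[3],
            "dealer_name": row[4],
            "dealer_city": row[5],
        }
        for row in rows
    ]
```

Each dict repeats the dealer fields, which is the denormalization the database avoids and the search index wants.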
Awesome advice. flat=fast in Solr.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Feb 21, 2017, at 5:17 PM, Dave wrote:
>
> B is a better option long term. Solr is meant for retrieving flat data, fast,
> not hierarchical. That's what a database i
I'll add that I _guarantee_ you'll want to re-index the data as you
change your schema and the like. You'll be able to do that much more
quickly if the data is stored locally somehow.
An RDBMS is not necessary, however. You could simply store the data on
disk in some format you could re-read and sen
Thanks for that! I was thinking (B) too, but wanted guidance that I'm
using the tool correctly.
Am still interested in hearing opinions from others, thanks!
rh
On Tue, Feb 21, 2017 at 8:17 PM, Dave wrote:
> B is a better option long term. Solr is meant for retrieving flat data,
> fast, not hi
And not to sound redundant, but if you ever need help, database programmers are
a dime a dozen; good luck finding Solr developers who are available freelance
for a price you're willing to pay. If you can do the Solr, anyone else who does
web dev can do the SQL
> On Feb 21, 2017, at 8:17 PM, Dav
B is a better option long term. Solr is meant for retrieving flat data, fast,
not hierarchical. That's what a database is for, and trust me, you would rather
have a real database on the end point. Each tool has a purpose: Solr can never
replace a relational database, and a relational database cou
To learn how to properly use Solr, I'm building a little experimental
project with it to search for used car listings.
Car listings appear in a variety of places: central sites like
Craigslist, and also many individual used-car dealership websites.
I am wondering, should I:
(a) depl