Dave: Oh, I agree that a DB is a perfectly valid place to store the data, and you're absolutely right that it allows better interaction than flat files; you can ask questions of an RDBMS that you can't easily ask the disk ;). Storing to disk is simply an alternative if you're unwilling to deal with a DB.
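To make the "questions you can't easily ask the disk" point concrete, here is a minimal sketch using SQLite. The `listings` table and its columns are hypothetical, invented for illustration; the thread never specifies a schema.

```python
import sqlite3

# Hypothetical listings table, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE listings (id INTEGER PRIMARY KEY, make TEXT, year INTEGER, price REAL)"
)
conn.executemany(
    "INSERT INTO listings (make, year, price) VALUES (?, ?, ?)",
    [("Honda", 2012, 8500.0), ("Honda", 2015, 12900.0), ("Ford", 2010, 6200.0)],
)

# "Average asking price per make" is awkward against flat files on disk,
# but one line of SQL against a database.
rows = conn.execute(
    "SELECT make, AVG(price) FROM listings GROUP BY make ORDER BY make"
).fetchall()
print(rows)  # [('Ford', 6200.0), ('Honda', 10700.0)]
```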
But the main point is you'll change your schema sometime and have to re-index. Having the data you're indexing stored locally in whatever form will allow much faster turn-around than re-crawling. Of course it'll result in out-of-date data, so you'll have to refresh somehow, sometime.

Erick

On Tue, Feb 21, 2017 at 6:07 PM, Dave <hastings.recurs...@gmail.com> wrote:
> Ha, I think I went to one of your training seminars in NYC maybe 4 years ago,
> Erick. I'm going to have to respectfully disagree about the RDBMS. It's such
> a well-known data format that you could hire a high school programmer to help
> with the DB end if you knew how to flatten it to Solr. Besides, it's easy to
> visualize and interact with the data before it goes to Solr. A JSON/NoSQL
> format would work just as well, but I really think a database has its place
> in a scenario like this.
>
>> On Feb 21, 2017, at 8:20 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>>
>> I'll add that I _guarantee_ you'll want to re-index the data as you
>> change your schema and the like. You'll be able to do that much more
>> quickly if the data is stored locally somehow.
>>
>> An RDBMS is not necessary, however. You could simply store the data on
>> disk in some format you could re-read and send to Solr.
>>
>> Best,
>> Erick
>>
>>> On Tue, Feb 21, 2017 at 5:17 PM, Dave <hastings.recurs...@gmail.com> wrote:
>>> B is the better option long term. Solr is meant for retrieving flat data,
>>> fast, not hierarchical. That's what a database is for, and trust me, you
>>> would rather have a real database at the end point. Each tool has a
>>> purpose: Solr can never replace a relational database, and a relational
>>> database could not replace Solr.
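Erick's "store the data locally in some re-readable form" suggestion can be sketched as simply as a JSON Lines file: one crawled listing per line. After a schema change, you re-read the file and re-send documents to Solr instead of re-crawling. The listing shape and field names here are made up for illustration, and the actual POST to Solr's /update handler is left out.

```python
import io
import json

def save_listings(listings, fh):
    """Append raw crawled listings to a local JSON Lines store."""
    for doc in listings:
        fh.write(json.dumps(doc) + "\n")

def reload_listings(fh):
    """Re-read the local store; each line becomes one Solr-ready document."""
    return [json.loads(line) for line in fh if line.strip()]

# Using an in-memory buffer here; on disk this would be an append-only file.
store = io.StringIO()
save_listings([{"id": "cl-123", "make": "Honda", "price": 8500}], store)
store.seek(0)
docs = reload_listings(store)
# `docs` could now be sent to Solr's /update handler as a JSON array.
print(docs[0]["id"])  # cl-123
```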
>>> Start with the slow model (database) for control/display and enhance with
>>> the fast model (Solr) for retrieval/search.
>>>
>>>> On Feb 21, 2017, at 7:57 PM, Robert Hume <rhum...@gmail.com> wrote:
>>>>
>>>> To learn how to properly use Solr, I'm building a little experimental
>>>> project with it to search for used car listings.
>>>>
>>>> Car listings appear in a variety of different places ... central places
>>>> like Craigslist and also many, many individual used-car dealership websites.
>>>>
>>>> I am wondering, should I:
>>>>
>>>> (a) deploy a Solr search engine and build individual indexers for every
>>>> type of website I want to find listings on?
>>>>
>>>> or
>>>>
>>>> (b) build my own database to store car listings, and then build services
>>>> that scrape data from different sites and feed entries into the database;
>>>> then point my Solr search at my database, one simple source of listings?
>>>>
>>>> My concerns are:
>>>>
>>>> With (a) ... I have to be smart enough to understand all those different
>>>> data sources and remove/update listings when they change; will this be
>>>> harder to do with custom Solr indexers than writing something from scratch?
>>>>
>>>> With (b) ... I'm maintaining a huge database of all my listings, which
>>>> seems redundant; Google doesn't make a *copy* of everything on the
>>>> internet, it just knows it's there. Is maintaining my own database a bad
>>>> design?
>>>>
>>>> Thanks for reading!
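Dave's point that "Solr is meant for retrieving flat data, not hierarchical" is the crux of option (b): whatever hierarchical shape a listing has in the database or scraped JSON, it has to be flattened into a single flat document before indexing. A minimal sketch of that flattening step, with a hypothetical nested listing and invented field names:

```python
def flatten_listing(listing, prefix=""):
    """Collapse nested dicts into flat underscore-joined keys,
    since a Solr document is a flat set of fields."""
    flat = {}
    for key, value in listing.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten_listing(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat

# Hypothetical scraped listing as it might sit in the database.
listing = {
    "id": "dealer-42",
    "car": {"make": "Ford", "year": 2010},
    "seller": {"name": "Acme Autos", "city": "Albany"},
}
flat = flatten_listing(listing)
print(flat)
# {'id': 'dealer-42', 'car_make': 'Ford', 'car_year': 2010,
#  'seller_name': 'Acme Autos', 'seller_city': 'Albany'}
```

The flat dict is what you would send to Solr; the nested original stays in the database as the source of truth, which is exactly the slow-model/fast-model split described above.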