Hey Alex,

Thanks for the prompt response.

Here is what I am trying to solve: I am showing search results on a single site from content that comes from three different places. I have done that by pushing all of this content to a Solr server running on a single flat schema, using the APIs of those platforms.

Now I also need to index blog posts written in WordPress. I was wondering whether there is an existing solution that can help me crawl these posts and push them to my running Solr instance. Otherwise I might have to write a few more scripts to do that.

BTW, is Swiftype using Solr on the backend? I thought it was a paid enterprise solution.

Vishal Sharma
TL, Grazitti Interactive

On Tue, Oct 7, 2014 at 11:21 AM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:

> On 7 October 2014 14:08, Vishal Sharma <vish...@grazitti.com> wrote:
> > Hi,
> >
> > I am trying to get some help on finding out if there is any best practice
> > to index WordPress blogs in a Solr index? Can someone help with the
> > architecture I should be setting up?
> >
> > Do I need to write separate scripts to crawl WordPress and then pump
> > posts back to Solr using its API?
>
> Is your goal WordPress indexing or specifically indexing into Solr?
> Because there are services such as:
> https://wordpress.org/plugins/swiftype-search/
>
> Otherwise, the question is the level of access you have to the
> WordPress. You could index feeds WordPress produces (there is an
> example in the distribution for RSS parsing). Or you could pull it
> directly from the database. Or - if the real-time is not important,
> you could periodically do WordPress export (to XML) and parse that.
>
> I would NOT parse the HTML and try to recreate that.
>
> As to the rest of the architecture, you need to know whether you are
> just indexing generic WordPress or also extensions such as custom
> taxonomies, custom values, etc.
>
> These are all important questions because they will drive the Solr
> architecture more than the original question you seem to be asking.
>
> Regards,
> Alex.
>
> Personal: http://www.outerthoughts.com/ and @arafalov
> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
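
For the feed-based option mentioned in the quoted reply (indexing the feeds WordPress produces), a rough sketch in Python could look like the one below. It is only an illustration under stated assumptions, not a tested integration. The Solr field names (`id`, `source`, `title`, `url`, `author`, `categories`, `content`, `published`) are hypothetical and would have to match the existing flat schema. The feed and Solr URLs in the comments are placeholders. Posting a JSON array to an `/update` handler is assumed to be enabled on the Solr version in use.

```python
import json
import urllib.request
import xml.etree.ElementTree as ET
from datetime import timezone
from email.utils import parsedate_to_datetime

# XML namespaces used by WordPress RSS 2.0 feeds for author and full content.
NS = {
    "content": "http://purl.org/rss/1.0/modules/content/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def feed_to_docs(rss_xml, source="wordpress"):
    """Turn RSS 2.0 feed text (str or bytes) into flat dicts, one per post.

    The dict keys are hypothetical Solr field names.
    """
    root = ET.fromstring(rss_xml)
    docs = []
    for item in root.iter("item"):
        link = item.findtext("link", default="").strip()
        doc = {
            # Fall back to the permalink if the feed carries no guid.
            "id": (item.findtext("guid") or link).strip(),
            "source": source,
            "title": item.findtext("title", default="").strip(),
            "url": link,
            "author": item.findtext("dc:creator", default="", namespaces=NS).strip(),
            "categories": [c.text.strip() for c in item.findall("category") if c.text],
            # Full post body if present, otherwise the excerpt. Markup is
            # passed through unchanged, not parsed here.
            "content": item.findtext("content:encoded", default="", namespaces=NS)
            or item.findtext("description", default=""),
        }
        pub = item.findtext("pubDate")
        if pub:
            # RSS dates are RFC 822; Solr date fields expect ISO 8601 in UTC.
            dt = parsedate_to_datetime(pub.strip()).astimezone(timezone.utc)
            doc["published"] = dt.strftime("%Y-%m-%dT%H:%M:%SZ")
        docs.append(doc)
    return docs


def post_to_solr(docs, update_url):
    """POST the documents as a JSON array to a Solr update handler."""
    req = urllib.request.Request(
        update_url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


# Example wiring (placeholder URLs, not run here):
#   xml_bytes = urllib.request.urlopen("http://blog.example.com/feed/").read()
#   post_to_solr(feed_to_docs(xml_bytes),
#                "http://localhost:8983/solr/mycore/update?commit=true")
```

A feed usually exposes only the most recent posts, so this would suit a periodic incremental job. A first full load would more likely come from the WordPress XML export or the database. The `source` field is there so that blog posts can be told apart from the other three content sources in the same index.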