On Fri, Jan 23, 2009 at 2:55 PM, Paul Libbrecht <p...@activemath.org> wrote:
>
> On 23 Jan 2009, at 10:10, Noble Paul നോബിള് नोब्ळ् wrote:
>>
>> If the response is not XML, then there is no EntityProcessor that can
>> consume it. We may need to add one.
>
> Well, even binary data such as Word documents (base64-encoded, for example)
> run the risk of appearing here. They will surely need a pile of filters!
>
>>> What bothers me about the HttpDataSource example is that, for now at
>>> least, it is configured to pull a single URL, while what is needed (and
>>> would provide delta ability) is really to index a list of URLs (for which
>>> one would regularly pull the list of recently updated URLs, or simply use
>>> GET-if-modified-since on all of them).
>>
>> If-Modified-Since is not supported by HttpDataSource. However, you can
>> write a transformer which pings the URL with an If-Modified-Since header
>> and skips the document using the $skipDoc option.
>
> I still don't understand how you give several documents to the
> HttpDataSource. The configuration seems to allow only a single URL.
> Am I missing something?

The DataSource is like a helper class. The only intelligent piece here is
an EntityProcessor.

>
> paul
>
> PS: Would it be worth chatting about that on irc.freenode.net#solr?
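For illustration, a rough sketch of the transformer idea from the thread: send a conditional request carrying an If-Modified-Since header and, if the server answers 304 Not Modified, put "$skipDoc" = "true" into the row so the document is skipped. This is only a sketch under assumptions. The class name IfModifiedSinceCheck, the row field "url" and the lastIndexTime value are hypothetical. To stay self-contained, it mimics a simplified transformRow(row) shape instead of extending DIH's org.apache.solr.handler.dataimport.Transformer, whose transformRow also receives a Context.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.Map;

/**
 * Hypothetical sketch: mirrors the shape of a DIH transformer without Solr classes.
 * Assumes each row holds an http(s) document URL in a field named "url" and that
 * lastIndexTime (epoch millis) is the time of the previous import.
 */
class IfModifiedSinceCheck {

    // Row key that DIH interprets as "skip this document" when set to "true".
    static final String SKIP_DOC = "$skipDoc";

    private final long lastIndexTime;

    IfModifiedSinceCheck(long lastIndexTime) {
        this.lastIndexTime = lastIndexTime;
    }

    /** Pure helper: flags the row for skipping when the server answered 304 Not Modified. */
    static Map<String, Object> applyStatus(Map<String, Object> row, int status) {
        if (status == HttpURLConnection.HTTP_NOT_MODIFIED) {
            row.put(SKIP_DOC, "true");
        }
        return row;
    }

    /** Pings the row's URL with an If-Modified-Since header and marks unchanged documents. */
    Object transformRow(Map<String, Object> row) throws IOException {
        Object url = row.get("url");
        if (url == null) {
            return row; // no URL in this row, nothing to check
        }
        // Assumes an http(s) URL; other schemes would not yield an HttpURLConnection.
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url.toString()).toURL().openConnection();
        try {
            conn.setRequestMethod("HEAD");          // headers only, no body download
            conn.setIfModifiedSince(lastIndexTime); // sends the If-Modified-Since header
            return applyStatus(row, conn.getResponseCode());
        } finally {
            conn.disconnect();
        }
    }
}
```

A HEAD request keeps the check cheap. A server that ignores If-Modified-Since simply answers 200, in which case the row is left alone and the document is re-indexed as before.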
-- 
--Noble Paul