: I know I was able to imitate that in plain-lucene by crafting a particular
: analyzer-filter who was only given the URL as content and who gave further the
: tokens of the stream.
FWIW: while taking advantage of DIH (DataImportHandler) and some of its plugin APIs to deal with this is probably a better way to go, anything you could do in a TokenFilter with a homegrown Lucene app can also be done in a TokenFilter in Solr. All you need is a simple TokenFilterFactory to initialize your TokenFilter.

From a purist standpoint: the decision about where to hook in a feature like this depends on the mental model you have of your index vs. the different ways you can get data into your index. If every document should have an "extendedText" field, and docs you post via XML or CSV will have that field verbatim, but documents you index using DIH will get it by fetching a URL, then a DIH plugin is the way to go. If you want every client sending you docs to provide a URL, and you *always* fetch that URL to get the content, then a TokenFilter is the way to go.

-Hoss
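To make the TokenFilter option concrete, here is a minimal plain-Java sketch of the shape described above: a filter that receives a URL as its only input token and emits the tokens of the fetched content instead, plus a factory that wraps the input stream. This is deliberately NOT the Lucene/Solr API. In Solr the real pieces would be a Lucene TokenFilter subclass and a TokenFilterFactory whose create method wraps the incoming stream, but their exact signatures vary by version, so they are not reproduced here. The names TokenSource, UrlContentFilter, UrlContentFilterFactory and the fetcher function are hypothetical, and the "fetch" is an offline stand-in rather than a real HTTP call.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

/** Plain-Java illustration (NOT the Lucene API) of a URL-fetching token filter. */
public class UrlContentFilterSketch {

    /** Hypothetical stand-in for a token stream: returns next token, or null at end. */
    interface TokenSource {
        String next();
    }

    /** Source that yields a fixed list of tokens (plays the role of a tokenizer). */
    static TokenSource fromList(List<String> tokens) {
        Iterator<String> it = tokens.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    /**
     * Hypothetical filter: treats each upstream token as a URL, fetches its body
     * through the supplied function, and emits the body's whitespace-separated words.
     */
    static class UrlContentFilter implements TokenSource {
        private final TokenSource input;
        private final Function<String, String> fetcher; // assumption: URL -> body text
        private final Deque<String> pending = new ArrayDeque<>();

        UrlContentFilter(TokenSource input, Function<String, String> fetcher) {
            this.input = input;
            this.fetcher = fetcher;
        }

        @Override
        public String next() {
            // Refill the buffer from the next URL until we have a word or run out.
            while (pending.isEmpty()) {
                String url = input.next();
                if (url == null) {
                    return null; // upstream exhausted
                }
                for (String word : fetcher.apply(url).trim().split("\\s+")) {
                    if (!word.isEmpty()) {
                        pending.addLast(word);
                    }
                }
            }
            return pending.removeFirst();
        }
    }

    /** Hypothetical factory: the single place where configuration is wired in. */
    static class UrlContentFilterFactory {
        private final Function<String, String> fetcher;

        UrlContentFilterFactory(Function<String, String> fetcher) {
            this.fetcher = fetcher;
        }

        TokenSource create(TokenSource input) {
            return new UrlContentFilter(input, fetcher);
        }
    }

    /** Drains a source into a list so the emitted tokens can be inspected. */
    static List<String> drain(TokenSource source) {
        List<String> out = new ArrayList<>();
        for (String t = source.next(); t != null; t = source.next()) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        // Offline stand-in for an HTTP fetch so the example runs without a network.
        Function<String, String> fakeFetch = url -> "content of " + url;
        UrlContentFilterFactory factory = new UrlContentFilterFactory(fakeFetch);
        TokenSource stream = factory.create(fromList(List.of("http://example.com/a")));
        System.out.println(drain(stream)); // prints [content, of, http://example.com/a]
    }
}
```

Keeping the fetch behind a factory-supplied function mirrors the split described above: the factory owns configuration, the filter only transforms the stream, and every document that reaches this field is expected to carry a URL.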