: I know I was able to imitate that in plain Lucene by crafting a particular
: analyzer filter that was given only the URL as content and that passed on the
: tokens of the stream.

FWIW: while taking advantage of DIH and some of its plugin APIs to deal 
with this is probably the better way to go, anything you could do in a 
TokenFilter with a homegrown Lucene app can also be done in a TokenFilter 
in Solr -- all you need is a simple TokenFilterFactory to initialize your 
TokenFilter.
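
To make the factory-plus-filter shape concrete, here is a rough sketch 
using only the JDK.  Everything in it is a hypothetical stand-in: Tokens, 
UrlContentFilter and UrlContentFilterFactory are NOT the real Lucene/Solr 
classes, they just mirror the roles of TokenStream, TokenFilter and 
TokenFilterFactory.  The "fetcher" function is also an assumption, used so 
the example runs without touching the network.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Plain-JDK stand-ins for illustration only; not the Lucene/Solr API.
final class UrlTokenSketch {

    /** Stand-in for a token stream: yields tokens one at a time. */
    interface Tokens extends Iterator<String> {}

    /** Stand-in for a TokenFilter: treats each incoming token as a URL and
     *  emits the whitespace-separated words of the fetched content. */
    static final class UrlContentFilter implements Tokens {
        private final Iterator<String> input;
        private final Function<String, String> fetcher;
        private Iterator<String> pending = Collections.emptyIterator();

        UrlContentFilter(Iterator<String> input, Function<String, String> fetcher) {
            this.input = input;
            this.fetcher = fetcher;
        }

        @Override
        public boolean hasNext() {
            // Pull URLs from upstream until one yields at least one word.
            while (!pending.hasNext() && input.hasNext()) {
                String body = fetcher.apply(input.next());
                pending = splitWords(body == null ? "" : body).iterator();
            }
            return pending.hasNext();
        }

        @Override
        public String next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return pending.next();
        }
    }

    /** Stand-in for a TokenFilterFactory: holds the configuration and
     *  wraps whatever upstream token source it is handed. */
    static final class UrlContentFilterFactory {
        private final Function<String, String> fetcher;

        UrlContentFilterFactory(Function<String, String> fetcher) {
            this.fetcher = fetcher;
        }

        Tokens create(Iterator<String> input) {
            return new UrlContentFilter(input, fetcher);
        }
    }

    /** Splits text on runs of whitespace, dropping empty pieces. */
    static List<String> splitWords(String text) {
        List<String> words = new ArrayList<>();
        for (String w : text.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    /** Runs a single URL "field value" through the factory-built filter. */
    static List<String> analyze(String url, Function<String, String> fetcher) {
        Tokens tokens = new UrlContentFilterFactory(fetcher)
                .create(List.of(url).iterator());
        List<String> out = new ArrayList<>();
        while (tokens.hasNext()) {
            out.add(tokens.next());
        }
        return out;
    }

    public static void main(String[] args) {
        // A fake "web" so the sketch needs no network access.
        Map<String, String> fakeWeb =
                Map.of("http://example.invalid/doc", "hello token world");
        System.out.println(analyze("http://example.invalid/doc", fakeWeb::get));
        // prints [hello, token, world]
    }
}
```

In a real Solr setup the factory would be the class named in the field 
type's analyzer config, and the filter would extend Lucene's TokenFilter; 
the exact base classes and method signatures depend on your version, so 
check the javadocs rather than trusting the names above.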

From a purist standpoint: the decision about where to hook in a feature 
like this depends on the mental model you have of your index vs the 
different ways you can get data into your index.  If every document should 
have an "extendedText" field, and docs you post via XML or CSV will have 
that field verbatim, but documents you index using DIH will get it by 
fetching a URL, then a DIH plugin is the way to go.  If instead you want 
every client sending you docs to provide a URL and you *always* fetch that 
URL to get the content, then a TokenFilter is the way to go.




-Hoss
