Handling disparate data sources in Solr

2006-12-22 Thread Alan Burlison
and "positionIncrementGap" mean in the schema.xml file? The documentation is vague, to say the least, and Google wasn't much more helpful. Thanks, -- Alan Burlison --

Re: Handling disparate data sources in Solr

2006-12-23 Thread Alan Burlison
as in most cases it is set to 100, in fact a number of 5 or so would be plenty, is that correct? In fact, isn't it more-or-less a boolean switch? -- Alan Burlison --

Re: Handling disparate data sources in Solr

2006-12-23 Thread Alan Burlison
lk to solr. In that case there's little point in using Solr at all - the main benefit it gives me is that I don't have to write all the HTTP protocol bits. If I have to do that myself I might as well use raw Lucene - and in fact that's how the existing system works. -- Alan Burlison --

Re: Handling disparate data sources in Solr

2006-12-23 Thread Alan Burlison
"Web Framework". I'm trying to simplify things, not add 90% clutter for 10% functionality. -- Alan Burlison --

Re: Handling disparate data sources in Solr

2006-12-24 Thread Alan Burlison
ary PDF file and parse it into its appropriate fields ... but we aren't quite there yet. Feel free to bring this up on solr-dev if you'd be interested in working on it. Hmm. That's a possibility. It all depends on the time tradeoff between fixing what we have already to make it reusable versus extending Solr. -- Alan Burlison --

Re: Handling disparate data sources in Solr

2007-01-04 Thread Alan Burlison
ted in working on it. I'm interested in discussing this further. I've moved the discussion onto solr-dev, as suggested. -- Alan Burlison --

Re: Handling disparate data sources in Solr

2007-01-04 Thread Alan Burlison
ments that can be accessed over HTTP, instead of embedding them in the indexing request. The indexer would fetch the document using the specified URL. There would then be entries in the configuration file that map each MIME type to a handler that is capable of dealing with that document type. Thoughts? -- Alan Burlison --
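A rough sketch of the idea above, in Python rather than Solr's own Java, assuming nothing about how Solr would actually implement it. The handler functions, the HANDLERS table (a stand-in for the configuration file entries) and the function names are all hypothetical, invented for illustration only.

```python
import urllib.request
from typing import Callable, Dict

# Hypothetical handlers: each turns raw document bytes into indexable fields.
def handle_plain_text(data: bytes) -> Dict[str, str]:
    return {"body": data.decode("utf-8", errors="replace")}

def handle_html(data: bytes) -> Dict[str, str]:
    # Placeholder only: a real handler would strip markup and pull out a title.
    return {"body": data.decode("utf-8", errors="replace"), "format": "html"}

# Stand-in for the configuration entries that map each MIME type to a handler.
HANDLERS: Dict[str, Callable[[bytes], Dict[str, str]]] = {
    "text/plain": handle_plain_text,
    "text/html": handle_html,
}

def normalize_mime_type(content_type: str) -> str:
    """Drop parameters such as '; charset=utf-8' and lowercase the type."""
    return content_type.split(";", 1)[0].strip().lower()

def extract_fields(content_type: str, data: bytes) -> Dict[str, str]:
    """Pick the handler configured for this MIME type and run it."""
    mime_type = normalize_mime_type(content_type)
    try:
        handler = HANDLERS[mime_type]
    except KeyError:
        raise ValueError(f"no handler configured for {mime_type!r}")
    return handler(data)

def fetch_and_extract(url: str) -> Dict[str, str]:
    """Fetch the document at the given URL, then dispatch on its Content-Type."""
    with urllib.request.urlopen(url) as response:
        content_type = response.headers.get("Content-Type", "application/octet-stream")
        data = response.read()
    return extract_fields(content_type, data)
```

The indexing request would then carry only the URL, and an unknown MIME type fails loudly instead of being indexed as garbage.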

Re: Handling disparate data sources in Solr

2007-01-08 Thread Alan Burlison
index insert/update request - the aim is to merely prevent the bloat caused by encoding the document (e.g. as base64) when the indexer can access the source document directly. -- Alan Burlison --
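To put a number on the bloat mentioned above: base64 emits 4 output characters for every 3 input bytes, so an embedded document grows by about a third before any XML wrapping is added. A small Python check, with a made-up 300,000-byte document standing in for a real file:

```python
import base64

def base64_encoded_size(raw: bytes) -> int:
    """Number of bytes the document occupies once base64-encoded."""
    return len(base64.b64encode(raw))

# Made-up 300,000-byte document (all zero bytes), purely for illustration.
raw_document = bytes(300_000)
encoded_size = base64_encoded_size(raw_document)
print(len(raw_document), encoded_size)  # 300000 400000
```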

Re: Handling disparate data sources in Solr

2007-01-08 Thread Alan Burlison
the href would usually start "file://", not "http://". BTW, this discussion is also occurring on solr-dev, it might be better to move all of it over there ;-) -- Alan Burlison --