Solr doesn't have the URL of the document. The document is given to Solr in an HTTP POST.
Solr is not a web spider; it is a search web service.

wunder

On 6/12/07 6:23 AM, "Ard Schrijvers" <[EMAIL PROTECTED]> wrote:

> Hello Otis,
>
> thanks for the info. Would it be an improvement to be able to specify in the
> schema.xml whether or not the URI should be stored in a field whose name
> you can also specify in the schema? It might very well be possible that you do
> not "own" the xml documents you index over http, and at the same time, you do
> not want to store their contents in the index. Since at indexing time the uri is
> known, adding it to the index is trivial.
>
> Regards Ard
>
>
> You have to store the URI in a Field yourself. That means you need to define
> that field in the schema and you have to set its value when adding documents.
>
> Otis
> Simpy -- http://www.simpy.com/ - Tag - Search - Share
>
> ----- Original Message ----
> From: Ard Schrijvers <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, June 12, 2007 9:02:25 AM
> Subject: RE: storing the document URI in the index
>
> Hello Erik,
>
> thanks for the fast answer (sry for my mail not indenting but must use webmail
> :-( ), but the problem I am facing is that I do not see solr storing the
> location of the documents it indexed. So, I need to store the location of a
> document in a field, but I do not see where solr would do this. Fetching the
> document will be done with the simple cocoon generator, so that is no problem,
> but of course, I need the url/uri to be in the index. I know I need it as a
> UN_TOKENIZED STORED field, but just see with LUKE that the location is not
> present in the lucene index when solr "crawls" some directory with xml files.
>
> Regards Ard Schrijvers
>
>
> Yes. Set the field to be stored and non-indexed, field type "string"
> is what I use.
>
>> Or is everybody used to storing the contents of a document in the
>> lucene index (doesn't this imply a much larger index though?), so
>> instead of retrieving the document's content through a separate
>> fetch over http/filesystem just show the result from the stored
>> content field?
>
> This all depends on the needs of your project. It's perfectly fine to
> store the text outside of the index, and that is the way it really
> has to be done for very large indexes where as few fields as possible
> are "stored".
>
> If you're also asking about Solr fetching the remote resource, that
> is a different story altogether, and no it does not do that. [though
> with the streaming capability you can feed in a document entirely
> from a URL, but I haven't experimented with that feature yet myself]
>
> Erik
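
To illustrate the advice quoted above (define a field for the URI in the schema and set its value yourself when adding documents), here is a minimal Python sketch. It builds the XML update message that a client would POST to Solr. The field names (id, uri, text), the example document URL, and the update endpoint are illustrative assumptions, not something given in this thread; the schema is assumed to declare uri as a stored "string" field.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_add_xml(doc_id, uri, text):
    """Build a Solr <add> message whose document carries its own URI.

    Assumes the schema declares fields named "id", "uri" and "text"
    (hypothetical names chosen for this sketch).
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in (("id", doc_id), ("uri", uri), ("text", text)):
        field = ET.SubElement(doc, "field", {"name": name})
        field.text = value
    return ET.tostring(add, encoding="unicode")


def post_to_solr(update_url, xml_body):
    """POST the update message to Solr.

    update_url is an assumption; it depends on where your Solr
    instance is deployed.
    """
    request = urllib.request.Request(
        update_url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# The client knows where the document came from, so it sets the uri field.
xml_body = build_add_xml(
    "doc-1",
    "http://example.org/docs/doc-1.xml",  # hypothetical source location
    "Text extracted from the document.",
)
print(xml_body)
```

The point of the sketch is that the URI travels as an ordinary field inside the posted document. Solr never sees where the content came from, so the client has to supply it. A separate commit message is still needed before the document becomes searchable.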