OK, you're still not quite on the right track. You can't just index XML documents without transforming them into valid Solr XML documents. Ditto for HTML.
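To make "transforming them into valid Solr XML documents" concrete, here's a rough sketch of the by-hand route. It's a minimal Python example using only the standard library, not anything that ships with Solr. It pulls the `<title>` and the visible text out of an HTML page and wraps them in Solr's XML update format (`<add><doc><field name="...">...</field></doc></add>`). The field names `id`, `title` and `text`, and the function and class names, are hypothetical; the field names in particular have to match fields declared in your own schema.xml.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TitleAndTextExtractor(HTMLParser):
    """Collects the <title> text and the other visible text of an HTML page."""

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.body_parts = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return  # ignore whitespace and script/style bodies
        if self._in_title:
            self.title_parts.append(text)
        else:
            self.body_parts.append(text)


def html_to_solr_add_xml(doc_id, html):
    """Builds a Solr XML update message for one HTML page.

    Hypothetical field names "id", "title" and "text" must exist in your schema.
    """
    parser = TitleAndTextExtractor()
    parser.feed(html)
    parser.close()

    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in (
        ("id", doc_id),
        ("title", " ".join(parser.title_parts)),
        ("text", " ".join(parser.body_parts)),
    ):
        field = ET.SubElement(doc, "field", name=name)
        field.text = value
    # ElementTree escapes characters such as & and <, so the output is well-formed XML.
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    page = ("<html><head><title>Hello</title></head>"
            "<body><p>Solr &amp; Tika</p><script>var x=1;</script></body></html>")
    print(html_to_solr_add_xml("doc1", page))
```

The resulting string is what you'd POST (as `text/xml`) to your Solr instance's update handler, followed by a commit. The exact URL depends on your deployment. A real crawler would also need to cope with character encodings, non-HTML formats and so on, which is exactly the work the ExtractingRequestHandler (via Tika) tries to do for you.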
Take a look at the ExtractingRequestHandler documentation at: http://wiki.apache.org/solr/ExtractingRequestHandler

Here's some more documentation that might help:
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Content-Extraction-Tika

But at root, if you want to do it by hand, you have to extract the relevant info from the file in question, form your own valid Solr document, and send *that* to Solr. Or you can use the ExtractingRequestHandler to do it for you, but be aware that it'll do the best it can at putting metadata into the appropriate fields in your schema; you don't have total control over that.

Oh, and why are you using embedded Solr? The normal HTTP request process is recommended, and you can connect to it easily with SolrJ.

FWIW
Erick

On Sun, Apr 3, 2011 at 6:48 PM, michael.i <michael.i...@gmail.com> wrote:
> Hi Erick,
> thanx for getting back to me.
>
> "Well, what is "a document on the filesystem"? Solr deals
> with well-formed XML documents of a specific format."
>
> I would like to index all kinds of documents. For a start I'll be happy to
> be able to work with xml and html documents.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Using-EmbeddedSolrServer-with-static-documents-tp2767614p2773012.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>