Can you provide a few more details? You mention xpath, which leads me to believe that you are using DIH. Is that true? How are you getting your documents to index? Parts of a filesystem?
Because it's possible to do many things. If you're using DIH against a filesystem, you could use two fileDataSources: one that works only on files with a particular extension (xml, say) and another that processes .txt files.

But that said, if you're trying to index "just the text" of a Word document, you have to parse it quite differently than a plain text file; take a look at Tika.

All of which may not help you at all, because I'm guessing... So I think a more complete problem statement would help us help you.

Best
Erick

On Wed, Sep 29, 2010 at 3:56 PM, Savannah Beckett <savannah_becket...@yahoo.com> wrote:

> Hi,
> I am using xpath to index different parts of the html pages into different
> fields. Now, I have some pure text documents that has no html. So I can't use
> xpath. How do I index these pure text into different fields of the index? How
> do I make nutch/solr understand these different parts belong to different
> fields? Maybe I can use existing content in the fields in my index?
> Thanks.
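
To make the "split by extension" idea above a bit more concrete, here is a minimal Python sketch. It is not DIH configuration and not anything Solr or Nutch ships; it only illustrates deciding, per file, which kind of processing it should get. The names `HANDLERS` and `route_by_extension` are hypothetical, and sending every unrecognized extension to Tika is my assumption, not a rule.

```python
from pathlib import PurePath

# Hypothetical mapping from file extension to the kind of processing needed.
HANDLERS = {
    ".xml": "xpath",        # structured markup: fields can be pulled out with xpath
    ".html": "xpath",
    ".txt": "plain_text",   # no markup: the body has to be handled as raw text
}


def route_by_extension(paths):
    """Group file paths by the handler their extension maps to.

    Extensions not listed in HANDLERS (Word documents, say) are grouped
    under "tika". That default is an assumption made for illustration.
    """
    routed = {}
    for path in paths:
        ext = PurePath(path).suffix.lower()
        handler = HANDLERS.get(ext, "tika")
        routed.setdefault(handler, []).append(path)
    return routed


if __name__ == "__main__":
    print(route_by_extension(["page.html", "notes.txt", "report.doc"]))
```

Each group would then go to its own import path, in the same spirit as two data sources each restricted to one extension.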