I am assuming (as Li does, I think) that you want to induce a structure/schema from an HTML example, so that you can use that schema to extract data from similarly structured HTML pages.
Another term often used in the literature for this is "wrapper induction". Besides the DOM, CSS classes often give a good distinction between elements, and they tend to be more stable under small redesigns. In addition to Li's suggestions, have a look at this thread for an open-source Python implementation (I have never tested it): http://www.holovaty.com/writing/templatemaker/ Also make sure to read all the comments for links to other products, etc.

HTH,
Geert-Jan

2010/7/25 Li Li <fancye...@gmail.com>
> It's not a related topic in Solr. Maybe you should read some papers
> about wrapper generation or automatic web data extraction. If you
> want to generate XPath, you could read Liu Bing's papers, such
> as "Structured Data Extraction from the Web based on Partial Tree
> Alignment". Besides the DOM tree, visual clues may also be used. But none
> of them will be a perfect solution because of the diversity of web
> pages.
>
> 2010/7/25 Savannah Beckett <savannah_becket...@yahoo.com>:
> > Hi,
> > I am looking for an XPath generator that can generate an XPath by picking a
> > specific tag inside an HTML page. Do you know a good XPath generator? If
> > possible, a free XPath generator would be great.
> > Thanks.
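For the original question (pick a tag, get an XPath for it), here is a minimal sketch of the basic idea. It is not the method of any tool mentioned above. It walks from the chosen element up to the root and records one step per level, adding a 1-based position only when siblings share the same tag. It assumes the page is well-formed XHTML so that Python's standard-library `xml.etree.ElementTree` can parse it (real-world HTML usually has to go through an HTML parser first). The function names `absolute_xpath` and `class_xpath` and the sample document are hypothetical. `class_xpath` illustrates the CSS-class point: a class-based expression often survives small layout changes that would break a purely positional path.

```python
import xml.etree.ElementTree as ET


def absolute_xpath(root, target):
    """Build a positional XPath such as /html/body/div[2]/p[2] for target."""
    # ElementTree has no parent pointers, so map each child to its parent once.
    parents = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not root:
        parent = parents[node]
        # Siblings with the same tag decide whether a [n] index is needed.
        same_tag = [c for c in parent if c.tag == node.tag]
        if len(same_tag) > 1:
            position = next(i for i, c in enumerate(same_tag, 1) if c is node)
            steps.append(f"{node.tag}[{position}]")
        else:
            steps.append(node.tag)
        node = parent
    steps.append(root.tag)
    return "/" + "/".join(reversed(steps))


def class_xpath(target):
    """Return a class-based XPath like //p[@class='price'], or None.

    Assumes the class value contains no single quote.
    """
    css_class = target.get("class")
    if css_class is None:
        return None
    return f"//{target.tag}[@class='{css_class}']"


# Hypothetical sample page (well-formed XHTML).
SAMPLE = """<html><body>
<div><p>intro</p></div>
<div><p>first</p><p class="price">42</p></div>
</body></html>"""

root = ET.fromstring(SAMPLE)
target = root.findall(".//p")[2]  # the <p class="price"> element
print(absolute_xpath(root, target))  # /html/body/div[2]/p[2]
print(class_xpath(target))           # //p[@class='price']
```

In practice a wrapper-induction tool would generalize such paths across several example pages instead of trusting one positional path from a single page.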