Thanks a lot for your response! For the first solution:
I need to index all the content of my websites, and I just want Tika to ignore <meta name="id"> because I already have an id. I'll try it on Monday and tell you if it works.

The second solution:
Are you sure Tika uses the HTML Tokenizer? I'll check.

2009/12/5 Raghuveer Kancherla <raghuveer.kanche...@aplopio.com>

> 2 ways I can think of ...
>
>    - ExtractingRequestHandler (this is what I am guessing you are using now)
>
> Set extractOnly=true while making a request to the extractingRequestHandler
> and get the parsed content back. Now make a post request on the update request
> handler with whatever fields and field values you want.
>
>    - Use the HTMLStripWhitespaceTokenizerFactory. This article may be helpful
>    to explain what I mean.
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripWhitespaceTokenizerFactory
>
> - Raghu
>
> On Sat, Dec 5, 2009 at 3:44 AM, khalid y <kern...@gmail.com> wrote:
>
> > Hi,
> >
> > I have a problem with Solr. I'm indexing some HTML content and Solr crashes
> > because my id field becomes multivalued.
> > I found that Tika reads the HTML and extracts metadata like <meta name="id"
> > content="12"> from my HTML files, but my documents already have an id set by
> > literal.id=10.
> >
> > I tried to map the id from Tika with fmap.id=ignored_ but it also ignores my
> > literal.id.
> >
> > I'm using Solr 1.4 and Tika 0.5.
> >
> > Can someone explain to me how I can ignore the Tika id metadata?
> >
> > Thanks
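
A rough sketch of the second half of the first suggestion in the quoted reply (extractOnly=true, then a separate post to the update handler) might look like the following. It only builds the XML <add> message, keeping the id I already have and dropping the "id" that Tika took from the <meta> tag. The function name build_add_xml, the field names "text" and "title", and the shape of the metadata dict are assumptions for illustration, not something taken from the Solr or Tika documentation. The extractOnly request and the actual POST to the update handler are not shown.

```python
import xml.etree.ElementTree as ET


def build_add_xml(doc_id, text, metadata=None, ignored=("id",)):
    """Build a Solr XML <add> message from already-extracted content.

    doc_id   -- the id we want to keep (the one passed as literal.id before)
    text     -- extracted body text; the field name "text" is an assumed schema field
    metadata -- hypothetical dict of Tika metadata: name -> value or list of values
    ignored  -- metadata names to drop, so the Tika "id" never reaches Solr
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")

    def field(name, value):
        el = ET.SubElement(doc, "field", name=name)
        el.text = str(value)

    field("id", doc_id)
    field("text", text)
    for name, values in (metadata or {}).items():
        if name.lower() in ignored:
            continue  # skip <meta name="id"> and anything else listed in `ignored`
        if not isinstance(values, (list, tuple)):
            values = [values]
        for value in values:
            field(name, value)
    return ET.tostring(add, encoding="unicode")


# Example: Tika reported id=12 from the page, but we keep our own id=10.
xml_body = build_add_xml("10", "page body", {"id": ["12"], "title": ["Home"]})
print(xml_body)
```

The resulting string would then be posted to the update handler, so the document reaches Solr with a single id value.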