More precisely, remnant terms from deleted documents slowly disappear as you add new documents or when you optimize the index.
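A toy model may make the remnant-term behavior easier to see. This is a minimal sketch of a deliberately simplified inverted index, not Lucene's real data structures; the class `ToyIndex` and its methods are hypothetical names. Deleting a document only marks it as deleted, so its terms stay in the term dictionary that a spellchecker build walks. Facet-style counts include only live documents, so remnant terms show up with a count of 0 (which is why facet.mincount>0 hides them). An optimize-style rewrite drops the deleted documents and every term left with no live postings.

```python
from collections import defaultdict


class ToyIndex:
    """Hypothetical, simplified inverted index; not Lucene's actual code.

    Deleting a document only marks it as deleted, so its terms stay in
    the term dictionary until optimize() rewrites the index.
    """

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of docs containing it
        self.deleted = set()              # ids of docs marked as deleted
        self.next_id = 0

    def add(self, text):
        doc_id = self.next_id
        self.next_id += 1
        for term in text.lower().split():
            self.postings[term].add(doc_id)
        return doc_id

    def delete(self, doc_id):
        # Mark the doc as deleted; the postings are left untouched.
        self.deleted.add(doc_id)

    def terms(self):
        # Walking the term dictionary (as a spellchecker build does)
        # sees every term, including those found only in deleted docs.
        return sorted(self.postings)

    def facet_counts(self, mincount=0):
        # Facet-style counts include only live (non-deleted) docs, so
        # remnant terms appear with a count of 0 unless mincount >= 1.
        counts = {t: len(ids - self.deleted) for t, ids in self.postings.items()}
        return {t: c for t, c in sorted(counts.items()) if c >= mincount}

    def optimize(self):
        # Rewrite without deleted docs and drop terms with no live docs.
        # (Real Lucene also renumbers doc ids; this toy keeps them.)
        live = {t: ids - self.deleted for t, ids in self.postings.items()}
        self.postings = defaultdict(set, {t: ids for t, ids in live.items() if ids})
        self.deleted = set()


idx = ToyIndex()
d1 = idx.add("frampton comes alive")
d2 = idx.add("frampton live")
idx.delete(d1)
idx.delete(d2)
print(idx.terms())                   # ['alive', 'comes', 'frampton', 'live']
print(idx.facet_counts(mincount=1))  # {}
idx.optimize()
print(idx.terms())                   # []
```

In the toy, as in the thread, only the rewrite step makes 'frampton' disappear from the term list; deleting the documents alone just zeroes its live count.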
On Thu, Feb 18, 2010 at 11:09 AM, darniz <rnizamud...@edmunds.com> wrote:
>
> Thanks.
> If this is really the case, I declared a new field called mySpellTextDup and
> retired the original field.
> Now I have a new field which powers my dictionary with no words in it, and
> now I am free to index whichever terms I want.
>
> This is not the best solution, but I can't think of a reasonable workaround.
>
> Thanks
> darniz
>
>
> Lance Norskog-2 wrote:
>>
>> This is a quirk of Lucene: when you delete a document, the indexed
>> terms for the document are not deleted. That is, if 2 documents have
>> the word 'frampton' in an indexed field, the term dictionary contains
>> the entry 'frampton' and pointers to those two documents. When you
>> delete those two documents, the index contains the entry 'frampton'
>> with an empty list of pointers. So, the terms are still there even
>> when you delete all of the documents.
>>
>> Facets and the spellchecking dictionary are built from this term
>> dictionary, not from the text strings that are 'stored' and returned
>> when you search for the documents.
>>
>> The <optimize> command throws away these remnant terms.
>>
>> http://www.lucidimagination.com/blog/2009/03/18/exploring-lucenes-indexing-code-part-2/
>>
>> On Wed, Feb 17, 2010 at 12:24 PM, darniz <rnizamud...@edmunds.com> wrote:
>>>
>>> Please bear with my limited understanding.
>>> I deleted all documents and rebuilt my spell checker using the
>>> command
>>> spellcheck=true&spellcheck.build=true&spellcheck.dictionary=default
>>>
>>> After this I went to the schema browser and saw that mySpellText still
>>> has around 2000 values.
>>> How can I make sure that I clean up that field?
>>> We had the same issue with facets too: even though we delete all the
>>> documents, if we facet on make we still see facets, but we can
>>> filter them out with facet.mincount>0.
>>>
>>> Again, coming back to my question: how can I make the mySpellText field
>>> get rid of all previous terms?
>>>
>>> Thanks a lot
>>> darniz
>>>
>>>
>>>
>>> hossman wrote:
>>>>
>>>> : But still I can't stop thinking about this.
>>>> : I deleted my entire index and now I have 0 documents.
>>>> :
>>>> : Now if I make a query with accrd I still get a suggestion of accord,
>>>> : even though no documents are returned since I deleted my entire
>>>> : index. I hope it also clears the spell check index field.
>>>>
>>>> There are two Lucene indexes when you use spell checking.
>>>>
>>>> There is the "main" index, which is governed by your schema.xml and is
>>>> what you add your own documents to, and what searches are run against
>>>> for the result section of Solr responses.
>>>>
>>>> There is also the "spell" index, in which each "document" corresponds
>>>> to a "word" that might be returned as a spelling suggestion: one field
>>>> holds the word, and the other fields contain various start/end/middle
>>>> ngrams that represent possible misspellings.
>>>>
>>>> When you use the spellchecker component it builds the "spell" index,
>>>> making a document out of every word it finds in whatever field name
>>>> you configure it to use.
>>>>
>>>> Deleting your entire "main" index won't automatically delete the
>>>> "spell" index (although you should be able to rebuild the "spell"
>>>> index using the *empty* "main" index; that should work).
>>>>
>>>> : I am copying both fields to a field called
>>>> : <copyField source="make" dest="mySpellText"/>
>>>> : <copyField source="model" dest="mySpellText"/>
>>>>
>>>> ...at this point your "main" index has a field named mySpellText, and
>>>> for every document it contains a copy of make and model.
>>>>
>>>> : <lst name="spellchecker">
>>>> : <str name="name">default</str>
>>>> : <str name="field">mySpellText</str>
>>>> : <str name="buildOnOptimize">true</str>
>>>> : <str name="buildOnCommit">true</str>
>>>>
>>>> ...so whenever you commit or optimize your "main" index, it will take
>>>> every word from the mySpellText field and use them all as individual
>>>> documents in the "spell" index.
>>>>
>>>> In your previous email you said you changed the copyField declaration
>>>> and then triggered a commit. That rebuilt your "spell" index, but the
>>>> data was still all there in the mySpellText field of the "main" index,
>>>> so the rebuilt "spell" index was exactly the same.
>>>>
>>>> : I have buildOnOptimize and buildOnCommit as true, so when I index new
>>>> : documents I want my dictionary to be created, but how can I make sure
>>>> : I remove the previously indexed terms?
>>>>
>>>> Every time the spellchecker component "builds", it will create a
>>>> completely new "spell" index... but if the old data is still in the
>>>> "main" index, then it will also be in the "spell" index.
>>>>
>>>> The only reason I can think of why you'd be seeing words in your
>>>> "spell" index after deleting documents from your "main" index is that
>>>> even if you delete documents, the terms are still there in the
>>>> underlying index until the segments are merged... so if you do an
>>>> optimize, that will force them to be expunged. But I honestly have no
>>>> idea if that is what's causing your problem, because quite frankly I
>>>> really don't understand what your problem is... you have to provide
>>>> specifics: reproducible steps anyone can take using a clean install of
>>>> Solr to see the behavior you are seeing that seems incorrect.
>>>> (i.e. modifications to the example schema, and commands to execute
>>>> against the demo port to see the bug).
>>>>
>>>> If you can provide details like that, then it's possible to understand
>>>> what is going wrong for you, which is a prereq to providing useful help.
>>>>
>>>>
>>>> -Hoss
>>>>
>>>
>>> --
>>> View this message in context:
>>> http://old.nabble.com/Deleting-spelll-checker-index-tp27376823p27629740.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>>
>

--
Lance Norskog
goks...@gmail.com
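The clean-up sequence suggested in the messages above (delete the documents, optimize so the remnant terms are expunged, then rebuild the spellchecker) can be sketched as raw HTTP requests. This is a hedged sketch, not an official client, and it only builds the requests without sending them. The base URL `http://localhost:8983/solr`, the `/update` and `/select` handler paths, the `*:*` delete query, and the helper name `cleanup_requests` are assumptions for illustration; only the `<optimize>` command, the `buildOnOptimize` setting, and the `spellcheck.*` parameters come from the thread. The spellcheck component may be attached to a different handler in your solrconfig.xml, so `spell_path` is a parameter.

```python
from urllib.parse import urlencode

# Assumption: Solr 1.4-style example install on the default demo port.
DEFAULT_BASE = "http://localhost:8983/solr"


def cleanup_requests(base=DEFAULT_BASE, spell_path="/select",
                     dictionary="default"):
    """Hypothetical helper returning (method, url, body) tuples.

    Nothing is sent; the XML bodies are meant to be POSTed with
    Content-Type: text/xml to the update handler.
    """
    update_url = base + "/update"
    # Same parameters as the rebuild command quoted in the thread.
    build_params = urlencode({
        "spellcheck": "true",
        "spellcheck.build": "true",
        "spellcheck.dictionary": dictionary,
    })
    return [
        # 1. Remove every document from the "main" index.
        ("POST", update_url, "<delete><query>*:*</query></delete>"),
        # 2. Optimize: merges segments so remnant terms are expunged
        #    (and triggers a rebuild if buildOnOptimize is true).
        ("POST", update_url, "<optimize/>"),
        # 3. Explicitly rebuild the "spell" index from the now-empty field.
        ("GET", base + spell_path + "?" + build_params, None),
    ]


for method, url, body in cleanup_requests():
    print(method, url, body or "")
```

Keeping request construction separate from sending makes the sequence easy to inspect, and to adjust for a different handler, before running it against a live index.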