Re: Deleting spelll checker index

darniz Thu, 18 Feb 2010 11:10:06 -0800

Thanks.
Assuming this is really the case, I have declared a new field called
mySpellTextDup and retired the original field.
Now I have a new field that powers my dictionary with no words in it, and
I am free to index whichever terms I want.


This is not the best solution, but I can't think of a better workaround.

Thanks
darniz
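
For reference, the optimize-then-rebuild sequence suggested in the quoted replies below could be scripted roughly as follows. This is only a sketch: the base URL, the `/select` handler (assumed to have the spellcheck component attached), and the helper name `cleanup_requests` are assumptions, and nothing is actually sent to a server. The update bodies are Solr's standard XML update messages, and the spellcheck parameters are the ones quoted in this thread.

```python
from urllib.parse import urlencode

# Assumption: Solr example install on the default demo port.
SOLR_BASE = "http://localhost:8983/solr"


def cleanup_requests(base_url=SOLR_BASE, dictionary="default"):
    """Return (url, body) pairs in the order they would be issued.

    A body of None means a plain GET; otherwise the body is an XML
    update message to POST to the update handler.
    """
    update_url = f"{base_url}/update"
    build_params = urlencode({
        "q": "*:*",
        "rows": "0",
        "spellcheck": "true",
        "spellcheck.build": "true",
        "spellcheck.dictionary": dictionary,
    })
    return [
        (update_url, "<delete><query>*:*</query></delete>"),  # delete all docs
        (update_url, "<commit/>"),                            # make deletes visible
        (update_url, "<optimize/>"),                          # merge segments, purge stale terms
        (f"{base_url}/select?{build_params}", None),          # rebuild the spell index
    ]


for url, body in cleanup_requests():
    print(url, body or "(GET)")
```

With buildOnOptimize set to true, as in the configuration quoted below, the optimize step should already trigger a rebuild, so the final request may be redundant.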


Lance Norskog-2 wrote:
> 
> This is a quirk of Lucene - when you delete a document, the indexed
> terms for the document are not deleted. That is, if 2 documents have
> the word 'frampton' in an indexed field, the term dictionary contains
> the entry 'frampton' and pointers to those two documents. When you
> delete those two documents, the index contains the entry 'frampton'
> with an empty list of pointers. So, the terms are still there even
> when you delete all of the documents.
> 
> Facets and the spellchecking dictionary are built from this term
> dictionary, not from the text strings that are 'stored' and returned
> when you search for the documents.
> 
> The <optimize> command throws away these remnant terms.
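
The behaviour described above can be illustrated with a toy inverted index. This is a plain-Python sketch, not Lucene's API: `ToyIndex` and its methods are hypothetical names, and real Lucene tracks deletions per segment and only drops stale terms when segments are merged.

```python
from collections import defaultdict


class ToyIndex:
    """Toy inverted index: term -> set of doc ids, plus a set of deleted ids."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.deleted = set()

    def add(self, doc_id, text):
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def delete(self, doc_id):
        # Deleting only marks the document; the term dictionary is untouched.
        self.deleted.add(doc_id)

    def terms(self):
        # What facets and the spellchecker would see as the field's terms.
        return sorted(self.postings)

    def live_docs(self, term):
        return self.postings.get(term, set()) - self.deleted

    def optimize(self):
        # Rewrite the postings without deleted docs and drop any term
        # that no longer points at a live document.
        rewritten = defaultdict(set)
        for term, docs in self.postings.items():
            live = docs - self.deleted
            if live:
                rewritten[term] = live
        self.postings = rewritten
        self.deleted = set()


index = ToyIndex()
index.add(1, "Peter Frampton")
index.add(2, "Frampton Comes Alive")
index.delete(1)
index.delete(2)
print(index.terms())                 # terms survive the deletes
print(index.live_docs("frampton"))   # but no live document uses them
index.optimize()
print(index.terms())                 # remnant terms are gone after the rewrite
```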
> 
> http://www.lucidimagination.com/blog/2009/03/18/exploring-lucenes-indexing-code-part-2/
> 
> On Wed, Feb 17, 2010 at 12:24 PM, darniz <rnizamud...@edmunds.com> wrote:
>>
>> Please bear with me on my limited understanding.
>> I deleted all documents and rebuilt my spell checker using the command
>> spellcheck=true&spellcheck.build=true&spellcheck.dictionary=default
>>
>> After this I went to the schema browser and saw that mySpellText still
>> has around 2000 values.
>> How can I make sure that I clean up that field?
>> We had the same issue with facets too: even though we deleted all the
>> documents, if we facet on make we still see facet values, but we can
>> filter those out by setting facet.mincount=1.
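
As a rough illustration of the facet workaround mentioned above, the request could be built like this. The base URL, the `/select` handler path, and the helper name `facet_query_url` are assumptions; `facet`, `facet.field` and `facet.mincount` are standard Solr facet parameters, and `make` is the field named in this thread.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, field, mincount=1):
    """Build a facet-only query URL that hides values with a count below mincount."""
    params = [
        ("q", "*:*"),
        ("rows", "0"),                      # only the facet counts are wanted
        ("facet", "true"),
        ("facet.field", field),
        ("facet.mincount", str(mincount)),  # 1 hides terms left with zero documents
    ]
    return f"{base_url}/select?{urlencode(params)}"


# Assumption: Solr example install on the default demo port.
print(facet_query_url("http://localhost:8983/solr", "make"))
```

This only hides the remnant terms from the facet response; they remain in the term dictionary until the segments are merged.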
>>
>> Again, coming back to my question: how can I make the mySpellText field
>> get rid of all its previous terms?
>>
>> Thanks a lot
>> darniz
>>
>>
>>
>> hossman wrote:
>>>
>>> : But still I can't stop thinking about this.
>>> : I deleted my entire index and now I have 0 documents.
>>> :
>>> : Now if I make a query with accrd I still get a suggestion of accord,
>>> : even though no documents are returned since I deleted my entire index.
>>> : I hoped it would also clear the spell check index field.
>>>
>>> There are two Lucene indexes when you use spell checking.
>>>
>>> There is the "main" index, which is governed by your schema.xml and is
>>> what you add your own documents to, and what searches are run against
>>> for the result section of Solr responses.
>>>
>>> There is also the "spell" index, which has only two kinds of fields and
>>> in which each "document" corresponds to a "word" that might be returned
>>> as a spelling suggestion: one field holds the word itself, and the other
>>> fields contain various start/end/middle ngrams that represent possible
>>> misspellings.
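
To make the structure of a "spell" index document more concrete, here is a minimal sketch of how one word might be broken into start/end/middle ngrams. It assumes a fixed ngram size of 3 and uses illustrative field names; the real spell checker picks its ngram sizes based on word length, and the helper names here are hypothetical.

```python
def ngrams(word, n):
    """All contiguous substrings of length n, in order."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def spell_document(word, n=3):
    """Build a toy 'spell' index document for a single word."""
    grams = ngrams(word, n)
    return {
        "word": word,                          # the suggestion itself
        f"start{n}": grams[0] if grams else "",  # leading ngram
        f"end{n}": grams[-1] if grams else "",   # trailing ngram
        f"gram{n}": grams,                     # every ngram, including the middle ones
    }


def shared_grams(misspelling, candidate, n=3):
    """Ngrams a misspelled query has in common with a candidate word."""
    return sorted(set(ngrams(misspelling, n)) & set(ngrams(candidate, n)))


print(spell_document("accord"))
print(shared_grams("accrd", "accord"))
```

A misspelled query such as accrd still shares an ngram with accord, which is why the suggestion keeps appearing for as long as the word stays in the "spell" index.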
>>>
>>> When you use the spellchecker component, it builds the "spell" index by
>>> making a document out of every word it finds in whatever field name you
>>> configure it to use.
>>>
>>> Deleting your entire "main" index won't automatically delete the "spell"
>>> index (although you should be able to rebuild the "spell" index using
>>> the *empty* "main" index; that should work).
>>>
>>> : i am copying both fields to a field called
>>> : <copyField source="make" dest="mySpellText"/>
>>> : <copyField source="model" dest="mySpellText"/>
>>>
>>> ..at this point your "main" index has a field named mySpellText, and for
>>> every document it contains a copy of make and model.
>>>
>>> :         <lst name="spellchecker">
>>> :             <str name="name">default</str>
>>> :             <str name="field">mySpellText</str>
>>> :             <str name="buildOnOptimize">true</str>
>>> :             <str name="buildOnCommit">true</str>
>>>
>>> ...so whenever you commit or optimize your "main" index, it will take
>>> every word from the mySpellText field and use them all as individual
>>> documents in the "spell" index.
>>>
>>> In your previous email you said you changed the copyField declaration
>>> and then triggered a commit -- that rebuilt your "spell" index, but the
>>> data was still all there in the mySpellText field of the "main" index,
>>> so the rebuilt "spell" index was exactly the same.
>>>
>>> : I have buildOnOptimize and buildOnCommit set to true, so when I index
>>> : new documents I want my dictionary to be created, but how can I make
>>> : sure I remove the previously indexed terms?
>>>
>>> Every time the spellchecker component "builds", it will create a
>>> completely new "spell" index .. but if the old data is still in the
>>> "main" index then it will also be in the "spell" index.
>>>
>>> The only reason I can think of why you'd be seeing words in your "spell"
>>> index after deleting documents from your "main" index is that even if
>>> you delete documents, the Terms are still there in the underlying index
>>> until the segments are merged ... so if you do an optimize, that will
>>> force them to be expunged --- but I honestly have no idea if that is
>>> what's causing your problem, because quite frankly I really don't
>>> understand what your problem is ... you have to provide specifics:
>>> reproducible steps anyone can take using a clean install of Solr to see
>>> the behavior you are seeing that seems incorrect (i.e., modifications to
>>> the example schema, and commands to execute against the demo port to
>>> see the bug).
>>>
>>> If you can provide details like that, then it's possible to understand
>>> what is going wrong for you -- which is a prereq to providing useful
>>> help.
>>>
>>>
>>>
>>> -Hoss
>>>
>>>
>>>
>>
>> --
>> View this message in context:
>> http://old.nabble.com/Deleting-spelll-checker-index-tp27376823p27629740.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> 
> -- 
> Lance Norskog
> goks...@gmail.com
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Deleting-spelll-checker-index-tp27376823p27644054.html
Sent from the Solr - User mailing list archive at Nabble.com.
