More precisely, remnant terms from deleted documents slowly disappear
as you add new documents or when you optimize the index.
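
A rough sketch of that behaviour, as a toy model only: the ToySegment and ToyIndex classes below are hypothetical names invented for illustration and are not Lucene's real data structures or API. The assumption is simply that deletes are recorded as tombstones while written segments stay untouched, and that a merge (here a full optimize) copies only postings of live documents.

```python
from collections import defaultdict


class ToySegment:
    """Hypothetical write-once segment: maps term -> list of doc ids."""

    def __init__(self, docs):
        # docs: dict of doc_id -> text
        self.postings = defaultdict(list)
        for doc_id, text in docs.items():
            for term in set(text.lower().split()):
                self.postings[term].append(doc_id)


class ToyIndex:
    """Hypothetical index: a list of segments plus a set of deleted doc ids."""

    def __init__(self):
        self.segments = []
        self.deleted = set()

    def add(self, docs):
        # Each batch of new documents becomes its own segment.
        self.segments.append(ToySegment(docs))

    def delete(self, doc_id):
        # A delete only records a tombstone; no segment is rewritten.
        self.deleted.add(doc_id)

    def terms(self):
        # What a term-dictionary consumer (facets, spellcheck build) would see.
        return sorted({t for s in self.segments for t in s.postings})

    def live_doc_count(self, term):
        # Number of non-deleted documents that contain the term.
        return sum(
            1
            for s in self.segments
            for d in s.postings.get(term, [])
            if d not in self.deleted
        )

    def optimize(self):
        # Merge everything into one segment, keeping only live postings.
        merged = ToySegment({})
        for s in self.segments:
            for term, ids in s.postings.items():
                live = [d for d in ids if d not in self.deleted]
                if live:
                    merged.postings[term].extend(live)
        self.segments = [merged]
        self.deleted.clear()


index = ToyIndex()
index.add({1: "peter frampton", 2: "frampton comes alive"})
index.delete(1)
index.delete(2)
print(index.terms())                     # ['alive', 'comes', 'frampton', 'peter']
print(index.live_doc_count("frampton"))  # 0
index.optimize()
print(index.terms())                     # []
```

In this model the term list only shrinks when segments are rewritten, which is why the remnant terms show up in the schema browser until a merge or an optimize happens.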

On Thu, Feb 18, 2010 at 11:09 AM, darniz <rnizamud...@edmunds.com> wrote:
>
> Thanks
> If this is really the case, i declared a new field called mySpellTextDup and
> retired the original field.
> Now i have a new field which powers my dictionary with no words in it, and
> i am free to index whichever terms i want.
>
> This is not the best solution, but i can't think of a reasonable workaround.
>
> Thanks
> darniz
>
>
> Lance Norskog-2 wrote:
>>
>> This is a quirk of Lucene - when you delete a document, the indexed
>> terms for the document are not deleted. That is, if 2 documents have
>> the word 'frampton' in an indexed field, the term dictionary contains
>> the entry 'frampton' and pointers to those two documents. When you
>> delete those two documents, the index contains the entry 'frampton'
>> with an empty list of pointers. So, the terms are still there even
>> when you delete all of the documents.
>>
>> Facets and the spellchecking dictionary are built from this term
>> dictionary, not from the text strings that are 'stored' and returned
>> when you search for the documents.
>>
>> The <optimize> command throws away these remnant terms.
>>
>> http://www.lucidimagination.com/blog/2009/03/18/exploring-lucenes-indexing-code-part-2/
>>
>> On Wed, Feb 17, 2010 at 12:24 PM, darniz <rnizamud...@edmunds.com> wrote:
>>>
>>> Please bear with me on my limited understanding.
>>> i deleted all documents and rebuilt my spell checker using the command
>>> spellcheck=true&spellcheck.build=true&spellcheck.dictionary=default
>>>
>>> After this i went to the schema browser and saw that mySpellText still has
>>> around 2000 values.
>>> How can i make sure that i clean up that field?
>>> We had the same issue with facets too: even though we deleted all the
>>> documents, if we facet on make we still see facet values, but we can
>>> filter them out by setting facet.mincount=1 (that is, counts greater than 0).
>>>
>>> Again, coming back to my question: how can i make the mySpellText field get
>>> rid of all previous terms?
>>>
>>> Thanks a lot
>>> darniz
>>>
>>>
>>>
>>> hossman wrote:
>>>>
>>>> : But still i can't stop thinking about this.
>>>> : i deleted my entire index and now i have 0 documents.
>>>> :
>>>> : Now if i make a query with accrd i still get a suggestion of accord even
>>>> : though there are no documents returned since i deleted my entire index. i
>>>> : hope it also clears the spell check index field.
>>>>
>>>> there are two Lucene indexes when you use spell checking.
>>>>
>>>> there is the "main" index which is governed by your schema.xml and is what
>>>> you add your own documents to, and what searches are run against for the
>>>> result section of solr responses.
>>>>
>>>> There is also the "spell" index which has only two fields and in
>>>> which each "document" corresponds to a "word" that might be returned as a
>>>> spelling suggestion, and the other fields contain various start/end/middle
>>>> ngrams that represent possible misspellings.
>>>>
>>>> When you use the spellchecker component it builds the "spell" index,
>>>> making a document out of every word it finds in whatever field name you
>>>> configure it to use.
>>>>
>>>> deleting your entire "main" index won't automatically delete the "spell"
>>>> index (although you should be able to rebuild the "spell" index using the
>>>> *empty* "main" index, that should work).
>>>>
>>>> : i am copying both fields to a field called
>>>> : <copyField source="make" dest="mySpellText"/>
>>>> : <copyField source="model" dest="mySpellText"/>
>>>>
>>>> ..at this point your "main" index has a field named mySpellText, and for
>>>> every document it contains a copy of make and model.
>>>>
>>>> :         <lst name="spellchecker">
>>>> :             <str name="name">default</str>
>>>> :             <str name="field">mySpellText</str>
>>>> :             <str name="buildOnOptimize">true</str>
>>>> :             <str name="buildOnCommit">true</str>
>>>>
>>>> ...so whenever you commit or optimize your "main" index it will take every
>>>> word from the mySpellText field and use them all as individual documents in
>>>> the "spell" index.
>>>>
>>>> In your previous email you said you changed the copyField declaration, and
>>>> then triggered a commit -- that rebuilt your "spell" index, but the data
>>>> was still all there in the mySpellText field of the "main" index, so the
>>>> rebuilt "spell" index was exactly the same.
>>>>
>>>> : i have buildOnOptimize and buildOnCommit as true so when i index new
>>>> : documents i want my dictionary to be created, but how can i make sure i
>>>> : remove the previously indexed terms.
>>>>
>>>> every time the spellchecker component "builds" it will create a completely
>>>> new "spell" index .. but if the old data is still in the "main" index then
>>>> it will also be in the "spell" index.
>>>>
>>>> The only reason i can think of why you'd be seeing words in your "spell"
>>>> index after deleting documents from your "main" index is that even if you
>>>> delete documents, the Terms are still there in the underlying index until
>>>> the segments are merged ... so if you do an optimize that will force them
>>>> to be expunged. But i honestly have no idea if that is what's causing
>>>> your problem, because quite frankly i really don't understand what your
>>>> problem is ... you have to provide specifics: reproducible steps anyone
>>>> can take using a clean install of solr to see the behavior you are
>>>> seeing that seems incorrect.  (ie: modifications to the example schema,
>>>> and commands to execute against the demo port to see the bug)
>>>>
>>>> if you can provide details like that then it's possible to understand what
>>>> is going wrong for you -- which is a prereq to providing useful help.
>>>>
>>>>
>>>>
>>>> -Hoss
>>>>
>>>>
>>>>
>>>
>>> --
>>> View this message in context:
>>> http://old.nabble.com/Deleting-spelll-checker-index-tp27376823p27629740.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>>
>>
>
> --
> View this message in context: 
> http://old.nabble.com/Deleting-spelll-checker-index-tp27376823p27644054.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Lance Norskog
goks...@gmail.com
