Thank you, Mary. You are right that the punctuation is creating the problem. Words and stems without punctuation work just fine with my new custom dictionary.
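
To make the failure mode concrete, here is a toy model in Python. It is not MarkLogic code and does not reflect how MarkLogic tokenizes or stems internally; the dictionary, the function names, and the character classes are all hypothetical and exist only for illustration. It assumes a tokenizer that breaks on any non-alphanumeric character unless that character is explicitly listed as a word character, which is roughly the behavior a field tokenization override would change.

```python
import re

# Hypothetical custom dictionary: surface form -> stem (mirrors the Int'l example).
CUSTOM_STEMS = {"int'l": "international"}


def tokenize(text, word_chars=""):
    """Split text into word tokens.

    Letters and digits are always word characters; any character listed in
    word_chars is treated as a word character too (the "override").
    """
    pattern = "[A-Za-z0-9" + re.escape(word_chars) + "]+"
    return re.findall(pattern, text)


def stems(text, word_chars=""):
    """Tokenize first, then look each lowercased token up in the dictionary."""
    return [
        CUSTOM_STEMS.get(token.lower(), token.lower())
        for token in tokenize(text, word_chars)
    ]


# Default tokenization: the apostrophe splits the word, so the dictionary
# entry for "int'l" is never consulted.
default_stems = stems("Int'l Paper Co")

# With the apostrophe treated as a word character, the whole abbreviation
# survives tokenization and the custom stem is found.
override_stems = stems("Int'l Paper Co", word_chars="'")

print(default_stems)
print(override_stems)
```

In this sketch the dictionary entry is correct in both runs; only the tokenization step decides whether it is ever looked up, which matches what I am seeing with entries that contain punctuation.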
We have a long list of industry-specific abbreviations and synonyms. I am experimenting with using stemming (instead of a thesaurus) to make searches return the same results regardless of whether the user searches for the full word or the abbreviation. Most of these abbreviations contain punctuation (either a period or an apostrophe). All these values are in the same field. So I'll take your advice and investigate creating a custom tokenization for that field.

Thank you,
David

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Mary Holstege
Sent: Wednesday, July 22, 2015 11:57 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Custom dictionary for stemming

It may be a tokenization thing -- the apostrophe is causing a word break so your custom stem is never matched. What does this give you: cts:tokenize(cts:stem("Int'l"))? Do things work as you expect for a custom stem that doesn't have a punctuation character in it?

A workaround for that is to create a field custom tokenization override making apostrophe a word character. That will be confined to that specific field, however, and not to word queries in general.

Regardless, you should probably report a bug to ML support.

//Mary

On Wed, 22 Jul 2015 08:02:33 -0700, Rhodes, David (LNG-CON) <[email protected]> wrote:

> I am trying to use a custom dictionary to extend the set of stemmed
> words.
>
> I am using MarkLogic 7.0, and have been following the documentation
> guides in Chapters 17 and 18:
> http://docs.marklogic.com/7.0/guide/search-dev/stemming
> http://docs.marklogic.com/7.0/guide/search-dev/custom-dictionaries
>
> I noted that there are two ways to see if words are resolving to their
> stems:
>
> cts:stem(word) returns the stems of word
>
> and
>
> cts:contains(word, stem) returns true if these two terms resolve to
> the same stem
>
> I confirmed that both of these work for terms that are in the default
> dictionary (e.g., run and running, bite and bitten)
>
> I have added a custom dictionary that adds "Int'l" as a word with
> "International" as its stem.
>
> cdict:dictionary-write("en",$dict)
>
> With that dictionary added as the custom dictionary for English,
> cts:stem works but cts:contains does not.
>
> cts:stem("Int'l") returns International
> cts:contains("Int'l", "International") returns false
>
> I reindexed my database, since I understand that my dictionary entry
> means that all documents containing "Int'l" should now be indexed
> under "International".
>
> cts:contains("Int'l", "International") still returns false
>
> Furthermore, in the real search work flow that I am doing, searches
> for "Int'l" do not return documents containing "International" (But
> searches for "bitten" do return documents containing "bite").
>
> My database indexes are set to Stemmed Searches = Basic, and Word
> Searches = False.
>
> I think that stemming can be a powerful feature for my work flow, if I
> can just get it to work. Thank you for any advice you can offer.
>
> David

--
Using Opera's revolutionary email client: http://www.opera.com/mail/
_______________________________________________
General mailing list
[email protected]
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general
