OK, see below.

On Wed, Apr 6, 2011 at 6:22 PM, Preston Marshall <pres...@synergyeoc.com> wrote:
> Reply Inline:
> On Apr 6, 2011, at 8:12 AM, Erick Erickson wrote:
>
> > Hmmm, this should work just fine. Here are my questions.
> >
> > 1> are you absolutely sure that the new synonym file
> > is available when reindexing?
> Not sure what you mean here, solr is running as root, and the file is never
> moved around or anything crazy.

Just a sanity check that you're changing the indexing file you think you're
changing. I've sometimes managed to be in the wrong directory, on the wrong
machine, etc.

Hmmm, what happens if you just stop/start the server instead of deleting the
index? I'm wondering if the old file is still being used (assuming *nix
here). I have no evidence this could be the case, but it's an idea.

> > 2> does the sunspot program do anything wonky with
> > the ids? The documents
> > will only be replaced if the IDs are identical.
> Is there a way I can add debugging to show what it's doing with the IDs or
> something to view the index? I tried using Luke, but I can't get it to
> actually show me the actual data of the objects, only the name and some
> other basic info.

The issue is seeing whatever has been defined as the <uniqueKey> field. In
the default schema, it's defined as "id". I'm NOT talking about the internal
Lucene ID; it's entirely about what's defined in your schema. Set
stored="true" for fields to see them easily.

The point here is that Solr updates documents based on <uniqueKey>. If there
is no such field, reindexing your documents will simply add another copy, and
the original is still searchable.

> > 3> are you sure that a commit is done at the end?
> It appears that it commits a few times during reindexing.
> > 4> What happens if you optimize? At that point, maxdocs
> > and numdocs should be the same, and should be the count
> > of documents. if they differ by a factor of 2, I'd suspect your
> > id field isn't being used correctly.
> I'm unaware of what you mean by optimizing, or even viewing maxdocs and
> numdocs, but I will RTFM to find out.
> I did notice something strange earlier though that may relate to this.
> When I ran a search there were duplicate results.

OK, see the <uniqueKey> discussion above. It really sounds like re-indexing
the data is merely adding documents again and again and again, not replacing
the first copy with the second. If this is true, your numDocs and maxDocs
should be nearly equal the first time and grow by the number of documents
you index every time you reindex.

If/when your <uniqueKey> is working, you should see numDocs stay constant and
maxDocs go up by the number of documents you re-index. Sending an optimize
command to the indexer will reclaim all unused resources and bring numDocs
and maxDocs back to the same value, but this is probably not your problem.

I do see that "id" is the <uniqueKey> in your schema. So I'm guessing,
especially because the comment says that this field is used by sunspot, that
the sunspot stuff is creating a new id for each document when you re-index.
If all this is true, it's an issue with sunspot.

So here's what I predict. If you look at the id field you'll see some
sunspot-generated id that's unique for every added document even if it's a
new copy of an old document, so Solr sees two separate, entirely unrelated
documents. The old one has the old synonyms and the new one the new list.

The maxDocs/numDocs are available on the admin page; click the "statistics"
link.

Best
Erick

> > If the hypothesis that your id field isn't working correctly is right,
> > your number of hits should be going up after re-indexing...
> >
> > If none of that is relevant, let us know what you find and we'll
> > try something else....
> >
> > Best
> > Erick
> >
> > On Tue, Apr 5, 2011 at 10:46 PM, Preston Marshall
> > <pres...@synergyeoc.com> wrote:
> >
> >> Hello all, I am having an issue with Solr and the SynonymFilterFactory.
> >> I am using a library to interface with Solr called "sunspot."
> >> I realize that is not what this list is for, but I believe this may be
> >> an issue with Solr, not the library (plus the lib author doesn't know
> >> the answer). I am using the SynonymFilterFactory in my index-time
> >> analyzer, and it works great. My only problem is when it comes to
> >> changing the synonyms file. I would expect to be able to edit the file,
> >> run a reindex (this is through the library), and have the new synonyms
> >> function when the reindex is complete. Unfortunately this is not the
> >> case, as changing the synonyms file doesn't actually affect the search
> >> results. What DOES work is deleting the existing index, and starting
> >> from scratch. This is unacceptable for my usage though, because I need
> >> the old index to remain online while the new one is being built, so
> >> there is no downtime.
> >>
> >> Here's my schema in case anyone needs it:
> >> https://gist.github.com/88f8fb763e99abe4d5b8
> >>
> >> Thanks,
> >> Preston
> >>
> >> P.S. Sorry if this dupes, first post and I didn't see it show up in the
> >> archives.
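
The numDocs/maxDocs behaviour described in the reply above can be sketched
with a toy model. This is only a minimal illustration in Python, not Solr
code: ToyIndex, reindex, and the make_id callbacks are hypothetical names
made up for the example, and a real index may also reclaim deleted documents
during ordinary segment merges, so actual counts can differ. Case 2 models
the guess that a new id is generated on every reindex; nothing in the thread
confirms that sunspot does this.

```python
import itertools


class ToyIndex:
    """Toy bookkeeping model of add-with-overwrite; this is not Solr code."""

    def __init__(self):
        # One entry per document ever written: [unique_key, is_live].
        self.slots = []

    def add(self, unique_key):
        # Adding a document whose key already exists marks the old copy
        # as deleted, then appends the new copy.
        for slot in self.slots:
            if slot[1] and slot[0] == unique_key:
                slot[1] = False
        self.slots.append([unique_key, True])

    def optimize(self):
        # Drop deleted entries, as a full merge would.
        self.slots = [slot for slot in self.slots if slot[1]]

    @property
    def num_docs(self):
        # Live (searchable) documents only.
        return sum(1 for _, live in self.slots if live)

    @property
    def max_docs(self):
        # Live plus deleted-but-not-yet-reclaimed documents.
        return len(self.slots)


def reindex(index, records, make_id):
    """Add every record, using make_id(record) as its uniqueKey value."""
    for record in records:
        index.add(make_id(record))


records = ["doc-a", "doc-b", "doc-c"]

# Case 1: the same record always maps to the same id.
stable = ToyIndex()
reindex(stable, records, make_id=lambda r: r)
reindex(stable, records, make_id=lambda r: r)
print(stable.num_docs, stable.max_docs)  # 3 6
stable.optimize()
print(stable.num_docs, stable.max_docs)  # 3 3

# Case 2 (hypothetical): a fresh id is generated on every pass.
counter = itertools.count()
fresh = ToyIndex()
reindex(fresh, records, make_id=lambda r: next(counter))
reindex(fresh, records, make_id=lambda r: next(counter))
print(fresh.num_docs, fresh.max_docs)    # 6 6
```

In case 1 the live count stays at 3 while the total grows until optimize; in
case 2 both counts grow together and every record is searchable twice, which
is the duplicate-results symptom reported in the thread.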