One downside of doing joins is that it makes it pretty hard to distribute/federate the index in the future, because a document no longer stands alone.
A flat structure for tagging would be to add a taguser and a tag field value to the actual document each time a user tags it.
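A minimal sketch of that flat layout, using plain Python dicts as a stand-in for the index (this is not Solr code; the document ids and the `tag_document` and `count` helpers are hypothetical, only the `taguser` and `tag` field names come from the description above):

```python
# Hypothetical in-memory stand-in for the index. Each document carries
# multi-valued taguser and tag fields directly, so it stands alone.
docs = {
    "doc1": {"id": "doc1", "taguser": [], "tag": []},
    "doc2": {"id": "doc2", "taguser": [], "tag": []},
    "doc3": {"id": "doc3", "taguser": [], "tag": []},
}

def tag_document(doc_id, user, tag):
    """Append the user and the tag to the document's multi-valued fields."""
    doc = docs[doc_id]
    doc["taguser"].append(user)
    doc["tag"].append(tag)

def count(predicate):
    """Number of documents matching a predicate (what a facet.query count reports)."""
    return sum(1 for d in docs.values() if predicate(d))

tag_document("doc1", "erikhatcher", "foo")
tag_document("doc1", "yonik", "bar")
tag_document("doc2", "yonik", "foo")

all_collected = count(lambda d: len(d["tag"]) > 0)        # like tag:*
by_erik = count(lambda d: "erikhatcher" in d["taguser"])  # like taguser:erikhatcher
tagged_foo = count(lambda d: "foo" in d["tag"])           # like tag:foo

# The two fields are independent of each other, so this matches doc1 even
# though erikhatcher never applied "bar" (yonik did).
erik_and_bar = count(
    lambda d: "erikhatcher" in d["taguser"] and "bar" in d["tag"]
)
```

Because the user and the tag are stored in separate fields, the layout does not record which user applied which tag; the last count illustrates that.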
- all collected objects
facet.query=tag:*
- all objects collected by erikhatcher
facet.query=taguser:erikhatcher
- all collected objects with tag "foo"
facet.query=tag:foo
- facet by tag
facet.field=tag
- filter query results by a constraint tag=foo
fq=tag:foo

You wouldn't be able to query for:
- total number of tags
- items with the largest number of tags
- a tag by a specific user... that would require something like a phrase match across fields.

Downsides of a flat structure:
- you need to reindex the whole document, or have updateable documents
- even with updateable documents, it could be costly to update (if people's tagging rate is fairly low, this may not matter much)

--- Separate tag or collectible objects ---
- all collected objects
The count of all tagged objects? How would you do this?
- all objects collected by erikhatcher
facet.query=C_user:erikhatcher
- all collected objects with tag "foo"
facet.query=C_tag:foo
- facet by tag
facet.field=C_tag (this would give counts of *tags* not documents)
- filter query results by a constraint tag=foo
Not currently doable, would need to build up a filter somehow...
indirectFilter=id:((C_tag:foo).C_uid)

If an indirect approach has enough advantages, we could perhaps come up with a way to express it.
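As a rough illustration of what such an indirect filter would have to compute, here is a minimal Python sketch over hypothetical in-memory data. It is not Solr code, and the indirectFilter syntax above is only a proposal. The field names C_tag, C_user and C_uid follow the examples above; the sample documents and the `indirect_filter` helper are assumptions made for the example.

```python
# Hypothetical in-memory index: "A" docs are the original objects, "C" docs
# are per-user collectibles that point at an "A" doc through C_uid.
a_docs = [
    {"id": "uri:1", "type": "A"},
    {"id": "uri:2", "type": "A"},
    {"id": "uri:3", "type": "A"},
]
c_docs = [
    {"type": "C", "C_uid": "uri:1", "C_user": "erikhatcher", "C_tag": ["foo"]},
    {"type": "C", "C_uid": "uri:1", "C_user": "yonik", "C_tag": ["foo", "bar"]},
    {"type": "C", "C_uid": "uri:3", "C_user": "yonik", "C_tag": ["bar"]},
]

def indirect_filter(c_predicate):
    """Return the A docs whose id is referenced by any C doc matching c_predicate."""
    # Step 1: run the inner query against the C docs and collect their C_uid values.
    uids = {c["C_uid"] for c in c_docs if c_predicate(c)}
    # Step 2: use those values as a filter on the id field of the A docs.
    return [a for a in a_docs if a["id"] in uids]

# Rough equivalent of id:((C_tag:foo).C_uid)
foo_docs = indirect_filter(lambda c: "foo" in c["C_tag"])

# Counting matching C docs gives the number of *tags*, not of documents:
# two users tagged the same object with "foo".
foo_c_count = sum(1 for c in c_docs if "foo" in c["C_tag"])
```

Here two C docs carry the tag "foo" but they both point at the same object, which is why a plain facet on C_tag over-counts relative to the number of tagged documents.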
My custom facet cache differs from the built-in facets in that it builds a cross-reference cache from the "C" types to the "A" types (a JOIN, heh).
What does the cross-reference cache look like when it's built? A simple int[]?
To do this more efficiently, it seems like one would want separate indices for the A and C docs to keep maxDoc() down.
What's the id for the C docs? The user concatenated with the id of the collected doc, so all tags/comments for a particular user on a particular doc go in the same C doc?

-Yonik

On 2/2/07, Erik Hatcher <[EMAIL PROTECTED]> wrote:
Before Solr had facets, I built my own implementation in a much cruder and less performant way into Collex as custom request handlers. Now the performance issue of warming up the cache needs to be addressed. I'm going to upgrade Solr and adjust the application to work with the built-in faceting and see how far I get with that. The dilemma is that I've got a couple of custom things that don't map to the built-in faceting and I'm looking for advice on how to proceed.

The index has a "type" field: "A" for archived objects and "C" for collectibles. All the original objects are indexed in batch fashion as type "A". Users collect objects and tag/annotate them. When a user collects an object, a document of type "C" is indexed with the original object's unique identifier (a URI), the username, tags, and annotation.

My custom facet cache differs from the built-in facets in that it builds a cross-reference cache from the "C" types to the "A" types (a JOIN, heh). We can do queries that return facet counts such as:
- all collected objects
- all objects collected by erikhatcher
- all collected objects with tag "foo"

One of the facet counts returned is user, so you can easily see how many objects each user has collected.

For the basic faceting we do on object metadata, this will fit well with what Solr has built-in, but I'm not quite sure how to build in the cross-reference and leverage faster warming, so I'm asking here to see what thoughts folks have on how to proceed.
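One way to picture a cross-reference from "C" documents to "A" documents is a plain integer array indexed by document number. The Python sketch below is only an illustration under that assumption and is not the Collex implementation. The field names `uri`, `username` and `tags`, the sample data, and the helper functions are all hypothetical.

```python
from array import array

# Hypothetical index, listed by internal document number.
index = [
    {"type": "A", "uri": "uri:1"},                                              # doc 0
    {"type": "A", "uri": "uri:2"},                                              # doc 1
    {"type": "C", "uri": "uri:1", "username": "erikhatcher", "tags": ["foo"]},  # doc 2
    {"type": "C", "uri": "uri:2", "username": "erikhatcher", "tags": ["bar"]},  # doc 3
    {"type": "C", "uri": "uri:1", "username": "yonik", "tags": ["foo"]},        # doc 4
]

def build_xref(index):
    """xref[n] is the doc number of the A doc that C doc n refers to, else -1."""
    uri_to_a = {d["uri"]: n for n, d in enumerate(index) if d["type"] == "A"}
    xref = array("i", [-1] * len(index))
    for n, d in enumerate(index):
        if d["type"] == "C":
            xref[n] = uri_to_a.get(d["uri"], -1)
    return xref

def collected(index, xref, c_predicate=lambda d: True):
    """Set of A doc numbers referenced by the C docs matching c_predicate."""
    return {
        xref[n]
        for n, d in enumerate(index)
        if d["type"] == "C" and xref[n] >= 0 and c_predicate(d)
    }

xref = build_xref(index)

all_collected = collected(index, xref)
by_erik = collected(index, xref, lambda d: d["username"] == "erikhatcher")
tagged_foo = collected(index, xref, lambda d: "foo" in d["tags"])

# "user" facet: how many distinct objects each user has collected.
objects_by_user = {}
for n, d in enumerate(index):
    if d["type"] == "C" and xref[n] >= 0:
        objects_by_user.setdefault(d["username"], set()).add(xref[n])
user_counts = {user: len(objs) for user, objs in objects_by_user.items()}
```

The array has one slot per document in the index, so its size grows with the total document count, and it has to be rebuilt whenever document numbers change, which is where the warming cost comes from in a layout like this.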