Field Collapse question
Is there a way to configure the Field Collapse functionality to not collapse null fields? I want to collapse on a field that only a certain percentage of the documents in my index have...but not all of them. If a document doesn't have the field, I want it treated as uncollapsed. Is there a setting to do this?
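For reference, here is roughly the kind of request I'm issuing (parameter names as in the SOLR-236 field-collapsing patch; "dedupeId" is a made-up field name, and the exact parameter names may differ between versions of the patch):

    http://localhost:8983/solr/select?q=*:*&collapse.field=dedupeId&collapse.type=normal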
Query modification
If I wanted to intercept a query and turn

q=romantic italian restaurant in seattle

into

q=romantic tag:restaurant city:seattle cuisine:italian

would I subclass QueryComponent, modify the query, and pass it to super? Or is there already a standard way to do this? (There's a rough sketch of what I mean at the end of this message.)

What about changing it to

q=romantic city:seattle cuisine:italian&fq=type:restaurant

Would that be the same process, or is there a nuance to turning a query into a query plus a filter query?

Ken
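P.S. Here's roughly what I have in mind for the QueryComponent approach (a sketch only; the class name, field names, and rewrite rule are made up, and I haven't tried this):

    import org.apache.solr.common.params.ModifiableSolrParams;
    import org.apache.solr.handler.component.QueryComponent;
    import org.apache.solr.handler.component.ResponseBuilder;

    import java.io.IOException;

    // Hypothetical sketch: rewrite the raw q into fielded clauses before
    // the stock QueryComponent parses it.
    public class ConceptQueryComponent extends QueryComponent {

      @Override
      public void prepare(ResponseBuilder rb) throws IOException {
        // Request params are read-only; copy them to modify.
        ModifiableSolrParams params = new ModifiableSolrParams(rb.req.getParams());

        String q = params.get("q");
        if (q != null && q.contains("restaurant")) {
          // e.g. "romantic italian restaurant in seattle"
          //   -> "romantic cuisine:italian city:seattle" plus a filter
          params.set("q", "romantic cuisine:italian city:seattle");
          params.add("fq", "type:restaurant");
        }

        rb.req.setParams(params);
        super.prepare(rb); // let the stock component handle the rewritten query
      }
    }

I assume it would then be registered in solrconfig.xml in place of the standard query component.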
Re: Field Collapse question
I wanted to extend my question some. My original question about collapsing null fields is still open, but while researching elsewhere I've seen a lot of angst about the Field Collapse functionality in general. Can anyone summarize the current state of affairs with it?

I'm on Solr 1.4, the latest release build, not a current development build. Field Collapse appears to be in my build, because single-field collapse works fine (hence my null-field question). However, there seems to be talk of problems with Field Collapse that aren't fixed yet. What kinds of issues are people having? Should I avoid Field Collapse in a production app for now? (Tricky, because I'm merging my schema with a third-party tool's schema, and that tool uses Field Collapse.)

Any insight would be helpful, thanks

Ken
Re: Query modification
So QueryComponent is the place to do this? Have the query analyzers already run at that point? Would I have access to the stems, synonyms, tokens, etc. of the query?
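To make that last part concrete, this is the sort of thing I'm hoping is possible inside a component (a sketch only; the attribute and method names are from recent Lucene APIs, and I'm assuming the analyzer can be fetched with something like rb.req.getSchema().getQueryAnalyzer()):

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    import java.io.IOException;
    import java.io.StringReader;

    public class QueryTermDump {
      // Run a field's query-time analyzer chain (tokenizer, stemmer,
      // synonyms, ...) over raw query text and print the resulting terms.
      static void printQueryTerms(Analyzer queryAnalyzer, String field, String text)
          throws IOException {
        TokenStream ts = queryAnalyzer.tokenStream(field, new StringReader(text));
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        ts.reset();
        while (ts.incrementToken()) {
          System.out.println(term.toString());
        }
        ts.end();
        ts.close();
      }
    }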
Re: document level security: indexing/searching techniques
Someone else was recently asking a similar question (or maybe it was you, worded differently :) ). Putting user-level security at the document level seems like a recipe for pain. Solr/Lucene don't handle frequent updates well, and being highly optimized for query, I don't blame them.

Is there any way to create a set of roles that you can apply to your documents? If the security level of a document isn't changing, just the users' access to it, give the docs a role in the index, keep your user/usergroup data in a DB or some other system, resolve the user into valid roles at query time, and then filter query on role.
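A quick SolrJ sketch of what I mean (the field name "role" and the role-resolution step are made up for illustration):

    import org.apache.solr.client.solrj.SolrQuery;

    public class RoleFilter {
      // Resolve the user to roles outside Solr (DB, LDAP, ...), then
      // restrict every search with a filter query on the indexed roles.
      static SolrQuery secureQuery(String userQuery, String... userRoles) {
        SolrQuery query = new SolrQuery(userQuery);
        // Roles would be indexed in a multiValued "role" field per document.
        query.addFilterQuery("role:(" + String.join(" OR ", userRoles) + ")");
        return query;
      }
    }

That way a change in a user's access only touches your DB; the index only has to change when a document's role changes.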
Re: index format error because disk full
I haven't used this myself, but Solr supports a rollback function (http://wiki.apache.org/solr/UpdateXmlMessages#A.22rollback.22). It is supposed to roll back to the state at the previous commit, so you may want to turn off auto-commit on the index you are updating if you want to control what that last commit point is.

However, in your case, if the index gets corrupted due to a disk-full situation, I don't know what rollback would do, if anything, to help. You may need to play with the scenario to see what happens. If you are using the DataImportHandler, it may handle the rollback for you; again, however, it may not deal with disk-full situations gracefully either.
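For what it's worth, in SolrJ the pattern would look something like this (a sketch assuming a recent SolrJ client; fetchBatch is a stand-in for your real document source):

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    import java.util.Collections;

    public class RollbackDemo {
      public static void main(String[] args) throws Exception {
        // Auto-commit should be off in solrconfig.xml so the last commit
        // point is the one controlled here.
        SolrClient solr =
            new HttpSolrClient.Builder("http://localhost:8983/solr/core1").build();
        try {
          for (SolrInputDocument doc : fetchBatch()) {
            solr.add(doc);
          }
          solr.commit();     // make the whole batch durable at once
        } catch (Exception e) {
          solr.rollback();   // discard everything since the last commit
          throw e;
        } finally {
          solr.close();
        }
      }

      static Iterable<SolrInputDocument> fetchBatch() {
        return Collections.emptyList(); // stand-in for the real batch source
      }
    }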
Re: How do I get the matched terms of my query?
If you want only documents that have both values, then make your q:

q=content:videos+AND+content:songs

If you want the more open query, but to be able to tell which docs have videos, which have songs, and which have both...then I'm not sure. Using debugQuery=on might help your understanding, but it isn't a good runtime solution if you need that.
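For example (made-up host and core; the explain section of the debug output lists, per returned document, which clauses matched and how they scored):

    http://localhost:8983/solr/select?q=content:videos+OR+content:songs&debugQuery=on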
Re: Score boosting
Sounds like you want Payloads. I don't think you can guarantee a position, but you can boost relative to others. You can give one author/book a boost of 0 for the term Cooking, another author/book a boost of 0.5, and yet another a boost of 1.0. For searches that include the term Cooking, the scores should reflect the boosts, and the authors that bought the higher boost value will sort higher.

These discuss Payloads (it isn't a trivial task, by the way):

http://www.ultramagnus.org/?p=1
http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/

or use this to see other Solr-User group discussions on the topic:

http://lucene.472066.n3.nabble.com/template/NodeServlet.jtp?tpl=search-page&node=472068&query=Using+Lucene's+payload+in+Solr
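To sketch the indexing side (assuming the delimited-payload approach the articles above describe, where a token filter splits term|payload on a delimiter), the indexed values would look something like:

    Cooking|0.0     (author A, no boost bought)
    Cooking|0.5     (author B)
    Cooking|1.0     (author C, should sort highest)

A payload-aware query and similarity then fold that float into the score at search time.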
Re: Database connections during data import
Gora,

Our environment, currently under development, is very nearly the same as yours. My DB is currently only about 10GB, but likely to grow. We also use Solr as the primary repository (store all fields there), but use the DB as a backup for when a Full Import is needed.

Delta imports aren't that bad, except when one of our larger data feeds comes in once a month. That is a very large delta import and hits some of the same issues as a full import. I'm still trying out different architectures to deal with this. I've tried doing a Bulk Copy from the DB to some flat files and importing from there; file handles seem to be more stable than database connections, but it brings its own issues to the party. I'm also currently looking at using queuing (either MSMQ or Amazon's Simple Queue Service) so the database piece isn't tied up for 20 hours, but gets its part over fairly quickly. I haven't worked out how to do this with the DataImportHandler, however, so I'm writing my own import manager. I know this isn't a solution, but maybe some other ideas you can consider; there's a rough sketch of the queue idea at the end of this message.

As to the GData handler and response writer: I would be very interested in OData versions, which wouldn't be too much of a stretch from GData. Would you be moving in that direction later? Or, if you put your contrib out there, could someone else (maybe me, if time allows) take it there? That would be a great addition for our work in a few months.

Good luck, and I'd love to keep in touch about your solutions. I'm sure I could get some great ideas from them for our own work.

Ken
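P.S. The queue sketch (made-up names; assumes a recent SolrJ client and any producer filling a BlockingQueue):

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.common.SolrInputDocument;

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.BlockingQueue;

    public class QueueImporter implements Runnable {
      private final BlockingQueue<SolrInputDocument> queue; // filled by the DB reader
      private final SolrClient solr;
      private static final int BATCH_SIZE = 500;

      QueueImporter(BlockingQueue<SolrInputDocument> queue, SolrClient solr) {
        this.queue = queue;
        this.solr = solr;
      }

      @Override
      public void run() {
        List<SolrInputDocument> batch = new ArrayList<>();
        try {
          while (true) {
            batch.add(queue.take()); // block until the producer has a doc
            queue.drainTo(batch, BATCH_SIZE - batch.size());
            solr.add(batch);         // index the batch
            solr.commit();           // or rely on autoCommit instead
            batch.clear();
          }
        } catch (Exception e) {
          throw new RuntimeException(e);
        }
      }
    }

The DB side just dumps rows into the queue and disconnects quickly; the Solr side drains at its own pace.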