Field Collapse question

2010-07-02 Thread osocurious2

Is there a way to configure the Field Collapse functionality not to collapse
null fields? I want to collapse on a field that a certain percentage of the
documents in my index have...but not all of them. If a document doesn't have
the field, I want it to be left uncollapsed. Is there a setting to do this?
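For readers skimming the archive, a minimal sketch of the semantics being asked for (plain Python, not Solr code; the field name is made up): documents sharing a non-null collapse value reduce to one representative, while documents missing the field all stay in the results.

```python
def collapse(docs, field):
    """Collapse docs on `field`, but leave docs without the field alone."""
    seen = set()
    result = []
    for doc in docs:
        value = doc.get(field)
        if value is None:
            result.append(doc)      # null/missing: never collapsed
        elif value not in seen:
            seen.add(value)
            result.append(doc)      # first doc wins for each value
    return result

docs = [
    {"id": 1, "dedupe": "a"},
    {"id": 2, "dedupe": "a"},   # collapsed into doc 1
    {"id": 3},                  # no field: kept
    {"id": 4},                  # no field: kept
]
print([d["id"] for d in collapse(docs, "dedupe")])  # [1, 3, 4]
```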
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Field-Collapse-question-tp939118p939118.html
Sent from the Solr - User mailing list archive at Nabble.com.


Query modification

2010-07-02 Thread osocurious2

If I wanted to intercept a query and turn
q=romantic italian restaurant in seattle
into
q=romantic tag:restaurant city:seattle cuisine:italian

would I subclass QueryComponent, modify the query, and pass it on to super? Or
is there already a standard way to do this?

What about changing it to
   q=romantic city:seattle cuisine:italian&fq=type:restaurant

would that be the same process, or is there a nuance to modifying a query
into a query+filterQuery?
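The kind of rewrite described above could be sketched like this (plain Python, not a QueryComponent; the term-to-field tables and stopword list are hypothetical):

```python
# Hypothetical lookup tables mapping known terms to Solr fields.
FIELD_TERMS = {
    "restaurant": ("tag", "restaurant"),
    "seattle": ("city", "seattle"),
    "italian": ("cuisine", "italian"),
}
STOPWORDS = {"in", "the", "a"}

def rewrite(q):
    """Turn free text into fielded clauses plus leftover free terms."""
    parts = []
    for term in q.lower().split():
        if term in STOPWORDS:
            continue
        if term in FIELD_TERMS:
            field, value = FIELD_TERMS[term]
            parts.append(f"{field}:{value}")
        else:
            parts.append(term)      # unrecognized terms pass through
    return " ".join(parts)

print(rewrite("romantic italian restaurant in seattle"))
# romantic cuisine:italian tag:restaurant city:seattle
```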

Ken

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Query-modification-tp939584p939584.html


Re: Field Collapse question

2010-07-03 Thread osocurious2



I'd like to extend my question a bit. My original question about collapsing
null fields is still open, but while researching elsewhere I've seen a lot
of angst about the Field Collapse functionality in general. Can anyone
summarize the current state of affairs with it? I'm on Solr 1.4, the latest
release build, not a trunk build. Field Collapse seems to be in my build,
because I could do single-field collapse just fine (hence my null field
question). However, there seems to be talk of problems with Field Collapse
that aren't fixed yet. What kinds of issues are people having? Should I
avoid Field Collapse in a production app for now? (Tricky, because I'm
merging my schema with a third-party tool's schema, and that tool uses
Field Collapse.)

Any insight would be helpful, thanks
Ken
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Field-Collapse-question-tp939118p940923.html


Re: Query modification

2010-07-03 Thread osocurious2

So QueryComponent is the place to do this? Have the query analyzers already
run at that point? Would I have access to the stems, synonyms, tokens, etc.,
of the query?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Query-modification-tp939584p940941.html


Re: document level security: indexing/searching techniques

2010-07-06 Thread osocurious2

Someone else was recently asking a similar question (or maybe it was you but
worded differently :) ).

Putting user-level security at the document level seems like a recipe for
pain. Solr/Lucene don't handle frequent updates well...and since they're
highly optimized for querying, I don't blame them. Is there any way to
create a set of roles that you can apply to your documents? If the security
level of the documents isn't changing, just users' access to them, give the
docs a role in the index, keep your user/usergroup data in a DB or some
other system, resolve each user into valid roles, then apply a filter query
on role.
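The resolve-then-filter step above could be sketched like this (plain Python; the user-to-roles store and the `role` field name are hypothetical, and in practice the lookup would hit a DB or directory service):

```python
# Hypothetical user-to-roles lookup (in practice a DB or LDAP query).
USER_ROLES = {
    "alice": ["staff", "finance"],
    "bob": ["staff"],
}

def role_filter_query(user):
    """Build a Solr fq restricting results to the user's roles."""
    roles = USER_ROLES.get(user, [])
    if not roles:
        return "role:__none__"  # a value no document carries
    return "role:(" + " OR ".join(roles) + ")"

print(role_filter_query("alice"))  # role:(staff OR finance)
```

The fq is cached by Solr's filter cache, so users sharing a role set reuse the same cached filter.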
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/document-level-security-indexing-searching-techniques-tp946528p946649.html


Re: index format error because disk full

2010-07-07 Thread osocurious2

I haven't used this myself, but Solr supports a 
http://wiki.apache.org/solr/UpdateXmlMessages#A.22rollback.22 rollback 
function. It is supposed to roll back to the state at the previous commit,
so you may want to turn off auto-commit on the index you are updating if
you want to control what that last commit point is.
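For reference, the rollback on that wiki page is just an ordinary XML update message posted to the update handler; something like this (URL assumes a default single-core Solr on localhost; everything sent since the last commit is discarded):

```xml
<!-- POST this body to http://localhost:8983/solr/update -->
<rollback/>
```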

However, in your case if the index gets corrupted due to a disk full
situation, I don't know what rollback would do, if anything, to help. You
may need to play with the scenario to see what would happen.

If you are using the DataImportHandler, it may handle the rollback for
you...again, however, it may not deal with disk-full situations gracefully
either.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/index-format-error-because-disk-full-tp948249p948968.html


Re: How do I get the matched terms of my query?

2010-07-08 Thread osocurious2

If you want only documents that have both values, then make your q
   q=content:videos+AND+content:songs

If you want the more open query, but want to be able to tell which docs
have videos, which have songs, and which have both...then I'm not sure.
Using debugQuery=on might help your understanding, but it isn't a good
runtime solution if you need that per request.
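The debugQuery suggestion above would look something like this (host and core are assumed); the explain section of the response shows, per document, which query clauses contributed to the score:

```
http://localhost:8983/solr/select?q=content:videos+OR+content:songs&debugQuery=on
```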
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-do-I-get-the-matched-terms-of-my-query-tp951422p951492.html


Re: Score boosting

2010-07-08 Thread osocurious2

Sounds like you want Payloads. I don't think you can guarantee a position,
but you can boost relative to others. You can give one author/book a boost
of 0 for the term "Cooking", another author/book a boost of 0.5, and yet
another a boost of 1.0. For searches that include the term "Cooking", the
scores should reflect the boosts, and the authors that bought the higher
boost value will sort higher. These pages discuss Payloads (it isn't a
trivial task, by the way):
  http://www.ultramagnus.org/?p=1
 
http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/
or use this to see other Solr-User group discussions on the topic:

http://lucene.472066.n3.nabble.com/template/NodeServlet.jtp?tpl=search-page&node=472068&query=Using+Lucene's+payload+in+Solr
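A toy sketch of the idea (plain Python, not Lucene): each document carries a per-term boost that is folded into its score, so the author with the larger payload sorts higher. The multiplicative scheme below is one common choice, not the only one.

```python
def score(base_score, payload_boost):
    """Fold a per-document payload boost into a term score."""
    return base_score * (1.0 + payload_boost)

# Each (author, payload) pair models a boost bought for the term "Cooking".
books = [
    ("Author A", 0.0),
    ("Author B", 0.5),
    ("Author C", 1.0),
]
ranked = sorted(books, key=lambda b: score(1.0, b[1]), reverse=True)
print([name for name, _ in ranked])
# ['Author C', 'Author B', 'Author A']
```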

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Score-boosting-tp951214p951510.html


Re: Database connections during data import

2010-07-11 Thread osocurious2

Gora,
Our environment, currently under development, is very nearly the same as
yours. My DB is currently only about 10GB, but likely to grow. We also use
Solr as the primary repository (we store all fields there), but use the DB
as a backup when a full import is needed. Delta imports aren't that bad,
except when one of our larger data feeds comes in once a month. That is a
very large delta import and hits some of the same issues as a full import.

I'm still trying out different architectures to deal with this. I've tried
doing a bulk copy from the DB to some flat files and importing from there.
File handles seem to be more stable than database connections, but that
brings its own issues to the party. I'm also currently looking at using
queuing (either MSMQ or Amazon's Simple Queue Service) so the database
piece isn't in use for 20 hours, but gets its part done fairly quickly. I
haven't worked out how to do this with the DataImportHandler, so I'm
writing my own import manager.
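The queue-based decoupling described above could be sketched like this (plain Python, with an in-memory queue standing in for MSMQ/SQS; the batch size and document shape are made up). The point is that the DB-side producer finishes quickly, while the indexer drains the queue at its own pace without holding a database connection:

```python
from queue import Queue

def produce(rows, q, batch_size=2):
    """Drain the database quickly into the queue, in small batches."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            q.put(batch)
            batch = []
    if batch:
        q.put(batch)
    q.put(None)  # sentinel: no more batches

def consume(q, index):
    """Index batches at the consumer's own pace; no DB connection held."""
    while (batch := q.get()) is not None:
        index.extend(batch)

q, index = Queue(), []
produce([{"id": i} for i in range(5)], q)
consume(q, index)
print(len(index))  # 5
```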

I know this isn't a solution, but maybe it gives you some other ideas to
consider.

As to the GData handler and response writer: I would be very interested in
OData versions, which wouldn't be too much of a stretch from GData. Would
you be moving in that direction later? Or if you put your contrib out
there, could someone else (maybe me, if time allows) take it there? That
would be a great addition to our work in a few months.

Good luck, and I'd love to keep in touch about your solutions, I'm sure I
could get some great ideas from them for our own work.
Ken
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Database-connections-during-data-import-tp956325p958071.html