Filtering through list of document unique keys & keeping that order

2008-07-31 Thread Henrib

I'm re-adapting some pretty old, hacked (1.2-dev) code that performs a query
filtered by a list of document unique keys and returns results in the order of
that list. Does anyone have the same requirement/feature/code?

I've been looking at QueryComponent, where there is code to handle shards
that performs at least the first half (i.e. filtering by the list of ids), and
wondering if adapting that code would be a reasonable way to get a "clean"
version of the feature.

Is this feature generic enough to submit a QueryComponent patch or should I
keep this separate ?

The list of unique keys is obtained from a concatenated list of ids; in my
case, I already have a 'String[]' enumerating these unique keys. The request
context seems the proper way to "inject" that list as-is. Would
abstracting/completing the 'params.get(ShardParams.IDS)' with a
'getContext().get("org.apache.solr.common.params.ShardParams.ids")' - where
the latter would be the 'String[]' version - be an acceptable generic way
to pass arbitrary params?
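
A minimal sketch of the injection I have in mind (the context key is just the
convention proposed above, not an existing Solr constant):

// caller side: pass the typed id list through the request context
String[] uniqueKeys = { "AZ123", "AZ124" };
SolrQueryRequest req = new LocalSolrQueryRequest(core, params);
req.getContext().put("org.apache.solr.common.params.ShardParams.ids", uniqueKeys);

// component side: prefer the typed context entry over the string param
String[] ids = (String[]) req.getContext()
    .get("org.apache.solr.common.params.ShardParams.ids");
if (ids == null && params.get(ShardParams.IDS) != null) {
    ids = params.get(ShardParams.IDS).split(",");
}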

Any hints at how the 'sort' specification {c,sh}ould be described or should
that be implied somehow through the context parameters?

Any advice welcome.
Thanks.

PS: As a side note, I'm not sure I understand correctly, but it seems
DocSlice performs with O(N²) complexity (the boolean exists() method is going
to be called for each doc, and it scans the list each time...).
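
In the meantime, a workaround sketch - assuming a DocList in hand - that
copies the ids into a HashSet once, so membership tests become O(1):

// one-time O(N) copy of the DocList ids into a set; avoids
// DocSlice.exists() rescanning the list on every call
java.util.Set<Integer> docIds = new java.util.HashSet<Integer>();
org.apache.solr.search.DocIterator it = docList.iterator();
while (it.hasNext()) {
    docIds.add(it.nextDoc());
}
int docId = 42; // some internal Lucene doc id
boolean present = docIds.contains(docId); // instead of docList.exists(docId)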
-- 
View this message in context: 
http://www.nabble.com/Filtering-through-list-of-document-unique-keys---keeping-that-order-tp18751975p18751975.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Custom information in schema.xml

2008-07-31 Thread Henrib


We could harness SOLR-646, reusing the <property name="..." value="..."/>
syntax by creating a scope for field elements when reading the schema. Since
properties (the PropertyMap) are stored in the ResourceLoader, it seems we
should be able to access them in the useful places through the usual
suspects (the core, the config, the schema).
In the current case, the property would be defined as
'solr.schema.field.state.displaylevel' - generalized as
'solr.schema.field.<fieldname>.<attribute>'.
The only extension to the schema would be allowing 'property' elements in the
useful places.
The syntax & usage would not be as clean as adding an attribute, but it leaves
more freedom for extensions in more places (i.e. it can generalize to field
types, etc.).
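
As a rough sketch of the intended lookup (the accessors here are assumptions,
not the current patch API; the naming convention is the point):

// hypothetical accessor: properties as stored by the resource loader
java.util.Properties props = schema.getResourceLoader().getProperties();
// per-field property, following solr.schema.field.<fieldname>.<attribute>
String displayLevel = props.getProperty("solr.schema.field.state.displaylevel");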
I believe I can update the patch quickly if you're interested.
Henri


Grant Ingersoll-6 wrote:
> 
> Not at the moment, but this is something I've wanted to do, too.  See
> http://lucene.markmail.org/message/ed5g5qdfdmgaefrn?q=attributes+schema
> 
> -Grant
> 
> On Jul 31, 2008, at 6:36 AM, Nikhil Chhaochharia wrote:
> 
>> Hi,
>>
>> I am using SolrJ and the latest build of Solr 1.3
>> I want to add some custom information to schema.xml and then access  
>> it from within my application.
>>
>> Ideally, I would like to enhance the field definitions in schema.xml  
>> to something like
>>
>> <field name="state" type="text" displaylevel="private" />
>>
>> and then be able to access the displaylevel value from my application.
>>
>> Is there any way of doing this?
>>
>> Thanks,
>> Nikhil
> 
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Custom-information-in-schema.xml-tp18751841p18757614.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Loading solr from .properties file

2008-08-06 Thread Henrib

This should be one use-case for SOLR-646
(https://issues.apache.org/jira/browse/SOLR-646).
If you can try it, don't hesitate to report/comment on the issue.
Henri


zayhen wrote:
> 
> Hello guys,
> 
> I have to load solr/home from a .properties file, because of some
> environment standards I have to follow for one client that insists I
> should
> deliver Solr in one .ear containing its .war
> 
> The thing is: the same .ear in testing must be the .ear in production, so I
> can't change the env-entry in web.xml
> 
> The other problem, the sysadmin won't let me alter JVM parameters.
> 
> I would like to hear the experts' opinion.
> 
> -- 
> Alexander Ramos Jardim
> 
> 
> -
> RPG da Ilha 
> 

-- 
View this message in context: 
http://www.nabble.com/Loading-solr-from-.properties-file-tp18851924p18857371.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Loading solr from .properties file

2008-08-06 Thread Henrib

My bad, sorry, I read too fast and skipped the first prerequisite, which is to
get solr/home through JNDI - which I presume you can't have nor set... (from
there, reading a multicore.properties etc. would follow).
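
For reference, the JNDI route is the standard web.xml entry below (the path is
an example) - the very thing that can't vary here:

<env-entry>
  <env-entry-name>solr/home</env-entry-name>
  <env-entry-value>/path/to/solr/home</env-entry-value>
  <env-entry-type>java.lang.String</env-entry-type>
</env-entry>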
I can only hope you are not running on WebSphere (where the same kind of
sysadmins go ballistic when they see logs in their console...)
Cheers
Henrib


zayhen wrote:
> 
> Hello Henrib,
> 
> I have read the issue and it seems an interesting feature for Solr, but I
> don't see how it address to my needs, as I need to point Solr to the
> multicore.properties file.
> 
> Actually, I have already resolved my problem, hacking SolrResourceLoader
> so
> that it can load a solr.properties file  inside a directory defined in a
> standard JVM argument the sysadmin puts on all JVM's (I know this is
> stupid.
> He could put the solr.home, but NO, he said! Use the standards! F0cking
> sysadmin...)
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Loading-solr-from-.properties-file-tp18851924p18860185.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: includes in solrconfig.xml

2008-08-10 Thread Henrib


Not sure if this is what you seek, but SOLR-646 adds an <include> element
allowing to import a resource, and a way to insert 'chunks' that have no
natural root node.
These do work for solrconfig.xml & schema.xml.
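
By way of illustration only - check SOLR-646 for the actual element &
attribute names - the intended usage is along these lines:

<!-- in solrconfig.xml or schema.xml: pull in a shared chunk -->
<include resource="shared-handlers.xml" />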
Cheers
Henri


Jacob Singh-2 wrote:
> 
> Hello,
> 
> Is it possible to include an external xml file from within solrconfig.xml?
> 
> Or even better, to scan a directory ala conf.d in apache?
> 
> Thanks,
> jacob
> 
> 

-- 
View this message in context: 
http://www.nabble.com/includes-in-solrconfig.xml-tp18901792p18915473.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: XML includes in solrconfig.xml/schema.xml

2008-08-21 Thread Henrib

The other option is to use SOLR-646, which adds the ability to include files
through an <include> element.
Regards
henri

-- 
View this message in context: 
http://www.nabble.com/XML-includes-in-solrconfig.xml-schema.xml-tp19096292p19097243.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: XML includes in solrconfig.xml/schema.xml

2008-08-21 Thread Henrib

Since I authored the patch, I'm guilty on all counts. :-)

Amit Nithian wrote:
> 
> I am not sure why they chose that direction over built-in entity include.
> 
Entities are not the most used or best-known feature, and I just did not think
of them as a way to do it.
I also wanted variable expansion in the 'resource name', which might not be
easy to obtain through entities.


Amit Nithian wrote:
> 
> The patch page says that the include resource is an experimental feature
> 

It is a debated feature; weighing the acknowledged use-case & need against
the amount of code dedicated to configuration (versus pure search features)
and the urge to get release 1.3 out... I just wanted to make clear this was
in flux.
Cheers
Henri



-- 
View this message in context: 
http://www.nabble.com/XML-includes-in-solrconfig.xml-schema.xml-tp19096292p19097895.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: CoreDescriptor explanation and possible bug

2008-08-28 Thread Henrib

Hi,
It is likely related to how you initialize Solr & create your
SolrCore; there have been a few changes to ensure there is always a
CoreContainer created which (as its name implies) holds a reference to all
created SolrCores.
There is a CoreContainer.Initializer class that makes it easy to create that
CoreContainer (even if you don't have a solr.xml).
Hope this helps,
Henri



Nikhil Chhaochharia wrote:
> 
> Hi,
> 
> I have been using nightly builds of Solr 1.3 with SolrJ for some time now. 
> I upgraded the Solr jars from the 14-Aug nightly and my program would not
> compile.  I found that SolrCore constructor needed a new parameter
> CoreDescriptor.  I passed null and it worked fine.
> 
> I then upgraded to the Solr jars from the 20th-Aug nightly and it started
> throwing NullPointerExceptions.  My guess is that the addition of the
> method SolrCore.getStatistics() which has the line "lst.add("aliases",
> getCoreDescriptor().getCoreContainer().getCoreNames(this));" throws the
> NPE.
> 
> I searched the wiki and there is no mention of CoreDescriptor there. 
> Could somebody please explain what CoreDescriptor is all about and how it
> is supposed to be used?
> 
> Thanks,
> Nikhil
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/CoreDescriptor-explanation-and-possible-bug-tp19197004p19198965.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: CoreDescriptor explanation and possible bug

2008-08-28 Thread Henrib

Seems you want something like:

  public SolrCore nikhilInit(final IndexSchema indexSchema) {
    final String solrConfigFilename = "solrconfig.xml"; // or else
    CoreContainer.Initializer init = new CoreContainer.Initializer() {
      @Override
      public CoreContainer initialize() {
        CoreContainer container = new CoreContainer(
            new SolrResourceLoader(SolrResourceLoader.locateInstanceDir()));
        SolrConfig solrConfig = solrConfigFilename == null
            ? new SolrConfig() : new SolrConfig(solrConfigFilename);
        CoreDescriptor dcore = new CoreDescriptor("",
            solrConfig.getResourceLoader().getInstanceDir());
        //dcore.setCoreContainer(container);
        dcore.setConfigName(solrConfig.getResourceName());
        dcore.setSchemaName(indexSchema.getResourceName());
        SolrCore core = new SolrCore("", null, solrConfig, indexSchema, dcore);
        container.register("", core, false);
        return container;
      }
    };
    return init.initialize().getCore("");
  }


-- 
View this message in context: 
http://www.nabble.com/CoreDescriptor-explanation-and-possible-bug-tp19197004p19200585.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: CoreDescriptor explanation and possible bug

2008-08-28 Thread Henrib


Nikhil Chhaochharia wrote:
> 
> 
> I am assuming that these are part of some patch which will get applied
> before 1.3 releases, is that correct ?
> 
> Nikhil
> 
> 

Yes, this is part of a patch and no, they most likely will not make it into
1.3.
However, I guess the following will bring you even closer:


  public SolrCore nikhilInit(final IndexSchema indexSchema) {
    final String solrConfigFilename = "solrconfig.xml"; // or else
    CoreContainer.Initializer init = new CoreContainer.Initializer() {
      @Override
      public CoreContainer initialize() {
        CoreContainer container = new CoreContainer(
            new SolrResourceLoader(SolrResourceLoader.locateInstanceDir()));
        SolrConfig solrConfig = solrConfigFilename == null
            ? new SolrConfig() : new SolrConfig(solrConfigFilename);
        CoreDescriptor dcore = new CoreDescriptor(container, "",
            solrConfig.getResourceLoader().getInstanceDir());
        dcore.setConfigName(solrConfig.getResourceName());
        dcore.setSchemaName(indexSchema.getResourceName());
        SolrCore core = new SolrCore("", null, solrConfig, indexSchema, dcore);
        container.register("", core, false);
        return container;
      }
    };
    return init.initialize().getCore("");
  }



-- 
View this message in context: 
http://www.nabble.com/CoreDescriptor-explanation-and-possible-bug-tp19197004p19201459.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Sending queries to multicore installation

2008-09-15 Thread Henrib

Hi,
If you are sure that you did index your documents through the intended core,
it might be that your solrconfig.xml does not use the 'dataDir' property you
declared in solr.xml for your 2 cores.

The shopping & tourism solrconfig.xml should have a line stating:
<dataDir>${dataDir}</dataDir>

And *not* the default:
<dataDir>${solr.data.dir:./solr/data}</dataDir>
which will make both cores use the same index.
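
For illustration, the kind of solr.xml declaration assumed above (instanceDirs
and paths are examples):

<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="shopping" instanceDir="shopping">
      <property name="dataDir" value="/indexes/shopping/data" />
    </core>
    <core name="tourism" instanceDir="tourism">
      <property name="dataDir" value="/indexes/tourism/data" />
    </core>
  </cores>
</solr>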

Hope this helps,
Henrib


rogerio.araujo wrote:
> 
> Hi!
> 
> I have a multicore installation with the following configuration:
> 
> 
> [solr.xml snippet lost in the archive: it declared the two cores,
> shopping and tourism]
> 
> Each core uses a different schema. I indexed some docs on the shopping core
> and a few others on the tourism core; when I send a query "a*" to the
> tourism core I'm getting docs from the shopping core - is this the expected
> behaviour? Should I define a "core" field on both schemas and use this
> field as a filter, like we have
> here<http://wiki.apache.org/solr/MultipleIndexes#head-9e6bee989c8120974eee9df0944b58a28d489ba2>,
> to avoid it?
> 
> -- 
> Regards,
> 
> Rogério (_rogerio_)
> 
> [Blog: http://faces.eti.br] [Sandbox: http://bmobile.dyndns.org] [Twitter:
> http://twitter.com/ararog]
> 
> "Faça a diferença! Ajude o seu país a crescer, não retenha conhecimento,
> distribua e aprenda mais."
> (http://faces.eti.br/2006/10/30/conhecimento-e-amadurecimento)
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Sending-queries-to-multicore-installation-tp19486412p19489077.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Some new SOLR features

2008-09-16 Thread Henrib



ryantxu wrote:
> 
> ...
> Yes, I would like to see a way to specify all the fieldtypes /   
> handlers in one location and then only specify what fields are   
> available for each core. 
> 
> So yes -- I agree.  In 2.0, I hope to flush out configs so they are   
> not monstrous. 
> ...
> 

What about using "include" so each core can have a minimal specific
configuration and schema & everything else shared between them?
Something akin to what's allowed by solr-646.
Just couldn't resist :-)
Henri

-- 
View this message in context: 
http://www.nabble.com/Some-new-SOLR-features-tp19494251p19515526.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Some new SOLR features

2008-09-16 Thread Henrib



ryantxu wrote:
> 
> 
> Yes, include would get us some of the way there, but not far enough  
> (IMHO).  The problem is that (as written) you still need to have all  
> the configs scattered about various directories.
> 
> 

It does not allow us to go *all* the way, but it does allow putting
configuration files in one directory (plus the schema & conf can have specific
names set for each CoreDescriptor).
There actually is a test where the config & schema are shared & the
dataDir is set as a property.
Still a step forward...

-- 
View this message in context: 
http://www.nabble.com/Some-new-SOLR-features-tp19494251p19516242.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Some new SOLR features

2008-09-17 Thread Henrib



Yonik Seeley wrote:
> 
> ...multi-core allows you to instantiate a completely
> new core and swap it for the old one, but it's a bit of a heavyweight
> approach
> ...a schema object would not be mutable, but
> that one could easily swap in a new schema object for an index at any
> time...
> 

Not sure I understand what we gain; if you change the schema, you'll most
likely have to reindex as well. Or are you saying we should have a shortcut
for the whole operation of
"creating a new core, reindexing content, replacing an existing core"?


Yonik Seeley wrote:
> 
> ...completely separate the serialized and in memory representations...
> 

Let's say hypothetically that a stubborn contributor issues a patch along
those lines: {Solr,SolrConfig,SolrSchema}Descriptor classes that capture
what is currently defined through XML in a Java API, parsers so the current
XML could still be used to instantiate those classes. Would that be
considered an acceptable first step? 

-- 
View this message in context: 
http://www.nabble.com/Some-new-SOLR-features-tp19494251p19540853.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Switching cores dynamically

2010-03-19 Thread Henrib

Hi,
You could (theoretically) reduce the down-time to zero using a 'swap'
command:
http://wiki.apache.org/solr/CoreAdmin?highlight=%28swap%29#SWAP
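
For instance, assuming the default port and that core1 holds the freshly
updated index:

http://localhost:8983/solr/admin/cores?action=SWAP&core=core0&other=core1

After the swap, queries addressed to core0 are served by what was core1's
index, with no downtime.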
Cheers
Henrib


muneeb wrote:
> 
> Hi,
> 
> I have indexed almost 7 million articles on two separate cores, each with
> their own conf/ and data/ folder, i.e. they have their individual index.
> 
> What I normally do is, use core0 for querying and core1 for any updates
> and once updates are finished I copy the index of core1 to core0's data
> folder. I know this isn't an efficient way of doing this, since this
> brings a downtime on my search service for a couple of minutes. 
> 
> I was wondering if its possible to switch between cores dynamically
> (keeping my current setup in mind) in such a way that there is no downtime
> at all during switching.
> 
> Thanks very much in advance.
> -M  
> 

-- 
View this message in context: 
http://old.nabble.com/Switching-cores-dynamically-tp27950928p27950994.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: multiple solr home directories

2007-09-03 Thread Henrib

Another possible (and convoluted) way is to use the SOLR-215 patch, which
allows multiple indexes within one Solr instance (though at this stage you'd
lose replication and would probably have to adapt the servlet filter).
Regards
Henri


Yu-Hui Jin wrote:
> 
> Hi, there,
> 
> I have a few basic questions on setting up Solr home directories.
> 
> * can we set up multiple Solr home directories within the same Solr
> instance?  (I want to use the same Tomcat Solr instance to support
> indexing
> and searching over multiple independent indexes.)
> 
> * If so, say I have some customized Solr plugins, ie., jar files, do I
> have
> to add them to each Solr home's lib directory? ( It feels it's a bit
> redundant to add them multiple times for the same Solr instance. )
> 
> 
> Thanks,
> 
> -Hui
> 
> 

-- 
View this message in context: 
http://www.nabble.com/multiple-solr-home-directories-tf4346057.html#a12459726
Sent from the Solr - User mailing list archive at Nabble.com.



query handling / multiple languages / multiple cores

2007-10-18 Thread Henrib

We have an application where we index documents that can exist in many (at
least 2) languages.
We have 1 SolrCore per language, using the same field names in their schemas
(with different stopwords, synonyms & stemmers), the benefits for content
maintenance outweighing (at least) the complexity.
Using EN & FR as an example, a document always exists in EN as a reference
and some of them - not all - are translated into FR; the same document unique
id is used for the reference & the translation.
If a user performs a query in FR, FR documents and EN documents are
searched.
FR docs are sought first; the same query is also run against EN, removing
from the document set those returned by the FR query. That is, if document
id 'AZ123' is retrieved through the FR query, it can't be retrieved by the
EN query. Removing the FR-returned document ids from the EN searchable
document set guarantees that the 2 result sets are disjoint.

1/ Anyone with the same kind of functional requirements? Is using multiple
cores a bad idea for this need?

On the practical side, this led me to a handler that needs to restrict the
document set through an externally defined list of Solr unique ids (we also
need to deal with some upfront ACL management, to top it all).
However, I'm missing a small method that would nicely complete the
SolrIndexSearcher.getDocList* family:

  public DocList getDocList(Query query, DocSet filter, Sort lsort,
      int offset, int len, int flags) throws IOException {
    DocListAndSet answer = new DocListAndSet();
    getDocListC(answer, query, null, filter, lsort, offset, len, flags);
    return answer.docList;
  }

I intend to use this after intersecting potential filter queries & the
restricted document set in the request handler; the Query-filter version of
the method is exposed, and this would be the DocSet version of it.
2/ Any reason not to do this? {Sh,C}ould this method be included - or should
I create an enhancement request?

My current idea to create the DocSet from the document ids is the following:

  DocSet keyFilter(org.apache.lucene.index.IndexReader reader,
                   String keyField,
                   java.util.Iterator<String> ikeys) throws java.io.IOException {
    org.apache.solr.util.OpenBitSet bits =
        new org.apache.solr.util.OpenBitSet(reader.maxDoc());
    if (ikeys.hasNext()) {
      org.apache.lucene.index.Term term =
          new org.apache.lucene.index.Term(keyField, ikeys.next());
      org.apache.lucene.index.TermDocs termDocs = reader.termDocs(term);
      try {
        if (termDocs.next())
          bits.fastSet(termDocs.doc());
        while (ikeys.hasNext()) {
          termDocs.seek(term.createTerm(ikeys.next()));
          if (termDocs.next())
            bits.fastSet(termDocs.doc());
        }
      } finally {
        termDocs.close();
      }
    }
    return new org.apache.solr.search.BitDocSet(bits);
  }

3/ Any better/faster way to create a DocSet from a list of unique ids?

Comments & questions welcome.
Thanks 
-- 
View this message in context: 
http://www.nabble.com/query-handling---multiple-languages---multiple-cores-tf4646262.html#a13272241
Sent from the Solr - User mailing list archive at Nabble.com.



Filtering by document unique key

2007-10-19 Thread Henrib


I'm trying to filter my document collection by an external "means" that
produces a set of document unique keys.
Assuming this goes into a custom request handler (SOLR-281 making that
easy), are there any pitfalls in using a ConstantScoreQuery (or an equivalent
filtering functionality) as a Solr "filter query"?
Thanks
-- 
View this message in context: 
http://www.nabble.com/Filtering-by-document-unique-key-tf4654343.html#a13298066
Sent from the Solr - User mailing list archive at Nabble.com.



Re: GET_SCORES flag in SolrIndexSearcher

2007-10-19 Thread Henrib

I believe that keeping your code as is, but initializing the query parameters,
should do the trick:

HashMap params = new HashMap();
params.put("fl", "id score"); // field list is id & score
...
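
The scores should then come back through the DocIterator, something like:

DocList docs = (DocList) response.getValues().get("response");
DocIterator it = docs.iterator();
while (it.hasNext()) {
    int docId = it.nextDoc();
    float score = it.score(); // score of the doc just returned by nextDoc()
}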
Regards


John Reuning-2 wrote:
> 
> My first pass was to implement the embedded solr example:
> 
> --
> MultiCore mc = MultiCore.getRegistry();
> SolrCore core = mc.getCore(mIndexName);
> 
> SolrRequestHandler handler = core.getRequestHandler("");
> HashMap params = new HashMap();
> 
> SolrQueryRequest request = new LocalSolrQueryRequest(core, query, 
> "standard", 0, 100, params);
> SolrQueryResponse response = new SolrQueryResponse();
> core.execute(handler, request, response);
> 
> DocList docs = (DocList) response.getValues().get("response");
> --
> 
> Is the only way to access scores to call directly to SolrIndexSearcher? 
>   I was wondering if there's a solr config option I'm missing somewhere 
> that tells the SolrIndexSearcher to retain lucene scores.  I'll keep 
> digging.  Maybe there's a way to set a LocalSolrQueryRequest param that 
> passes the right info through to SolrIndexSearcher?
> 
> Thanks,
> 
> -jrr
> 
> Chris Hostetter wrote:
>> : The scores list in DocIterator is null after a successful query.
>> There's a
>> : flag in SolrIndexSearcher, GET_SCORES, that looks like it should
>> trigger
>> : setting the scores array for the resulting DocList, but I can't figure
>> out how
>> : to set it.  Any suggestions?  I'm using the svn trunk code.
>> 
>> Can you elaborate (ie: paste some code examples) on how you are acquiring 
>> your DocList ... what method are you calling on SolrIndexSearcher? what 
>> arguments are you passing it?
>> 
>> NOTE: the SolrIndexSearcher.getDocList* methods may choose to build 
>> the DocList from a DocSet unless:
>> a) you use a sort that includes score
>> or  b) you use a method sig that takes a flags arg and explicitly set 
>>the GET_SCORES mask on your flags arg.
>> 
>> 
>> 
>> 
>> -Hoss
>> 
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/GET_SCORES-flag-in-SolrIndexSearcher-tf4641637.html#a13297657
Sent from the Solr - User mailing list archive at Nabble.com.



logging through log4j

2008-04-22 Thread Henrib

Hi,
I'm (still) seeking more advice on this deployment issue, which is to use
org.apache.log4j instead of java.util.logging. I'm not seeking to restart
any discussion on the respective benefits of slf4j/commons/log4j/jul; I'm
seeking a way to bridge jul to log4j with the minimum of container-specific
configuration or restrictions.
I've failed to find a way that would work for "all" servlet containers
(Tomcat, WebSphere, Jetty) without disrupting Solr code.
My latest attempt, which requires code modification, is posted in the last
reply here:
http://www.nabble.com/logging-through-log4j-to13747253.html#a16825364
Comments/experience welcome.
Thanks
Henri
-- 
View this message in context: 
http://www.nabble.com/logging-through-log4j-tp16825424p16825424.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: logging through log4j

2008-04-24 Thread Henrib

Will,
I'd be definitely interested in your code but mostly in the config &
deployment options if you can share.

You did not happen to deploy on WebSphere 6 by any chance? I can't find a
way to configure jul to only log into our application logs (even less so into
our log4j logs); I'm not even sure it is feasible, since IBM's documentation
states that only "anonymous" loggers can escape the common sink. And since
commons-logging is in our bundled war for some other library, things get a
tad confusing as to where/what the actual configuration should be.

I realize this is not the first thread on the logging topic, but we've not
yet been able to gather experience or collect documented ways of doing
this for each container (Jetty/Tomcat5/Tomcat6/WebLogic/WebSphere...).
This is why I tend to prefer the container-agnostic way of configuring
logging (and everything else): the application/war-configured way. This led
me to SOLR-549 (https://issues.apache.org/jira/browse/SOLR-549), which is
cross-container but (alas) requires the code to change and introduces (yet
another) logging configuration convention.

Henri

Will Johnson-2 wrote:
> 
> Henri, 
> 
> There are some bridges out there but none had a version number > 0.1.  I
> found the simplest way was to configure JUL using a custom config file and
> then tell it to use my custom handler to forward all messages to log4j.
> There are obvious performance implications but it is doable and fairly
> simple since it didn't require any solr code changes.
> 
> - will
> 
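
For the record, a sketch of the kind of forwarding handler Will describes -
the class & level mapping are mine, assuming log4j 1.2; it would be registered
through the jul config file (e.g. handlers = JulToLog4jHandler):

public class JulToLog4jHandler extends java.util.logging.Handler {
    public void publish(java.util.logging.LogRecord record) {
        String name = record.getLoggerName() == null ? "jul" : record.getLoggerName();
        org.apache.log4j.Logger.getLogger(name)
            .log(toLog4j(record.getLevel()), record.getMessage(), record.getThrown());
    }
    // coarse jul -> log4j level mapping
    private org.apache.log4j.Level toLog4j(java.util.logging.Level level) {
        int v = level.intValue();
        if (v >= java.util.logging.Level.SEVERE.intValue()) return org.apache.log4j.Level.ERROR;
        if (v >= java.util.logging.Level.WARNING.intValue()) return org.apache.log4j.Level.WARN;
        if (v >= java.util.logging.Level.INFO.intValue()) return org.apache.log4j.Level.INFO;
        return org.apache.log4j.Level.DEBUG;
    }
    public void flush() {}
    public void close() {}
}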

-- 
View this message in context: 
http://www.nabble.com/logging-through-log4j-tp16825424p16847405.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: [poll] Change logging to SLF4J?

2008-05-22 Thread Henrib


Ryan McKinley wrote:
> 
>> [  ] Keep solr logging as it is.  (JDK Logging)
>> [X  ] Use SLF4J.
> 
Can't "keep as is" since this strictly precludes configuring logging in a
container agnostic way.
-- 
View this message in context: 
http://www.nabble.com/-poll--Change-logging-to-SLF4J--tp17084684p17405410.html
Sent from the Solr - User mailing list archive at Nabble.com.



Embedding Solr vs Lucene, multiple Solr cores?

2007-04-13 Thread Henrib

I'm trying to choose between embedding Lucene versus embedding Solr in one
webapp.

In Solr terms, functional requirements would more or less lead to multiple
schema & conf (need CRUD/generation on those) and deployment constraints
imply one webapp instance. The choice I'm trying to make is thus:
-Embed Lucene and (attempt to) recode a lot of what Solr provides... (the
straw man)
-Embed Solr but refactor 'some' of its core, assuming it is correct to see
one Solr core as the association of one schema & one conf.

There have been a few threads about multiple indexes and/or
multiple/reloading schemas.
From what I gathered, one solution stems from the 'multiple webapp instances
deployment' and implies 'extracting' the static instance (at least the
SolrCore) & thus host multiple Solr cores in one webapp.

Obviously, the operations (queries/add/delete doc) would need to carry which
core they are targeting (one 'core' being set as the 'default' for
compatibility purposes).
What will be the other big hurdles, the ones that could even preclude the
very idea ? (caches handling, updater threads, HA features...)
Are there any easier routes (class-loaders, 'provisional' schema...) ?

Any advice welcome. Thanks.
Henri


-- 
View this message in context: 
http://www.nabble.com/Embedding-Solr-vs-Lucene%2C-multiple-Solr-cores--tf3572324.html#a9981355
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Embedding Solr vs Lucene, multiple Solr cores?

2007-04-13 Thread Henrib

Thank you both for your quick answers.

The one webapp constraint comes from the main 'embedding' application so I
don't have much leeway there. The direct approach was to map the
main/hosting application document collection & types to one schema/conf.
Since the host collections & types can be dynamically created, this seemed
the natural route (albeit hard).

The longer story is that in our typical customer environments, IT deploys &
monitors webapps (provisions space et al., replicates for disaster recovery,
etc.) but does not want to deal with the application itself, leaving the
'business users' side to administer it. Even if there is a dedicated Tomcat for the main
app, IT will not let the 'business users' install other applications (scope
of responsibility, code versus data, validation procedures, etc). Thus the
'one application' constraint.

Anyway, it seems a 'provisional' schema, where most fields would be dynamic
with some notational convention to map them, would be the easiest route, the
different target indexes being replaced by equivalent filters. I gather from
your inputs that the potential functionality loss and/or performance hit is
not something I should be afraid of.

For the sake of completeness: instead of embedding Solr in that single
instance, I thought about using several Solr instances running in different
webapp instances & using them as 'coprocessors' for the main application; this
would imply serializing/deserializing/redirecting queries & results between
webapps, which is not the most efficient way in a single host/VM env (maybe
Tomcat cross-context could help alleviate that). But this would also require
dynamically deploying webapps for that purpose, which is a no-no from IT...

For the sake of argument :-), besides the SolrCore singleton, which is easy
to circumvent (a map of cores & at least a pointer from the instantiated
schema to the core handling it), are there other statics hiding
(Config.config, caches...) that would preclude the multiple-core track?

Thanks again
Henri


Tom Hill-6 wrote:
> 
> Hi -
> 
> Of the various approaches that you could take, the one I'd work on first
> is:
> 
>> deployment constraints imply one webapp instance.
> 
> In most environments, it's going to cost a lot less to change this, than
> to
> try to roll your own, or extensively modify solr.
> 
> I know I'm sidestepping your stated requirements, but I'd take a long look
> at that one.
> 
> BTW, We cut over from an embedded Lucene instance to Solr about 4 months
> ago, and are very happy that we did.
> 
> Tom
> 
> On 4/13/07, Henrib <[EMAIL PROTECTED]> wrote:
>>
>>
>> I'm trying to choose between embedding Lucene versus embedding Solr in
>> one
>> webapp.
>>
>> In Solr terms, functional requirements would more or less lead to
>> multiple
>> schema & conf (need CRUD/generation on those) and deployment constraints
>> imply one webapp instance. The choice I'm trying to make is thus:
>> -Embed Lucene and (attempt to) recode a lot of what Solr provides... (the
>> straw man)
>> -Embed Solr but refactor 'some' of its core, assuming it is correct to
>> see
>> one Solr core as the association of one schema & one conf.
>>
>> There have been a few threads about multiple indexes and/or
>> multiple/reloading schemas.
>> From what I gathered, one solution stems from the 'multiple webapp
>> instances
>> deployment' and implies 'extracting' the static instance (at least the
>> SolrCore) & thus host multiple Solr cores in one webapp.
>>
>> Obviously, the operations (queries/add/delete doc) would need to carry
>> which
>> core they are targeting (one 'core' being set as the 'default' for
>> compatibility purpose).
>> What will be the other big hurdles, the ones that could even preclude the
>> very idea ? (caches handling, updater threads, HA features...)
>> Are there any easier routes (class-loaders, 'provisional' schema...) ?
>>
>> Any advice welcome. Thanks.
>> Henri
>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Embedding-Solr-vs-Lucene%2C-multiple-Solr-cores--tf3572324.html#a9981355
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Embedding-Solr-vs-Lucene%2C-multiple-Solr-cores--tf3572324.html#a9986289
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Embedding Solr vs Lucene, multiple Solr cores?

2007-04-16 Thread Henrib


I suppose I'm not the only one having to cope with the kind of policies I
was describing (& their idiosyncrasies); in some organizations, trying to
get IT to modify anything related to 'deployment policy' is just (very close
to) impossible... Within those, having a dedicated Tomcat to run the
application is the best thing that can happen! But even then, some
will impose only one web application deployed in it...

Anyway, I've been looking deeper into removing the various static refs &
singletons.
The SolrCore singleton is the easy one, but the various Configs are a bit
more tedious.
Would you rather have:
a/ one 'config' package gathering all those configuration objects
(Query/Index/Admin/Cache/etc.), or
b/ a generalization of the SolrIndexConfig kind, one SolrxxxConfig per
configuration-sensitive class?
Would it be ok to refactor SolrConfig to aggregate those and have one
instance of SolrConfig per SolrCore?
I also lack the JMX understanding to see how the different SolrCore
instances would be exposed as MBeans (if SolrInfoMBean ends up being used
this way); any hint would be appreciated.

Henri


Chris Hostetter wrote:
> 
> 
> : but does not want to deal with the application itself, leaving the
> 'business
> : users' side administer it. Even if there is a dedicated Tomcat for the
> main
> : app, IT will not let the 'business users' install other applications
> (scope
> : of responsibility, code versus data, validation procedures, etc). Thus
> the
> : 'one application' constraint.
> 
> There tends to be a lot of devils in the details of policy discussions
> like this, but perhaps you could redefine the definition of an
> "application" from your ops/biz standpoint to be broader than the
> definition from a servlet container standpoint (ie: let the "application"
> be the entire Tomcat setup running several webapps)
> 
> Alternately, I've heard people mention in past discussions issues
> regarding service-provider-run servlet containers with self-serve WAR hot
> deployment and the issues involved with only being able to change your WAR
> and not having any control over the container itself, and I've always
> wondered: how hard would it be to wrap Tomcat (or Jetty) so that it is a war
> that can run inside of another servlet container ... then you can have
> multiple wars embedded in that war and control the Tomcat configs to your
> heart's content -- treating the ISP's servlet container like an OS.
> 
> : For the sake of argument :-), besides the SolrCore singletons which is
> easy
> : to circumvent (a map of cores & at least a pointer from the instantiated
> : schema to the core handling it, are there others that are hiding
> : (Config.config, caches...) that would preclude the multiple core track?
> 
> There are lots of places in the code where class instances use static refs
> to find the Core/Config/IndexSchema which would have to know about
> your Map and keys ... it would be a lot of non-trivial changes and
> refactoring, I believe.
> 
> That said: If anyone is interested in tackling a patch to eliminate all of
> the static Singletons, I (and many others, I suspect) would be
> extremely grateful .. both for how much it would improve the reusability
> of Solr in embedded situations like this, but also for how it would
> (hopefully) make the code easier to follow for future developers.
> 
> 
> -Hoss
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Embedding-Solr-vs-Lucene%2C-multiple-Solr-cores--tf3572324.html#a10020669
Sent from the Solr - User mailing list archive at Nabble.com.



Multiple Solr Cores

2007-04-19 Thread Henrib

Following up on a previous thread in the Solr-User list, here is a patch that
allows managing multiple cores in the same VM (thus multiple
config/schemas/indexes).
The SolrCore.core singleton has been changed to a Map<String,SolrCore>; the
current singleton behavior is keyed as 'null' (which is used by
SolrInfoRegistry).
All static references to either a Config or a SolrCore have been removed;
this implies that some classes now do refer to either a SolrCore or a
SolrConfig (some ctors have been modified accordingly).

I haven't tried to modify anything above the 'jar' (script, admin & servlet
are unaware of the multi-core part).

The 2 patches files are the src/ & the test/ patches.
http://www.nabble.com/file/7971/solr-test.patch solr-test.patch 
http://www.nabble.com/file/7972/solr-src.patch solr-src.patch 

This being my first attempt at a contribution, I will humbly welcome any
comment.
Regards,
Henri
-- 
View this message in context: 
http://www.nabble.com/Multiple-Solr-Cores-tf3608399.html#a10082201
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Multiple indexes?

2007-04-19 Thread Henrib

You cannot have more than one Solr core per application (to be precise, per
class-loader, since there are a few statics).
One way is thus to have 2 webapps - when & if the indexes do not have the same
lifetime, have radically different schemas, etc.
However, the common wisdom is that you usually don't really need different
indexes (I discussed this last week).

If you really are in desperate need of multiple cores, in the 'Multiple Solr
Cores' thread, you'll find (early state) patches that allow just that...

Cheers
Henri


Matthew Runo wrote:
> 
> Hey there-
> 
> I was wondering if the following was possible, and, if so, how to set  
> it up...
> 
> I want to index two different types of data, and have them searchable  
> from the same interface.
> 
> For example, a group of products, with size, color, price, etc info.
> And a group of brands, with brand, genre, brand description, etc info
> 
> So, the info does overlap some. But a lot of the fields for each  
> "type" don't matter to the other. Is there a way to set up two  
> different schema so that both types may be indexed with relative ease?
> 
> ++
>   | Matthew Runo
>   | Zappos Development
>   | [EMAIL PROTECTED]
>   | 702-943-7833
> ++
> 
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Multiple-indexes--tf3608429.html#a10083580
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Multiple Solr Cores

2007-04-19 Thread Henrib

There is still only one solr.home instance used to load the various classes,
which is used as the one 'root'.
From there, you can have multiple solrconfig*.xml & schema*.xml (even with
absolute paths); calling new SolrCore(name_of_core, path_to_solrconfig,
path_to_schema) creates a named core that you can refer to.
To refer to a named core, you call SolrCore.getCore(name_of_core) (instead
of SolrCore.getCore()).
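
For instance (paths are examples), per the patched API described above:

// create a named core from explicit config & schema locations...
SolrCore frCore = new SolrCore("fr", "conf/solrconfig-fr.xml", "conf/schema-fr.xml");
// ...and re-acquire it later by name
SolrCore sameCore = SolrCore.getCore("fr");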

From a servlet perspective, it seems that passing the name of the core back
& forth should do the trick (so we can reacquire the correct core). One
missing part is uploading a config & a schema, then starting a core (dynamic
creation of a core). One thing to note is that a schema needs a config to be
created, and it is certainly wise to use the same one for schema & core
creation.
For the admin servlet, we'd need to implement a way to choose the core we
want to observe.
And the scripts probably also need to have a 'core name' passed down...

I'm still building my knowledge on the subject so my simplistic view might
not be accurate.
Let me know if this helps.
Cheers
Henrib



mpelzsherman wrote:
> 
> This sounds like a great idea, and potentially very useful for my company.
> 
> Can you explain a bit about how you would configure the various solr/home
> paths, and how the different indexes would be accessed by clients?
> 
> Thanks!
> 
> - Michael
> 

-- 
View this message in context: 
http://www.nabble.com/Multiple-Solr-Cores-tf3608399.html#a10084772
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Multiple Solr Cores

2007-04-20 Thread Henrib


Updated (forgot the patch for Servlet).
http://www.nabble.com/file/7996/solr-trunk-src.patch solr-trunk-src.patch 

The change should still be compatible with the trunk it is based upon.


Henrib wrote:
> 
> Following up on a previous thread in the Solr-User list, here is a patch
> that allows managing multiple cores in the same VM (thus multiple
> config/schemas/indexes).
> The SolrCore.core singleton has been changed to a Map<String,SolrCore>;
> the current singleton behavior is keyed as 'null' (which is used by
> SolrInfoRegistry).
> All static references to either a Config or a SolrCore have been removed;
> this implies that some classes now do refer to either a SolrCore or a
> SolrConfig (some ctors have been modified accordingly).
> 
> I haven't tried to modify anything above the 'jar' (script, admin &
> servlet are unaware of the multi-core part).
> 
> The 2 patches files are the src/ & the test/ patches.
>  http://www.nabble.com/file/7971/solr-test.patch solr-test.patch 
>  http://www.nabble.com/file/7972/solr-src.patch solr-src.patch 
> 
> This being my first attempt at a contribution, I will humbly welcome any
> comment.
> Regards,
> Henri
> 

-- 
View this message in context: 
http://www.nabble.com/Multiple-Solr-Cores-tf3608399.html#a10106126
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Multi-language indexing and searching

2007-06-08 Thread Henrib

Hi Daniel,
If it is functionally 'ok' to search in only one lang at a time, you could
try having one index per lang. Each per-lang index would have one schema
where you would describe field types (the lang part coming through
stemming/snowball analyzers, per-lang stopwords et al.), and the same field
names could be used in each of them.
You could either deploy that solution through multiple web-apps (one per
lang) or try the patch for issue SOLR-215.
Regards,
Henri


Daniel Alheiros wrote:
> 
> Hi, 
> 
> I'm just starting to use Solr and so far, it has been a very interesting
> learning process. I wasn't a Lucene user, so I'm learning a lot about
> both.
> 
> My problem is:
> I have to index and search content in several languages.
> 
> My scenario is a bit different from other that I've already read in this
> forum, as my client is the same to search any language and it could be
> accomplished using a field to define language.
> 
> My questions are more focused on how to keep the benefits of all the
> protwords, stopwords and synonyms in a multilanguage situation
> 
> Should I create new Analyzers that can deal with the "language" field of
> the
> document? What do you recommend?
> 
> Regards,
> Daniel 
> 
> 
> http://www.bbc.co.uk/
> This e-mail (and any attachments) is confidential and may contain personal
> views which are not the views of the BBC unless specifically stated.
> If you have received it in error, please delete it from your system.
> Do not use, copy or disclose the information in any way nor act in
> reliance on it and notify the sender immediately.
> Please note that the BBC monitors e-mails sent or received.
> Further communication will signify your consent to this.
>   
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Multi-language-indexing-and-searching-tf3885324.html#a11027333
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Multi-language indexing and searching

2007-06-09 Thread Henrib

Hi Daniel,
Trying to recap: you are indexing documents that can be in different
languages. On the query side, users will only search in one language at a
time & get results in that language.

Setting aside the webapp deployment problem, the alternative is thus:
option1: 1 schema with all fields of all languages pre-defined
option2: 1 schema per lang with the same field names (but a different type).

You indicate that your documents do have a field carrying the language. Is
the Solr document format the authoring format of the documents you index, or
do they require some pre-processing to extract those fields? For instance,
are the source documents in HTML, pre-processed using some XPath/magic to
generate the fields?
In that case, using option1, the pre-processing transformation needs to know
which fields to generate according to the language; option2 needs you to
know which core to target based on the lang. And it goes the same
way for querying: option1 requires a query with different fields for
each language, option2 requires targeting the correct core.
In the other case, i.e. if the Solr document format is the source format,
indexing requires some script (curl or else) to send the documents to Solr;
having the script determine which core to target doesn't seem (from afar) a
hard task (grep/awk to the rescue :-)).

On the maintenance side, if you were to change the schema, reindex one lang
or add a lang, option1 seems to have a 'wider' impact, the functional grain
being coarser. Besides, if your collections are huge or grow fast, it might
be nice to have an easy way to partition the workload across different
machines, which seems easier with option2, directing indexing & queries to a
site based on the lang.

On the webapp deployment side, option1 is a breeze; option2 requires
multiple web-apps (forgetting the SOLR-215 patch, which is unlikely to be
reviewed and accepted soon, since its functional value is not shared).

Hope this helps in your choice, regards,
Henri







Daniel Alheiros wrote:
> 
> Hi Henri.
> 
> Thanks for your reply.
> I've just looked at the patch you referred to, but doing this I will lose the
> out of the box Solr installation... I'll have to create my own Solr
> application responsible for creating the multiple cores and I'll have to
> change my indexing process to something able to notify content for a
> specific core.
> 
> Can't I have the same index, using one single core, same field names being
> processed by language specific components based on a field/parameter?
> 
> I will try to draw what I'm thinking, please forgive me if I'm not using
> the
> correct terms but I'm not an IR expert.
> 
> Thinking in a workflow:
> Indexing:
> Multilanguage indexer receives some documents
> for each document, verify the "language" field
> if language = "English" then process using the
> EnglishIndexer
> else if language = "Chinese" then process using the
> ChineseIndexer
> else if ...
> 
> Querying:
> Multilanguage Request Handler receives a request
> if parameter language = "English" then process using the
> English
> Request Handler
> else if parameter language = "Chinese" then process using the
> Chinese Request Handler
> else if ...
> 
> I can see that in the schema field definitions, we have some language
> dependent parameters... It can be a problem, as I would like to have the
> same fields for all requests...
> 
> Sorry to bother, but before I split all my data this way I would like to
> be
> sure that it's the best approach for me.
> 
> Regards,
> Daniel
> 
> 
> On 8/6/07 15:15, "Henrib" <[EMAIL PROTECTED]> wrote:
> 
>> 
>> Hi Daniel,
>> If it is functionally 'ok' to search in only one lang at a time, you
>> could
>> try having one index per lang. Each per-lang index would have one schema
>> where you would describe field types (the lang part coming through
>> stemming/snowball analyzers, per-lang stopwords & al) and the same field
>> name could be used in each of them.
>> You could either deploy that solution through multiple web-apps (one per
>> lang) (or try the patch for issue Solr-215).
>> Regards,
>> Henri
>> 
>> 
>> Daniel Alheiros wrote:
>>> 
>>> Hi, 
>>> 
>>> I'm just starting to use Solr and so far, it has been a very interesting
>>> learning process. I wasn't a Lucene user, so I'm learning a lot about
>>> both.
>>> 
>>> My problem is:
>>> I have to index and search content in several languages. [...]

Filtering on a 'unique key' set

2007-06-17 Thread Henrib

Merely an efficiency-related question: is there any other way to filter on a
uniqueKey set than using the 'fq' parameter & building a list of the
uniqueKeys?
In 'raw' Lucene, you could use filters directly in search; is this (close
to) equivalent efficiency-wise?
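
In Lucene 2.x terms, the kind of call I mean (searcher, query & keysQuery -
some query over the unique keys - assumed in scope):

org.apache.lucene.search.Hits hits =
    searcher.search(query, new org.apache.lucene.search.QueryFilter(keysQuery));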
Thanks

-- 
View this message in context: 
http://www.nabble.com/Filtering-on-a-%27unique-key%27-set-tf3935694.html#a11162273
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Filtering on a 'unique key' set

2007-06-18 Thread Henrib

Thanks Yonik;
Let me twist the same question another way; I'm running Solr embedded, and the
pre-existing uniqueKey set may be large, is per-query (most likely not
worth caching) and is iterable. I'd rather avoid building a string for
the 'fq', getting it parsed, etc.
Would it be as safe & more efficient, in a (custom) request handler, to create
a DocSet by fetching termDocs for each key used as a Term, and use it as a
filter? Or is this just a bad idea?

Pseudo code being:
  DocSet keyFilter(org.apache.lucene.index.IndexReader reader,
                   String keyField,
                   java.util.Iterator<String> ikeys) throws java.io.IOException {
    org.apache.solr.util.OpenBitSet bits =
        new org.apache.solr.util.OpenBitSet(reader.maxDoc());
    if (ikeys.hasNext()) {
      org.apache.lucene.index.Term term =
          new org.apache.lucene.index.Term(keyField, ikeys.next());
      org.apache.lucene.index.TermDocs termDocs = reader.termDocs(term);
      if (termDocs.next())
        bits.fastSet(termDocs.doc());
      while (ikeys.hasNext()) {
        termDocs.seek(term.createTerm(ikeys.next()));
        if (termDocs.next())
          bits.fastSet(termDocs.doc());
      }
      termDocs.close();
    }
    return new org.apache.solr.search.BitDocSet(bits);
  }

Thanks again

Yonik Seeley wrote:
> 
> On 6/17/07, Henrib <[EMAIL PROTECTED]> wrote:
>> Merely an efficiency related question: is there any other way to filter
>> on a
>> uniqueKey set than using the 'fq' parameter & building a list of the
>> uniqueKeys?
> 
> I don't thnik so...
> 
>> In 'raw' Lucene, you could use filters directly in search; is this (close
>> to) equivalent efficiency wise?
> 
> Yes, any fq params are turned into filters.
> 
> -Yonik
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Filtering-on-a-%27unique-key%27-set-tf3935694.html#a11178089
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Filtering on a 'unique key' set

2007-06-19 Thread Henrib


Is it reasonable to implement a RequestHandler that systematically uses a
DocSet as a filter for the restriction queries? I'm under the impression
that SolrIndexSearcher.getDocSet(Query, DocSet) would use the cache properly
& that calling it in a loop would perform the 'and' between the filters...

pseudo code (refactored from Standard & Dismax):
  /* * * Restrict Results * * */
  List<Query> restrictions = U.parseFilterQueries(req);
  DocSet rdocs = myUniqueKeySetThatMayBeNull();
  if (restrictions != null) for (Query r : restrictions) {
    rdocs = s.getDocSet(r, rdocs);
  }
  /* * * Generate Main Results * * */
  flags |= U.setReturnFields(req, rsp);
  DocListAndSet results = null;
  NamedList facetInfo = null;
  if (params.getBool(FACET, false)) {
    results = s.getDocListAndSet(query, rdocs,
                                 SolrPluginUtils.getSort(req),
                                 params.getInt(START, 0),
                                 params.getInt(ROWS, 10),
                                 flags);
    facetInfo = getFacetInfo(req, rsp, results.docSet);
  } else {
    results = new DocListAndSet();
    results.docList = s.getDocList(query, rdocs,
                                   SolrPluginUtils.getSort(req),
                                   params.getInt(START, 0),
                                   params.getInt(ROWS, 10),
                                   flags);
  }




Yonik Seeley wrote:
> 
> On 6/18/07, Henrib <[EMAIL PROTECTED]> wrote:
>> Thanks Yonik;
>> Let me twist the same question another way; I'm running Solr embedded,
>> the
>> uniqueKey set that pre-exists  may be large, is per-query (most likely
>> not
>> useful to cache it) and is iterable. I'd rather avoid making a string to
>> build the 'fq', get it parsed, etc.
>> Would it be as safe & more efficient in a (custom) request handler to
>> create
>> a DocSet by fetching termDocs for each key used as a Term & use it as a
>> filter?
> 
> Yes, that should work fine.
> Most of the savings will be avoiding the query parsing.
> 
> -Yonik
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Filtering-on-a-%27unique-key%27-set-tf3935694.html#a11195979
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Filtering on a 'unique key' set

2007-06-19 Thread Henrib


What I'm after is to restrict the 'whole' index through a set of unique
keys.
Each unique key set is likely to have between 100 & 1 keys and these
sets are expected to differ for most queries. I'm trying to see
if I can achieve a generic 'fk' (for 'filter key') kind of parameter, so this
could be applied to 'any' RequestHandler.

To keep the filter-queries functionality (as a List<Query>), I compute a
DocSet by using my 'unique key filter docset' as a base and iteratively
'and'-ing it with the filter queries, executed through
SolrIndexSearcher.getDocSet(Query, DocSet).

The other way could be creating a BooleanQuery that ORs a TermQuery per unique
key (the whole acting as one filter ANDed with the main query); I might still
revert to that, since my current code
needs a small patch in SolrIndexSearcher (flags in getDocList).
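
A sketch of that alternative (Lucene 2.x API; uniqueKeys & keyField assumed
in scope):

// OR one TermQuery per unique key; the resulting query is then used as a
// single filter ANDed with the main query
org.apache.lucene.search.BooleanQuery keys =
    new org.apache.lucene.search.BooleanQuery();
for (String key : uniqueKeys) {
    keys.add(new org.apache.lucene.search.TermQuery(
                 new org.apache.lucene.index.Term(keyField, key)),
             org.apache.lucene.search.BooleanClause.Occur.SHOULD);
}
// note: BooleanQuery.getMaxClauseCount() (1024 by default) caps the set size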

In SolrIndexSearcher, if getDocListAndSet & getDocList were to accept a
DocSet filter plus the List<Query>, I'd use those, but I had to choose
between using a List<Query> OR a DocSet as the filter. I might have missed
something, the code being quite dense; the equivalent signatures I could
manage to get are:
public DocList getDocList(Query query, DocSet filter, Sort lsort,
    int offset, int len, int flags) throws IOException;
and
public DocListAndSet getDocListAndSet(Query query, DocSet filter,
    Sort lsort, int offset, int len, int flags) throws IOException;

It seems to work; I don't know if it is efficient cache-wise and otherwise.



Yonik Seeley wrote:
> 
> On 6/19/07, Henrib <[EMAIL PROTECTED]> wrote:
>> Is it reasonable to implement a RequestHandler that systematically uses a
>> DocSet as a filter for the restriction queries?
> 
> How many unique keys would typically be used to construct the filter?
> 
>> I'm under the impression
>> that SolrIndexSearcher.getDocSet(Query, DocSet) would use the cache
>> properly
>> & that calling it in a loop would perform the 'and' between the
>> filters...
> 
> Yes, but I wouldn't do that for each query unless each query was
> likely to have a different id list.
> 
> There's also a getDocListAndSet that takes a List<Query> as a filter.
> 
> -Yonik
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Filtering-on-a-%27unique-key%27-set-tf3935694.html#a11198985
Sent from the Solr - User mailing list archive at Nabble.com.