Re: Sharded Index Creation Magic?

2009-07-14 Thread Shalin Shekhar Mangar
On Tue, Jul 14, 2009 at 2:00 AM, Nick Dimiduk  wrote:

> However, when I search across all
> deployed shards using the &shards= query parameter (
>
> http://host00:8080/solr/select?shards=host00:8080/solr,host01:8080/solr&q=body
> \%3A%3Aterm),
> I get a NullPointerException:
>
> java.lang.NullPointerException
>at
> org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:421)
>at
> org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:265)
>at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:264)
>at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
>at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
>at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
>
> Debugging into the QueryComponent.mergeIds() method reveals the instance
> sreq.responses (line 356) contains one response for each shard specified,
> each with the number of results received by the independent queries. The
> problems begin down at line 370 because the SolrDocument instance has only
> a
> score field -- which proves problematic in the following line where the id
> is requested. The SolrDocument, only containing a score, lacks the
> designated ID field (from my schema) and thus the document cannot be added
> to the results queue.
>
> Because the example on the wiki works by loading the documents directly
> into
> Solr for indexing, I have come to the conclusion that there is some extra
> magic happening in this index generation process which my process lacks.
>


Do you have a uniqueKey defined in your schema.xml?

-- 
Regards,
Shalin Shekhar Mangar.


Re: Availability during merge

2009-07-14 Thread Shalin Shekhar Mangar
On Tue, Jul 14, 2009 at 2:30 AM, Charlie Jackson  wrote:

> The wiki page for merging solr cores
> (http://wiki.apache.org/solr/MergingSolrIndexes) mentions that the cores
> being merged cannot be indexed to during the merge. What about the core
> being merged *to*? In terms of the example on the wiki page, I'm asking
> if core0 can add docs while core1 and core2 are being merged into it.
>
>
A merge operation acquires the index writer lock, so any add operations sent
during the merge will wait till the merge completes. So, even though you
can send add/delete commands to core0, they'll wait for the merge to finish.
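
For reference, the merge call on that wiki page looks roughly like this (the
paths here are illustrative, not from the original mail):

http://localhost:8983/solr/admin/cores?action=mergeindexes&core=core0&indexDir=/path/to/core1/data/index&indexDir=/path/to/core2/data/index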

-- 
Regards,
Shalin Shekhar Mangar.


Re: Can't limit return fields in custom request handler

2009-07-14 Thread Osman İZBAT
Thank you very much Chris.

Regards.

On Mon, Jul 13, 2009 at 4:30 AM, Chris Hostetter
wrote:

>
> : Query filter = new TermQuery(new Term("inStores", "true"));
>
> that will work if "inStores" is a TextField or a StrField and it's got the
> term "true" indexed in it ... but if it's a BoolField like in the
> example schema then the values that appear in the index are "T" and "F"
>
> When you write custom Solr plugins you *HAVE* to leverage the FieldType of
> the fields you deal with when building queries programmatically. This is what
> the "FieldType.toInternal" method is for.
>
>
>
>
> -Hoss
>
>
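
A minimal sketch of what Hoss describes, using the "inStores" field from this
thread (req is the SolrQueryRequest available inside a handler; the rest is
standard Solr plugin API):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.solr.schema.SchemaField;

SchemaField sf = req.getSchema().getField("inStores");
String internal = sf.getType().toInternal("true"); // yields "T" for a BoolField
Query filter = new TermQuery(new Term("inStores", internal));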


-- 
Osman İZBAT


Custom functionality in SolrIndexSearcher

2009-07-14 Thread Marc Sturlese

Hey there. I needed functionality similar to adjacent-field-collapsing, but
instead of making the docs disappear I just wanted to move them to the end of
the list (the ids array).

At the moment I am just experimenting to find the shortest response time. I
probably won't be able to use my solution, as it's a pretty big core hack; I
would just like to hear advice on "cleaner" ways to do this, or what you
think of it.
I don't want this algorithm applied to the whole index, as that makes
responses slower, and I have no interest in results after page 30, for
example. I just want it applied to the first 3000 or 5000 results.

Due to performance issues (request speed and index size) I couldn't use the
collapsing patch, so what I have done is apply the algorithm straight away
in getDocListAndSetNC and getDocListNC.

Basically what I do is: if the user asks for fewer than "considerHowMany"
docs, I ask for that number instead, or for all of them if there are fewer
(when topCollector.topDocs... is called). Then I apply the adjacent-field-
collapse algorithm, but instead of making the docs disappear I send them to
the end of the queue. I mean, let's say a query has 1,357,534 hits and I
just want to apply the algorithm to the first 5000 results: if the 2nd
result must be collapsed it goes to position 5000, if the 3rd must be
collapsed it goes to 4999... After the 5000th result the pseudo-collapse
algorithm stops being applied.

I have added two parameters to the QueryCommand, used to decide whether the
algorithm has to be applied and to how many documents.

I repeat, it's just testing; I know it's not good to modify these classes...
I just want to hear any advice that could help me do something similar
without messing with the code that much, or what people think. I leave here
my getDocListAndSetNC (I have done the same for getDocListNC):

  private DocSet getDocListAndSetNC(QueryResult qr, QueryCommand cmd) throws IOException {
    int len = cmd.getSupersetMaxDoc();
    DocSet filter = cmd.getFilter()!=null ? cmd.getFilter() : getDocSet(cmd.getFilterList());
    int last = len;
    if (last < 0 || last > maxDoc()) last = maxDoc();
    final int lastDocRequested = last;
    int nDocsReturned;
    int totalHits;
    float maxScore;
    int[] ids;
    float[] scores;
    DocSet set;

    // extra vars
    boolean considerMoreDocs = cmd.getConsiderMoreDocs();
    int considerHowMany = cmd.getConsiderHowMany();

    boolean needScores = (cmd.getFlags() & GET_SCORES) != 0;
    int maxDoc = maxDoc();
    int smallSetSize = maxDoc >> 6;

    Query query = QueryUtils.makeQueryable(cmd.getQuery());
    final long timeAllowed = cmd.getTimeAllowed();

    final Filter luceneFilter = filter==null ? null : filter.getTopFilter();

    // handle zero case...
    if (lastDocRequested <= 0) {
      final float[] topscore = new float[] { Float.NEGATIVE_INFINITY };

      Collector collector;
      DocSetCollector setCollector;

      if (!needScores) {
        collector = setCollector = new DocSetCollector(smallSetSize, maxDoc);
      } else {
        collector = setCollector = new DocSetDelegateCollector(smallSetSize, maxDoc, new Collector() {
          Scorer scorer;
          public void setScorer(Scorer scorer) throws IOException {
            this.scorer = scorer;
          }
          public void collect(int doc) throws IOException {
            float score = scorer.score();
            if (score > topscore[0]) topscore[0] = score;
          }
          public void setNextReader(IndexReader reader, int docBase) throws IOException {
          }
        });
      }

      if (timeAllowed > 0) {
        collector = new TimeLimitingCollector(collector, timeAllowed);
      }
      try {
        super.search(query, luceneFilter, collector);
      }
      catch (TimeLimitingCollector.TimeExceededException x) {
        log.warn("Query: " + query + "; " + x.getMessage());
        qr.setPartialResults(true);
      }

      set = setCollector.getDocSet();

      nDocsReturned = 0;
      ids = new int[nDocsReturned];
      scores = new float[nDocsReturned];
      totalHits = set.size();
      maxScore = totalHits > 0 ? topscore[0] : 0.0f;
    } else {

      TopDocsCollector topCollector;
      // This is how it was:
      /*
      if (cmd.getSort() == null) {
        topCollector = TopScoreDocCollector.create(len, true);
      } else {
        topCollector = TopFieldCollector.create(cmd.getSort(), len, false, needScores, needScores, true);
      }
      */

      if (cmd.getSort() == null) {
        // ask the collector for more docs than requested so the pseudo-collapse
        // can shuffle results within the first considerHowMany positions
        if (len < considerHowMany && considerMoreDocs) {
          topCollector = TopScoreDocCollector.create(considerHowMany, true);
        } else {
          topCollector = TopScoreDocCollector.create(len, true);
        }
      } else {
        if (len < considerHowMany && considerMoreDocs) {
          // completed by analogy with the commented-out original above;
          // the message was truncated at this point in the archive
          topCollector = TopFieldCollector.create(cmd.getSort(), considerHowMany, false, needScores, needScores, true);
        } else {
          topCollector = TopFieldCollector.create(cmd.getSort(), len, false, needScores, needScores, true);
        }
      }
      // ...

Using Multiple fields in UniqueKey

2009-07-14 Thread Anand Kumar Prabhakar

Is there any possibility of adding multiple fields to the uniqueKey in
schema.xml (an implementation similar to a compound primary key)?





Re: Implementing Solr for the first time

2009-07-14 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Tue, Jul 14, 2009 at 1:33 AM, Kevin
Miller wrote:
> I am new to Solr and trying to get it set up to index files from a
> directory structure on a server.  I have a few questions.
>
> 1.) Is there an application that will return the search results in a
> user friendly format?
isn't the xml response format user friendly ?
>
>
> 2.) How do I move Solr from the example environment into a production
> environment?
>
>
> 3.) Will Solr search through multiple folders when indexing and if so
> can I specify which folders to index from?
Solr does not search any folders. you will have to index the contents
of your folder into Solr.
>
>
> I have looked through the tutorial, the Docs, and the FAQ and am still
> having problems making sense of it.
>
> Kevin Miller
> Oklahoma Tax Commission
> Web Services
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Faceting

2009-07-14 Thread gwk
Well, I had a bit of a facepalm moment when thinking about it a little
more. I'll just show a "more countries [Y selected]" link, where Y is the
number of selected countries which are not in the top X. If you want a
nice concise interface you'll just have to enable javascript. With my
earlier adventures in numerical range selection (SOLR-1240) I became
wary of just adding facet.query parameters, as Solr seemed to crash when
given a lot of facet.queries of the form facet.query=price:[* TO
10]&facet.query=price:[10 TO 20], etc.


Thanks for your help,

Regards,

Gijs

Shalin Shekhar Mangar wrote:

> On Mon, Jul 13, 2009 at 7:56 PM, gwk wrote:
>
>> Is there a good way to select the top X facets and include some terms you
>> want to include as well something like
>> facet.field=country&f.country.facet.limit=X&f.country.facet.includeterms=Narnia,Guilder
>> or is there some other way to achieve this?
>
> You can use facet.query for each of the terms you want to include. You may
> need to remove such terms from appearing in the facet.field=country results
> in the client.
>
> e.g.
> facet.field=country&f.country.facet.limit=X&facet.query=country:Narnia&facet.query=country:Guilder




Re: Distributed Search in Solr

2009-07-14 Thread Sumit Aggarwal
Hi Grant,
What I got from your comments is:

1. We will have to add support for a BoostingTermQuery which extends
SpanTermQuery, like the payload support in Lucene. In our current world we
anyway have another class which extends SpanTermQuery. Where should I put
this class or the newly built BoostingTermQuery, and how can I use it?
2. I have not understood why we require a TokenFilterFactory.

In our application (our own search server) we already have payload-related
search, for which we use something like searcher.setSimilarity(Similarity).
Don't we require this in Solr payload search?

Now can you please explain in a little more detail how we can do payload
search using Solr? I mean, we will need to set some payload term using
BoostingTermQuery; how will we do that in Solr, and how will we pass such a
search to Solr?

- Sumit

On Fri, Jul 10, 2009 at 8:54 PM, Grant Ingersoll wrote:

>
> On Jul 9, 2009, at 11:58 PM, Sumit Aggarwal wrote:
>
>  Hi,
>> 1. Calls made to multiple shards are made in some concurrent fashion or
>> serially?
>>
>
> Concurrent
>
>  2. Any idea of algorithm followed for merging data? I mean how efficient
>> it
>> is?
>>
>
> Not sure, but given that Yonik implemented it, I suspect it is highly
> efficient.  ;-)
>
>  3. Lucene provides payload concept. How can we make search using that in
>> solr. My application store payloads and use search using our custom search
>> server.
>>
>
> Not currently, but this would be a welcome patch.  I added a new
> DelimitedPayloadTokenFilter to Lucene that should make it really easy to
> send in payloads "inline" in Solr XML, so what remains to be done, I think
> is:
>
> 1. Create a new TokenFilterFactory for the TokenFilter
> 2. Hook in some syntax support for creating a BoostingTermQuery in the
> Query Parsers.
>
> Patches welcome!
>
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
>
>


-- 
Cheers
Sumit
9818621804


Data Import ID Problem

2009-07-14 Thread Chris Masters

Hi All,

I have a problem when importing data using the data import handler. I import 
documents from multiple tables so table.id is not unique - to get round this I 
concatenate the type like this:

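
Roughly like this (the archive stripped the original XML; the table and
column names below are guesses for illustration):

<entity name="thing" query="SELECT CONCAT(THING.ID, TYPE) AS INDEX_ID, THING.ID AS DBID, ... FROM THING">
    <field column="INDEX_ID" name="id"/>
    <field column="DBID" name="dbid"/>
</entity>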

When searching it seems the CONCATted string is turned into some sort of 
character array(?):


<doc>
  <str name="dbid">1</str>
  <str name="id">[B@108759d</str>
</doc>

   Everything is OK if I add a document via SolrJ:

    
  SolrInputDocument doc = new SolrInputDocument();
  doc.addField("id", myThing.getId() + TCSearch.SEARCH_TYPE_THING);
  doc.addField("dbid", myThing.getId());
   

   Obviously this will cause problems as I remove documents by constructing the 
ID and using deleteById. Any ideas?

   Thanks, rotis





RE: Implementing Solr for the first time

2009-07-14 Thread Kevin Miller
I need to index primarily .doc files, but also need to look at .pdf and
.xls files.  I am currently looking at the Tika project for this
functionality.


Kevin Miller
Web Services

-Original Message-
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] 
Sent: Tuesday, July 14, 2009 1:34 AM
To: solr-user@lucene.apache.org
Subject: Re: Implementing Solr for the first time

On Tue, Jul 14, 2009 at 1:33 AM, Kevin Miller <
kevin.mil...@oktax.state.ok.us> wrote:

> I am new to Solr and trying to get it set up to index files from a 
> directory structure on a server.  I have a few questions.
>
> 1.) Is there an application that will return the search results in a 
> user friendly format?
>

I'm not sure. There is a ruby application called flare but I haven't
used it myself. People usually build their own applications and use Solr
as a search server.


> 2.) How do I move Solr from the example environment into a production 
> environment?
>

If you mean how do you change the example schema/config, then that
depends entirely on the kind of data you want to search.

Some good starting points on deciding the schema are:
http://wiki.apache.org/solr/SchemaDesign
http://wiki.apache.org/solr/UniqueKey


> 3.) Will Solr search through multiple folders when indexing and if so 
> can I specify which folders to index from?
>

Solr does not search through folders. Solr is only a server. You can
either write a program to push data to Solr or use a plugin like
DataImportHandler to do this.

http://wiki.apache.org/solr/DataImportHandler

What are the kind of files you are indexing?

--
Regards,
Shalin Shekhar Mangar.


support for Payload Feature of lucene in solr

2009-07-14 Thread Sumit Aggarwal
Hi,
I am new to Solr and trying to explore payloads in Solr, but I haven't had
any success so far. In one of the threads Grant mentioned the
DelimitedPayloadTokenFilter (added to Lucene), which can store payloads at
index time. But to search on it we will require an implementation of
BoostingTermQuery extending SpanTermQuery, and possibly other things as well.

My Question:
1. What will I have to do for this?
2. How will I do it? I mean, even after adding some classes and rebuilding
the Solr jars, how will I prepare a Document to index that stores payloads,
and how will I build my search query to do payload search? Do we need to add
a new RequestHandler for making such custom searches? Please provide sample
code if you have any...

-- 
Cheers
Sumit


TooManyOpenFiles: indexing in one core, doing many searches at the same time in another

2009-07-14 Thread Bruno Aranda
Hi,

We are having a TooManyOpenFiles exception in our indexing process. We
are reading data from a database and indexing this data into one of
the two cores of our Solr instance. Each of the cores has a different
schema as they are used for different purposes. While we index into the
first core, we do many searches in the second core, as it contains data
to "enrich" what we index (the second core is never modified - read
only). After indexing about 50,000 documents (about 300 fields each)
we get the exception. If we run the same process without the
"enrichment" (not doing queries on the second core), everything goes
fine.
We are using Spring Batch, and we only commit+optimize at the very
end, as we don't need to search the data that is being indexed.

I have seen recommendations ranging from committing+optimizing more
often to lowering the merge factor. How does the merge factor affect
this scenario?

Thanks,

Bruno


Re: Data Import ID Problem

2009-07-14 Thread Chris Masters

Sorry - the SolrJ snippet should read:


SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", myThing.getId() + TCSearch.SEARCH_TYPE_THING);
doc.addField("dbid", myThing.getId());



- Original Message 
From: Chris Masters 
To: solr-user@lucene.apache.org
Sent: Tuesday, July 14, 2009 12:16:06 PM
Subject: Data Import ID Problem


Hi All,

I have a problem when importing data using the data import handler. I import 
documents from multiple tables so table.id is not unique - to get round this I 
concatenate the type like this:


    
    
    


When searching it seems the CONCATted string is turned into some sort of 
character array(?):


<doc>
  <str name="dbid">1</str>
  <str name="id">[B@108759d</str>
</doc>

   Everything is OK if I add a document via SolrJ:

    
  SolrInputDocument doc = new SolrInputDocument();
  doc.addField("id", myThing.getId() + TCSearch.SEARCH_TYPE_THING);
  doc.addField("dbid", myThing.getId());
   

   Obviously this will cause problems as I remove documents by constructing the 
ID and using deleteById. Any ideas?

   Thanks, rotis





Re: Spell checking: Is there a way to exclude words known to be wrong?

2009-07-14 Thread Erik Hatcher
Use the stopwords feature with a custom mispeled_words.txt and a  
StopFilterFactory on the spell check field ;)
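
A sketch of what that might look like in schema.xml (the type name and
filter order here are just an example):

<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="mispeled_words.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>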


Erik


On Jul 13, 2009, at 8:27 PM, Jay Hill wrote:


We're building a spell index from a field in our main index with the
following configuration:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>



This works great and re-builds the spelling index on commits as expected.
However, we know there are misspellings in the "spell" field of our main
index. We could remove these from the spelling index using Luke, however
they will be added again on commits. What we need is something similar to
how the protwords.txt file is used, so that when we notice misspelled words
such as "beginnning" being pulled from our main index we could add them to
an exclusion file so they are not added to the spelling index again.

Any tricks to make this possible?

-Jay




Re: TooManyOpenFiles: indexing in one core, doing many searches at the same time in another

2009-07-14 Thread Marc Sturlese

Setting:

   <mergeFactor>2</mergeFactor>

may help. Have you tried it? Indexing will be a bit slower, but optimizing
will be faster.
You can check with lsof how many files jetty/tomcat (or whatever server you
are using) is holding open


Bruno Aranda wrote:
> 
> Hi,
> 
> We are having a TooManyOpenFiles exception in our indexing process. We
> are reading data from a database and indexing this data into one of
> the two cores of our solr instance. Each of the cores has a different
> schema as they are used for a different purpose. While we index in the
> first core, we do many searches in the second core as it contains data
> to "enrich" what we index (the second core is never modifier - read
> only). After indexing about 50.000 documents (about 300 fields each)
> we get the exception. If we run the same process, but without the
> "enrichment" (not doing queries in the second core), everything goes
> all right.
> We are using spring batch, and we only commit+optimize at the very
> end, as we don't need to search anything in the data that is being
> indexed.
> 
> I have seen recommendations that go from committing+optimize more
> often or lowering the merge factor? How is the merge factor affecting
> in this scenario?
> 
> Thanks,
> 
> Bruno
> 
> 




Re: TooManyOpenFiles: indexing in one core, doing many searches at the same time in another

2009-07-14 Thread Mark Miller
What merge factor are you using now? The merge factor will influence the
number of files that are created as the index grows. Lower = fewer file
descriptors needed, but also slower bulk indexing.
You could up the Max Open Files settings on your OS.

You could also use

   <useCompoundFile>true</useCompoundFile>

which packs each segment's files into a single compound file and requires
*way* fewer file handles (slightly slower indexing).

It would normally be odd to hit something like that after only 50,000
documents, but a doc with 300 fields is certainly not the norm ;) Anything
else special about your setup?

-- 
- Mark

http://www.lucidimagination.com

On Tue, Jul 14, 2009 at 12:49 PM, Bruno Aranda wrote:

> Hi,
>
> We are having a TooManyOpenFiles exception in our indexing process. We
> are reading data from a database and indexing this data into one of
> the two cores of our solr instance. Each of the cores has a different
> schema as they are used for a different purpose. While we index in the
> first core, we do many searches in the second core as it contains data
> to "enrich" what we index (the second core is never modifier - read
> only). After indexing about 50.000 documents (about 300 fields each)
> we get the exception. If we run the same process, but without the
> "enrichment" (not doing queries in the second core), everything goes
> all right.
> We are using spring batch, and we only commit+optimize at the very
> end, as we don't need to search anything in the data that is being
> indexed.
>
> I have seen recommendations that go from committing+optimize more
> often or lowering the merge factor? How is the merge factor affecting
> in this scenario?
>
> Thanks,
>
> Bruno
>


Re: Implementing Solr for the first time

2009-07-14 Thread Erik Hatcher


On Jul 14, 2009, at 8:00 AM, Kevin Miller wrote:


I am needing to index primarily .doc files but also need it to look at
.pdf and .xls files.  I am currently looking at the Tika project for
this functionality.


This is now built into trunk (aka Solr 1.4): 
http://wiki.apache.org/solr/ExtractingRequestHandler

Erik



Anyone working on adapting AnalyzingQueryParser to solr?

2009-07-14 Thread Bill Dueber
The lucene class AnalyzingQueryParser does exactly what I need it to do, but
I need to do it in Solr. I took a look at trying to subclass QParser, and
it's clear I'm not smart enough. :-)

Is anyone else looking at this?

 -Bill-


-- 
Bill Dueber
Library Systems Programmer
University of Michigan Library


wt=json not setting application/json response headers but text/plain. How to fix?

2009-07-14 Thread Julian Davchev
Hi folks
I see that when calling wt=json I get a JSON response, but the headers are
text/plain, which totally bugs me. I would rather expect application/json
response headers.

Any pointers on how I can fix this are more than welcome.


Re: Spell checking: Is there a way to exclude words known to be wrong?

2009-07-14 Thread Shalin Shekhar Mangar
On Tue, Jul 14, 2009 at 6:37 PM, Erik Hatcher wrote:

> Use the stopwords feature with a custom mispeled_words.txt and a
> StopFilterFactory on the spell check field ;)
>
>
Very cool! :)

-- 
Regards,
Shalin Shekhar Mangar.


Re: Implementing Solr for the first time

2009-07-14 Thread Erik Hatcher


On Jul 14, 2009, at 5:35 AM, Noble Paul നോബിള്‍  
नोब्ळ् wrote:



> On Tue, Jul 14, 2009 at 1:33 AM, Kevin Miller wrote:
>
>> I am new to Solr and trying to get it set up to index files from a
>> directory structure on a server.  I have a few questions.
>>
>> 1.) Is there an application that will return the search results in a
>> user friendly format?
>
> isn't the xml response format user friendly ?

LOL!

>> 3.) Will Solr search through multiple folders when indexing and if so
>> can I specify which folders to index from?
>
> Solr does not search any folders. you will have to index the contents
> of your folder into Solr.


Fairly straightforward to have some script that loops over a directory  
and sends files (or file paths/URLs) to the extracting request handler.
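
A rough sketch of such a loop with SolrJ (the URL, directory, and id
literal here are made up):

import java.io.File;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
for (File f : new File("/data/docs").listFiles()) {
  ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
  up.addFile(f);                           // file content goes to Tika for extraction
  up.setParam("literal.id", f.getName()); // supply the uniqueKey explicitly
  server.request(up);
}
server.commit();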


Erik



Re: TooManyOpenFiles: indexing in one core, doing many searches at the same time in another

2009-07-14 Thread Bruno Aranda
Hi, my process is:

I index 60 docs in the secondary core (each doc has 5 fields). No
problem with that. After this core is indexed (and optimized) it will
be used only for searches, during the main core's indexing.
Currently I am using a mergeFactor of 10 for the main core. I will try
with 2 to see if it changes anything, and useCompoundFile set to true. I
guess I don't need to modify anything in the secondary core as it is
only used for searches.

Thanks for your answers,

Bruno

2009/7/14 Mark Miller :
> What merge factor are you using now? The merge factor will influence the
> number of files that are created as the index grows. Lower = fewer file
> descriptors needed, but also slower bulk indexing.
> You could up the Max Open Files settings on your OS.
>
> You could also use
>    
>    true
>
> Which writes multiple segments to one file and requires *way* less file
> handles (slightly slower indexing).
>
> It would normally be odd to hit something like that after only 50,000
> documents, but a doc with 300 fields is certainly not the norm ;) Anything
> else special about your setup?
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
> On Tue, Jul 14, 2009 at 12:49 PM, Bruno Aranda wrote:
>
>> Hi,
>>
>> We are having a TooManyOpenFiles exception in our indexing process. We
>> are reading data from a database and indexing this data into one of
>> the two cores of our solr instance. Each of the cores has a different
>> schema as they are used for a different purpose. While we index in the
>> first core, we do many searches in the second core as it contains data
>> to "enrich" what we index (the second core is never modifier - read
>> only). After indexing about 50.000 documents (about 300 fields each)
>> we get the exception. If we run the same process, but without the
>> "enrichment" (not doing queries in the second core), everything goes
>> all right.
>> We are using spring batch, and we only commit+optimize at the very
>> end, as we don't need to search anything in the data that is being
>> indexed.
>>
>> I have seen recommendations that go from committing+optimize more
>> often or lowering the merge factor? How is the merge factor affecting
>> in this scenario?
>>
>> Thanks,
>>
>> Bruno
>>
>


Guide to using SolrQuery object

2009-07-14 Thread Reuben Firmin
Hi,

It seems that SolrQuery is a better API than the basic ModifiableSolrParams,
but I can't make it work.

Constructing params with:
final ModifiableSolrParams params = new ModifiableSolrParams();
params.set("q", queryString);

...results in a successful search.

Constructing SolrQuery with:
final SolrQuery solrQuery = new SolrQuery();
solrQuery.setQuery(queryString);

... doesn't (with the same unit test driving the search). I'm sure I'm
missing some basic option, but the javadoc is a little terse, and I don't
see what I'm missing. Ideas?

Also, are there enums or constants around the various param names that can
be passed in, or do people tend to define those themselves?
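
For what it's worth, a minimal sketch of the SolrQuery form (server setup
omitted); the two approaches should end up setting the same "q" parameter:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.CommonParams;

SolrQuery solrQuery = new SolrQuery(queryString); // equivalent to setQuery(queryString)
solrQuery.setRows(10);
QueryResponse rsp = server.query(solrQuery);
// param-name constants live in CommonParams, e.g. CommonParams.Q is "q"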

Thanks!
Reuben


Re: support for Payload Feature of lucene in solr

2009-07-14 Thread Toby Cole
> As i am new to solr and trying to explore payloads in solr but i haven't got
> any success on that. In one of the thread Grant mentioned solr have
> DelimitedPayloadTokenFilter which can store payloads at index time. But to
> make search on it we will require implementation of BoostingTermQuery
> extending SpanTermQuery. And if any other thing also we require.

This looks about the same as the approach I'm about to use for our research.
We're looking into using payloads to improve relevance for stemmed terms,
using the payload to store the unstemmed term, boosting the term if there's
an exact match with the payloads.

> My Question:
> 1. What all i will have to do for this.
> 2. How i will do this. I mean even if by adding some classes and rebuilding
> solr jars and then how i will prepare Document to index to store payloads
> and how i will build my search query to do payload search. Do we need to add
> a new Requesthandler for making such custom searches? Please provide a
> sample code if have any...
>
> --
> Cheers
> Sumit

I'm starting work on this in the next few days, I'll let you know how I get on.
If anyone else has any experience with payloads in solr please chip in :)



--

Toby Cole
Software Engineer, Semantico Limited
 
Registered in England and Wales no. 03841410, VAT no. GB-744614334.
Registered office Lees House, 21-23 Dyke Road, Brighton BN1 3FE, UK.

Check out all our latest news and thinking on the Discovery blog
http://blogs.semantico.com/discovery-blog/



Re: support for Payload Feature of lucene in solr

2009-07-14 Thread Noble Paul നോബിള്‍ नोब्ळ्
Right now Solr does not support indexing/retrieving payloads. Probably
this can be taken up as an issue.

On Tue, Jul 14, 2009 at 5:41 PM, Sumit
Aggarwal wrote:
> Hi,
> As i am new to solr and trying to explore payloads in solr but i haven't got
> any success on that. In one of the thread Grant mentioned solr have
> DelimitedPayloadTokenFilter which
> can store payloads at index time. But to make search on it we will
> require  implementation of BoostingTermQuery extending SpanTermQuery . And
> if any other thing also we require.
>
> My Question:
> 1. What all i will have to do for this.
> 2. How i will do this. I mean even if by adding some classes and rebuilding
> solr jars and then how i will prepare Document to index to store payloads
> and how i will build my search query to do payload search. Do we need to add
> a new Requesthandler for making such custom searches? Please provide a
> sample code if have any...
>
> --
> Cheers
> Sumit
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Data Import ID Problem

2009-07-14 Thread Noble Paul നോബിള്‍ नोब्ळ्
DIH is getting the field as a byte[]? Which DB and which driver
are you using?

On Tue, Jul 14, 2009 at 4:46 PM, Chris Masters wrote:
>
> Hi All,
>
> I have a problem when importing data using the data import handler. I import 
> documents from multiple tables so table.id is not unique - to get round 
> this I concatenate the type like this:
>
> 
>     
>     
>     
> 
>
> When searching it seems the CONCATted string is turned into some sort of 
> charcter array(?):
>
> 
> 
>   1
>   [...@108759d
>    
>
>    Everything is OK if I add a document via SolrJ:
>
>     
>   SolrInputDocument doc = new SolrInputDocument();
>   doc.addField("id", myThing.getId() + TCSearch.SEARCH_TYPE_THING);
>   doc.addField("dbid", myThing.getId());
>    
>
>    Obviously this will cause problems as I remove documents by consturcting 
> the ID and using deleteById. Any ideas?
>
>    Thanks, rotis
>
>
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: support for Payload Feature of lucene in solr

2009-07-14 Thread Walter Underwood
That doesn't require payloads. I was doing that with Solr 1.1. Define two
fields, stemmed and exact, with different analyzer chains. Use copyfield to
load the same info into both. With the dismax handler, search both fields
with a higher boost on the exact field.
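
A sketch of that setup (field and type names invented for illustration).
In schema.xml:

<field name="body_stem"  type="text"       indexed="true" stored="false"/>
<field name="body_exact" type="text_exact" indexed="true" stored="false"/>
<copyField source="body" dest="body_stem"/>
<copyField source="body" dest="body_exact"/>

and in the dismax handler in solrconfig.xml:

<str name="qf">body_exact^4 body_stem</str>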

wunder

On 7/14/09 7:39 AM, "Toby Cole"  wrote:

> We're looking into using payloads to improve relevance for stemmed
> terms, using the payload to store the unstemmed term, boosting the
> term if there's an exact match with the payloads.



Re: support for Payload Feature of lucene in solr

2009-07-14 Thread Sumit Aggarwal
Hi Walter,
I do have a search server where I have implemented things using the payload
feature itself. These days I am evaluating Solr to get rid of my own search
server. For that I need the payloads feature in Solr itself. I raised a
related question and got a message from Grant saying:

"I added a new DelimitedPayloadTokenFilter to Lucene that should make it
really easy to send in payloads "inline" in Solr XML, so what remains to be
done, I think is:
1. Create a new TokenFilterFactory for the TokenFilter
2. Hook in some syntax support for creating a BoostingTermQuery in the
Query Parsers."

Now, can anyone provide custom code to do what Grant mentioned?

Thanks,
Sumit

On Tue, Jul 14, 2009 at 8:24 PM, Walter Underwood wrote:

> That doesn't require payloads. I was doing that with Solr 1.1. Define two
> fields, stemmed and exact, with different analyzer chains. Use copyfield to
> load the same info into both. With the dismax handler, search both fields
> with a higher boost on the exact field.
>
> wunder
>
> On 7/14/09 7:39 AM, "Toby Cole"  wrote:
>
> > We're looking into using payloads to improve relevance for stemmed
> > terms, using the payload to store the unstemmed term, boosting the
> > term if there's an exact match with the payloads.
>
>


Re: Data Import ID Problem

2009-07-14 Thread Chris Masters

MySQL -> com.mysql.jdbc.Driver (mysql-connector-java-5.1.7.jar).

mysql concat -> 
http://dev.mysql.com/doc/refman/5.1/en/string-functions.html#function_concat

Fix is to use CAST like:

SELECT CONCAT(CAST(THING.ID AS CHAR),TYPE) AS INDEX_ID...

Thanks for the nudge 'Noble Paul'!



- Original Message 
From: Noble Paul നോബിള്‍ नोब्ळ् 
To: solr-user@lucene.apache.org
Sent: Tuesday, July 14, 2009 3:53:44 PM
Subject: Re: Data Import ID Problem

DIH is getting the field as it as a byte[] ? which db and which driver
are you using?

On Tue, Jul 14, 2009 at 4:46 PM, Chris Masters wrote:
>
> Hi All,
>
> I have a problem when importing data using the data import handler. I import 
> documents from multiple tables so table.id is not unique - to get round 
> this I concatenate the type like this:
>
> 
>     
>     
>     
> 
>
> When searching it seems the CONCATted string is turned into some sort of 
> charcter array(?):
>
> 
> 
>   1
>   [...@108759d
>    
>
>    Everything is OK if I add a document via SolrJ:
>
>     
>   SolrInputDocument doc = new SolrInputDocument();
>   doc.addField("id", myThing.getId() + TCSearch.SEARCH_TYPE_THING);
>   doc.addField("dbid", myThing.getId());
>    
>
>    Obviously this will cause problems as I remove documents by consturcting 
> the ID and using deleteById. Any ideas?
>
>    Thanks, rotis
>
>
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com






Re: support for Payload Feature of lucene in solr

2009-07-14 Thread Sumit Aggarwal
Hey Noble,
Any comments on Grant's suggestion?

Thanks,
-Sumit

On Tue, Jul 14, 2009 at 8:40 PM, Sumit Aggarwal
wrote:

> Hi Walter,
> I do have a search server where i have implemented things using payload
> feature itself. These days i am evaluating solr to get rid of my own search
> server. For that i need payloads feature in solr itself. I raised a related
> question and got a message from *Grant* as
> * "**I added a new DelimitedPayloadTokenFilter to Lucene that should make
> it really easy to send in payloads "inline" in Solr XML, so what remains to
> be done, I think is:*
> *1. Create a new TokenFilterFactory for the TokenFilter
> **2. Hook in some syntax support for creating a BoostingTermQuery in the
> Query Parsers.**"*
>
> Now can any one provide any custom code to do what grant mentioned.
>
> Thanks,
> Sumit
>
> On Tue, Jul 14, 2009 at 8:24 PM, Walter Underwood 
> wrote:
>
>> That doesn't require payloads. I was doing that with Solr 1.1. Define two
>> fields, stemmed and exact, with different analyzer chains. Use copyfield
>> to
>> load the same info into both. With the dismax handler, search both fields
>> with a higher boost on the exact field.
>>
>> wunder
>>
>> On 7/14/09 7:39 AM, "Toby Cole"  wrote:
>>
>> > We're looking into using payloads to improve relevance for stemmed
>> > terms, using the payload to store the unstemmed term, boosting the
>> > term if there's an exact match with the payloads.
>>
>>
>


Re: Sharded Index Creation Magic?

2009-07-14 Thread Nick Dimiduk
I do, but you raise an interesting point. I had named the field incorrectly.
I'm a little puzzled as to why individual search worked with the broken
field name, but now all is well!

On Tue, Jul 14, 2009 at 12:03 AM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> On Tue, Jul 14, 2009 at 2:00 AM, Nick Dimiduk  wrote:
>
> > However, when I search across all
> > deployed shards using the &shards= query parameter (
> >
> >
> http://host00:8080/solr/select?shards=host00:8080/solr,host01:8080/solr&q=body
> > \%3A%3Aterm),
> > I get a NullPointerException:
> >
> > java.lang.NullPointerException
> >at
> >
> org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:421)
> >at
> >
> org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:265)
> >at
> >
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:264)
> >at
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
> >at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
> >at
> >
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
> >at
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
> >
> > Debugging into the QueryComponent.mergeIds() method reveals the instance
> > sreq.responses (line 356) contains one response for each shard specified,
> > each with the number of results received by the independant queries. The
> > problems begin down at line 370 because the SolrDocument instance has
> only
> > a
> > score field -- which proves problematic in the following line where the
> id
> > is requested. The SolrDocument, only containing a score, lacks the
> > designated ID field (from my schema) and thus the document cannot be
> added
> > to the results queue.
> >
> > Because the example on the wiki works by loading the documents directly
> > into
> > Solr for indexing, I have come to the conclusion that there is some extra
> > magic happening in this index generation process which my process lacks.
> >
>
>
> Do you have a uniqueKey defined in your schema.xml?
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: Sharded Index Creation Magic?

2009-07-14 Thread Shalin Shekhar Mangar
On Tue, Jul 14, 2009 at 10:30 PM, Nick Dimiduk  wrote:

> I do, but you raise an interesting point. I had named the field
> incorrectly.
> I'm a little puzzled as to why individual search worked with the broken
> field name, but now all is well!
>
>
An individual Solr uses uniqueKey only for replacing documents during
indexing. During a search the uniqueKey is used only for associating certain
pieces of information with documents e.g. highlighting info is written in
the response per uniqueKey. Solr will complain only if you don't specify a
uniqueKey during indexing.

If you forgot to include uniqueKeys in some documents, changed the schema to
add a uniqueKey and then didn't reindex the whole bunch, there will be some
documents in the index without a value in the unique key field. In such a
case, if you use distributed search, it will blow up because it expects all
documents to have a value for the uniqueKey field. These values are used to
merge responses from the shards.

-- 
Regards,
Shalin Shekhar Mangar.


Re: wt=json not setting application/json response headers but text/plain. How to fix?

2009-07-14 Thread Avlesh Singh
Take a look at https://issues.apache.org/jira/browse/SOLR-1123
Don't stop yourself from voting for the issue :)

Cheers
Avlesh

On Tue, Jul 14, 2009 at 7:01 PM, Julian Davchev  wrote:

> Hi folks
> I see that when calling wt=json I get json response but headers are
> text/plain which totally bugs me.
> I rather expect  application/json response headers.
>
> Any pointers are more than welcome how I can fix this.
>


Re: Availability during merge

2009-07-14 Thread Jason Rutherglen
Kind of regrettable, I think we can look at changing this in Lucene.

On Tue, Jul 14, 2009 at 12:08 AM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> On Tue, Jul 14, 2009 at 2:30 AM, Charlie Jackson <
> charlie.jack...@cision.com
> > wrote:
>
> > The wiki page for merging solr cores
> > (http://wiki.apache.org/solr/MergingSolrIndexes) mentions that the cores
> > being merged cannot be indexed to during the merge. What about the core
> > being merged *to*? In terms of the example on the wiki page, I'm asking
> > if core0 can add docs while core1 and core2 are being merged into it.
> >
> >
> A merge operation acquires the index writer lock, so any add operations
> sent
> during the merge, will wait till the merge completes. So, even though you
> can send add/delete commands to core0, they'll wait for the merge to
> finish.
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Wikipedia or reuters like index for testing facets?

2009-07-14 Thread Jason Rutherglen
Is there a standard index like what Lucene uses for contrib/benchmark for
executing faceted queries over? Or maybe we can randomly generate one that
works in conjunction with wikipedia? That way we can execute real world
queries against faceted data. Or we could use the Lucene/Solr mailing lists
and other data (ala Lucid's faceted site) as a standard index?


Re: support for Payload Feature of lucene in solr

2009-07-14 Thread Shalin Shekhar Mangar
It may be nice to tell us why you need payloads; there may be other ways of
solving your problem than adding payload support to Solr. Anyway, I don't
see payload support happening before 1.5.

On Tue, Jul 14, 2009 at 10:07 PM, Sumit Aggarwal
wrote:

> Hey Nobel,
> Any comments on Grant suggestion.
>
> Thanks,
> -Sumit
>
> On Tue, Jul 14, 2009 at 8:40 PM, Sumit Aggarwal
> wrote:
>
> > Hi Walter,
> > I do have a search server where i have implemented things using payload
> > feature itself. These days i am evaluating solr to get rid of my own
> search
> > server. For that i need payloads feature in solr itself. I raised a
> related
> > question and got a message from *Grant* as
> > * "**I added a new DelimitedPayloadTokenFilter to Lucene that should make
> > it really easy to send in payloads "inline" in Solr XML, so what remains
> to
> > be done, I think is:*
> > *1. Create a new TokenFilterFactory for the TokenFilter
> > **2. Hook in some syntax support for creating a BoostingTermQuery in the
> > Query Parsers.**"*
> >
> > Now can any one provide any custom code to do what grant mentioned.
> >
> > Thanks,
> > Sumit
> >
> > On Tue, Jul 14, 2009 at 8:24 PM, Walter Underwood <
> wunderw...@netflix.com>wrote:
> >
> >> That doesn't require payloads. I was doing that with Solr 1.1. Define
> two
> >> fields, stemmed and exact, with different analyzer chains. Use copyfield
> >> to
> >> load the same info into both. With the dismax handler, search both
> fields
> >> with a higher boost on the exact field.
> >>
> >> wunder
> >>
> >> On 7/14/09 7:39 AM, "Toby Cole"  wrote:
> >>
> >> > We're looking into using payloads to improve relevance for stemmed
> >> > terms, using the payload to store the unstemmed term, boosting the
> >> > term if there's an exact match with the payloads.
> >>
> >>
> >
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: Wikipedia or reuters like index for testing facets?

2009-07-14 Thread Mark Miller
On Tue, Jul 14, 2009 at 3:36 PM, Jason Rutherglen <
jason.rutherg...@gmail.com> wrote:

> Is there a standard index like what Lucene uses for contrib/benchmark for
> executing faceted queries over? Or maybe we can randomly generate one that
> works in conjunction with wikipedia? That way we can execute real world
> queries against faceted data. Or we could use the Lucene/Solr mailing lists
> and other data (ala Lucid's faceted site) as a standard index?
>

I don't think there is any standard set of docs for solr testing - there is
not a real benchmark contrib - though I know more than a few of us have
hacked up pieces of Lucene benchmark to work with Solr - I think I've done
it twice now ;)

Would be nice to get things going. I was thinking the other day: I wonder
how hard it would be to make Lucene Benchmark generic enough to accept Solr
impls and Solr algs?

It does a lot that would suck to duplicate.

-- 
-- 
- Mark

http://www.lucidimagination.com


Multicore Solr (trunk) creates extra dirs

2009-07-14 Thread Otis Gospodnetic

Hello,

I just built solr.war from trunk and deployed it to a multicore solr server 
whose solr.xml looks like this:

<solr>
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="/mnt/solrhome/cores/core0" />
    <core name="core1" instanceDir="/mnt/solrhome/cores/core1" />
  </cores>
</solr>

Each core has conf and data/index dirs under its instanceDir.
e.g.

$ tree /mnt/solrhome/cores/core0
cores/core0
|-- conf
|   |-- schema.xml -> ../../../conf/schema-foo.xml
|   `-- solrconfig.xml -> ../../../conf/solrconfig-foo.xml
`-- data
`-- index

I noticed that when I start the container with this brand new Solr, all of a 
sudden a '/mnt' directory shows up in /mnt/solrhome!
(this /mnt/solrhome is also the directory from which I happened to start the 
container, though I'm not sure if that matters).

This is what this /mnt/solrhome/mnt looks like:

$ tree /mnt/solrhome/mnt
mnt
`-- solrhome
`-- cores
|-- core0
|   `-- data
|   `-- index
|   |-- segments.gen
|   `-- segments_1
|-- core1
|   `-- data
|   `-- index
|   |-- segments.gen
|   `-- segments_1


So it looks like Solr decides to create the index dir and the full path to it 
there.  It looks almost like Solr is looking at my instanceDirs in solr.xml and 
decides that it needs to create those directories, but under the Solr home dir 
(I use -Dsolr.solr.home=/mnt/solrhome).

I switched back to the old solr.war and this stopped happening.
Is this a bug or a new feature that I missed?

Thank you,
Otis


Re: support for Payload Feature of lucene in solr

2009-07-14 Thread Grant Ingersoll
The TokenFilterFactory side is trivial for the DelimitedPayloadTokenFilter.
That could be in for 1.4. In fact, there is an automated way to generate the
stubs that should be run in preparing for a release. I'll see if I can find a
minute or two to make that happen.

For query support, I've never hooked into the query parser, so I have no
clue. Yonik seems to crank out new query capabilities pretty fast, so maybe
it isn't too bad, even if it isn't done as fast as Yonik.
Bigger picture, it would be great to have spans support too.
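
For the factory side, a rough sketch of what such a stub might look like
(untested; it assumes Lucene's DelimitedPayloadTokenFilter and FloatEncoder):

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.payloads.DelimitedPayloadTokenFilter;
import org.apache.lucene.analysis.payloads.FloatEncoder;
import org.apache.lucene.analysis.payloads.PayloadEncoder;
import org.apache.solr.analysis.BaseTokenFilterFactory;

public class DelimitedPayloadTokenFilterFactory extends BaseTokenFilterFactory {
  // e.g. the token text "term|1.5" becomes "term" carrying a float payload of 1.5
  private final char delimiter = '|';
  private final PayloadEncoder encoder = new FloatEncoder();

  public TokenStream create(TokenStream input) {
    return new DelimitedPayloadTokenFilter(input, delimiter, encoder);
  }
}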



On Jul 14, 2009, at 3:44 PM, Shalin Shekhar Mangar wrote:

> It may be nice to tell us why you need payloads? There may be other ways of
> solving your problem than adding payload support to Solr. Anyway, I don't
> see payload support before 1.5
>
> On Tue, Jul 14, 2009 at 10:07 PM, Sumit Aggarwal wrote:
>
>> Hey Nobel,
>> Any comments on Grant suggestion.
>>
>> Thanks,
>> -Sumit
>>
>> On Tue, Jul 14, 2009 at 8:40 PM, Sumit Aggarwal wrote:
>>
>>> Hi Walter,
>>> I do have a search server where i have implemented things using payload
>>> feature itself. These days i am evaluating solr to get rid of my own
>>> search server. For that i need payloads feature in solr itself. I raised
>>> a related question and got a message from Grant as:
>>> "I added a new DelimitedPayloadTokenFilter to Lucene that should make it
>>> really easy to send in payloads "inline" in Solr XML, so what remains to
>>> be done, I think is:
>>> 1. Create a new TokenFilterFactory for the TokenFilter
>>> 2. Hook in some syntax support for creating a BoostingTermQuery in the
>>> Query Parsers."
>>>
>>> Now can any one provide any custom code to do what grant mentioned.
>>>
>>> Thanks,
>>> Sumit
>>>
>>> On Tue, Jul 14, 2009 at 8:24 PM, Walter Underwood wrote:
>>>
>>>> That doesn't require payloads. I was doing that with Solr 1.1. Define
>>>> two fields, stemmed and exact, with different analyzer chains. Use
>>>> copyfield to load the same info into both. With the dismax handler,
>>>> search both fields with a higher boost on the exact field.
>>>>
>>>> wunder
>>>>
>>>> On 7/14/09 7:39 AM, "Toby Cole" wrote:
>>>>
>>>>> We're looking into using payloads to improve relevance for stemmed
>>>>> terms, using the payload to store the unstemmed term, boosting the
>>>>> term if there's an exact match with the payloads.
>
> --
> Regards,
> Shalin Shekhar Mangar.


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: Wikipedia or reuters like index for testing facets?

2009-07-14 Thread Grant Ingersoll
At a min, it is trivial to use the EnWikiDocMaker and then send the  
doc over SolrJ...


On Jul 14, 2009, at 4:07 PM, Mark Miller wrote:


> On Tue, Jul 14, 2009 at 3:36 PM, Jason Rutherglen <
> jason.rutherg...@gmail.com> wrote:
>
>> Is there a standard index like what Lucene uses for contrib/benchmark for
>> executing faceted queries over? Or maybe we can randomly generate one that
>> works in conjunction with wikipedia? That way we can execute real world
>> queries against faceted data. Or we could use the Lucene/Solr mailing lists
>> and other data (ala Lucid's faceted site) as a standard index?
>
> I don't think there is any standard set of docs for solr testing - there is
> not a real benchmark contrib - though I know more than a few of us have
> hacked up pieces of Lucene benchmark to work with Solr - I think I've done
> it twice now ;)
>
> Would be nice to get things going. I was thinking the other day: I wonder
> how hard it would be to make Lucene Benchmark generic enough to accept Solr
> impls and Solr algs?
>
> It does a lot that would suck to duplicate.
>
> --
> - Mark
>
> http://www.lucidimagination.com


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: Wikipedia or reuters like index for testing facets?

2009-07-14 Thread Jason Rutherglen
You think enwiki has enough data for faceting?

On Tue, Jul 14, 2009 at 2:56 PM, Grant Ingersoll wrote:
> At a min, it is trivial to use the EnWikiDocMaker and then send the doc over
> SolrJ...
>
> On Jul 14, 2009, at 4:07 PM, Mark Miller wrote:
>
>> On Tue, Jul 14, 2009 at 3:36 PM, Jason Rutherglen <
>> jason.rutherg...@gmail.com> wrote:
>>
>>> Is there a standard index like what Lucene uses for contrib/benchmark for
>>> executing faceted queries over? Or maybe we can randomly generate one
>>> that
>>> works in conjunction with wikipedia? That way we can execute real world
>>> queries against faceted data. Or we could use the Lucene/Solr mailing
>>> lists
>>> and other data (ala Lucid's faceted site) as a standard index?
>>>
>>
>> I don't think there is any standard set of docs for solr testing - there
>> is
>> not a real benchmark contrib - though I know more than a few of us have
>> hacked up pieces of Lucene benchmark to work with Solr - I think I've done
>> it twice now ;)
>>
>> Would be nice to get things going. I was thinking the other day: I wonder
>> how hard it would be to make Lucene Benchmark generic enough to accept
>> Solr
>> impls and Solr algs?
>>
>> It does a lot that would suck to duplicate.
>>
>> --
>> --
>> - Mark
>>
>> http://www.lucidimagination.com
>
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
>
>


JMX monitoring for multiple SOLR instances

2009-07-14 Thread J G

Hi,

If I want to run multiple SOLR war files in Tomcat, is it possible to monitor 
each of the SOLR instances individually through JMX? Has anyone attempted this 
before? Also, what are the implications (e.g. performance) of running multiple 
SOLR instances in the same Tomcat server?

Thanks.




_
Windows Live™: Keep your life in sync. 
http://windowslive.com/explore?ocid=TXT_TAGLM_WL_BR_life_in_synch_062009

Re: Multicore Solr (trunk) creates extra dirs

2009-07-14 Thread Otis Gospodnetic

Hi,

Paul and Shalin will know about this.  What I'm seeing looks a lot like what 
Walter reported in March:
* http://markmail.org/thread/dfsj7hqi5buzhd6n

And this commit from Paul seems possibly related:
* http://markmail.org/message/cjvjffrfszlku3ri

...because of things like:
-cores = new CoreContainer(new SolrResourceLoader(instanceDir));
+cores = new CoreContainer(new SolrResourceLoader(solrHome));
...
if (!idir.isAbsolute()) {
-  idir = new File(loader.getInstanceDir(), dcore.getInstanceDir());
+  idir = new File(solrHome, dcore.getInstanceDir());
...

I don't have dataDir in my solr.xml, only absolute paths to my cores.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Otis Gospodnetic 
> To: solr-user@lucene.apache.org
> Sent: Tuesday, July 14, 2009 5:49:19 PM
> Subject: Multicore Solr (trunk) creates extra dirs
> 
> 
> Hello,
> 
> I just built solr.war from trunk and deployed it to a multicore solr server 
> whose solr.xml looks like this:
> 
> 
>   
> 
> 
>   
> 
> 
> Each core has conf and data/index dirs under its instanceDir.
> e.g.
> 
> $ tree /mnt/solrhome/cores/core0
> cores/core0
> |-- conf
> |   |-- schema.xml -> ../../../conf/schema-foo.xml
> |   `-- solrconfig.xml -> ../../../conf/solrconfig-foo.xml
> `-- data
> `-- index
> 
> I noticed that when I start the container with this brand new Solr all of a 
> sudden the '/mnt' directory shows up in /mnt/solrhome !
> (this /mnt/solrhome is also the directory from which I happened to start the 
> container, though I'm not sure if that matters).
> 
> This is what this /mnt/solrhome/mnt looks like:
> 
> $ tree /mnt/solrhome/mnt
> mnt
> `-- solrhome
> `-- cores
> |-- core0
> |   `-- data
> |   `-- index
> |   |-- segments.gen
> |   `-- segments_1
> |-- core1
> |   `-- data
> |   `-- index
> |   |-- segments.gen
> |   `-- segments_1
> 
> 
> So it looks like Solr decides to create the index dir and the full path ot it 
> there.  It looks almost like Solr is looking at my instanceDirs in solr.xml 
> and 
> decides that it needs to create those directories, but under Solr home dir (I 
> use -Dsolr.solr.home=/mnt/solrhome).
> 
> I switched back to the old solr.war and this stopped happening.
> Is this a bug or a new feature that I missed?
> 
> Thank you,
> Otis



Re: Using Multiple fields in UniqueKey

2009-07-14 Thread Otis Gospodnetic

Some ideas:

- Use copyField to copy fields to the field designated as the uniqueKey (not 
sure if this will work)
- Create the field from existing data before sending docs to Solr
- Create a custom UpdateRequestProcessor that adds a field for each document it 
processes and stuffs it with other fields' values (sketched below)
- Try http://wiki.apache.org/solr/Deduplication

I'd be curious to know which of these you will choose.
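
A rough sketch of that third option (class and field names here are invented
for illustration; the API is Solr's UpdateRequestProcessor):

import java.io.IOException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.request.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class CompoundKeyProcessorFactory extends UpdateRequestProcessorFactory {
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        // stuff the uniqueKey field with a concatenation of two other fields
        doc.setField("id", doc.getFieldValue("type") + "-" + doc.getFieldValue("dbid"));
        super.processAdd(cmd);
      }
    };
  }
}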


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Anand Kumar Prabhakar 
> To: solr-user@lucene.apache.org
> Sent: Tuesday, July 14, 2009 5:13:47 AM
> Subject: Using Multiple fields in UniqueKey
> 
> 
> Is there any possiblity of Adding Multiple fields to the UniqueKey in
> Schema.xml(An Implementation similar to Compound Primary Key)? 
> 
> 



Re: Solr 1.4 Release Date

2009-07-14 Thread Otis Gospodnetic

I just looked at SOLR JIRA today and saw some 40 open issues marked for 1.4, 
so 

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: pof 
> To: solr-user@lucene.apache.org
> Sent: Tuesday, July 14, 2009 12:37:33 AM
> Subject: Re: Solr 1.4 Release Date
> 
> 
> Any updates on this?
> 
> Cheers.
> 
> Gurjot Singh wrote:
> > 
> > Hi, I am curious to know when is the scheduled/tentative release date of
> > Solr 1.4.
> > 
> > Thanks,
> > Gurjot
> > 
> > 
> 



Re: Wikipedia or reuters like index for testing facets?

2009-07-14 Thread Grant Ingersoll
Probably not as generated by the EnwikiDocMaker, but the  
WikipediaTokenizer in Lucene can pull out richer syntax which could  
then be Teed/Sinked to other fields.  Things like categories, related  
links, etc.  Mostly, though, I was just commenting on the fact that it  
isn't hard to at least use it for getting docs into Solr.
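
For instance, a rough sketch of pulling typed tokens out of wiki markup (this
assumes the Lucene 2.9 attribute API and the wikipedia contrib jar; it is just
a dump loop, not the Tee/Sink wiring):

import java.io.StringReader;

import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;
import org.apache.lucene.wikipedia.analysis.WikipediaTokenizer;

public class WikiTokenDump {
  public static void main(String[] args) throws Exception {
    String markup = "[[Category:Search]] Some [[internal link]] text.";
    WikipediaTokenizer tok = new WikipediaTokenizer(new StringReader(markup));
    TermAttribute term = tok.addAttribute(TermAttribute.class);
    TypeAttribute type = tok.addAttribute(TypeAttribute.class);
    while (tok.incrementToken()) {
      // type() distinguishes plain words from categories, internal
      // links, external links, bold/italics, etc.
      System.out.println(term.term() + "\t" + type.type());
    }
  }
}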


-Grant
On Jul 14, 2009, at 7:38 PM, Jason Rutherglen wrote:


You think enwiki has enough data for faceting?

On Tue, Jul 14, 2009 at 2:56 PM, Grant Ingersoll wrote:

At a min, it is trivial to use the EnWikiDocMaker and then send the doc over
SolrJ...

On Jul 14, 2009, at 4:07 PM, Mark Miller wrote:


On Tue, Jul 14, 2009 at 3:36 PM, Jason Rutherglen <
jason.rutherg...@gmail.com> wrote:

Is there a standard index like what Lucene uses for contrib/benchmark for
executing faceted queries over? Or maybe we can randomly generate one that
works in conjunction with wikipedia? That way we can execute real world
queries against faceted data. Or we could use the Lucene/Solr mailing lists
and other data (ala Lucid's faceted site) as a standard index?


I don't think there is any standard set of docs for solr testing - there is
not a real benchmark contrib - though I know more than a few of us have
hacked up pieces of Lucene benchmark to work with Solr - I think I've done
it twice now ;)

Would be nice to get things going. I was thinking the other day: I wonder
how hard it would be to make Lucene Benchmark generic enough to accept Solr
impls and Solr algs?

It does a lot that would suck to duplicate.

--
--
- Mark

http://www.lucidimagination.com


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using

Solr/Lucene:
http://www.lucidimagination.com/search




--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: Wikipedia or reuters like index for testing facets?

2009-07-14 Thread Mark Miller
Why don't you just randomly generate the facet data? That's probably the best
way, right? You can control the uniques and ranges.
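
For what it's worth, a throwaway generator along those lines might look like
this with SolrJ (the URL, field names, and cardinalities are arbitrary):

import java.util.Random;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class RandomFacetDataGenerator {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    Random rnd = new Random(42);      // fixed seed so runs are repeatable
    int numDocs = 100000;
    int numCategories = 50;           // controls the number of unique facet values
    for (int i = 0; i < numDocs; i++) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", Integer.toString(i));
      doc.addField("category", "cat" + rnd.nextInt(numCategories));
      doc.addField("price", rnd.nextInt(1000));
      server.add(doc);
    }
    server.commit();
  }
}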

On Wed, Jul 15, 2009 at 1:21 AM, Grant Ingersoll wrote:

> Probably not as generated by the EnwikiDocMaker, but the WikipediaTokenizer
> in Lucene can pull out richer syntax which could then be Teed/Sinked to
> other fields.  Things like categories, related links, etc.  Mostly, though,
> I was just commenting on the fact that it isn't hard to at least use it for
> getting docs into Solr.
>
> -Grant
>
> On Jul 14, 2009, at 7:38 PM, Jason Rutherglen wrote:
>
>  You think enwiki has enough data for faceting?
>>
>> On Tue, Jul 14, 2009 at 2:56 PM, Grant Ingersoll
>> wrote:
>>
>>> At a min, it is trivial to use the EnWikiDocMaker and then send the doc
>>> over
>>> SolrJ...
>>>
>>> On Jul 14, 2009, at 4:07 PM, Mark Miller wrote:
>>>
>>>  On Tue, Jul 14, 2009 at 3:36 PM, Jason Rutherglen <
 jason.rutherg...@gmail.com> wrote:

  Is there a standard index like what Lucene uses for contrib/benchmark
> for
> executing faceted queries over? Or maybe we can randomly generate one
> that
> works in conjunction with wikipedia? That way we can execute real world
> queries against faceted data. Or we could use the Lucene/Solr mailing
> lists
> and other data (ala Lucid's faceted site) as a standard index?
>
>
 I don't think there is any standard set of docs for solr testing - there
 is
 not a real benchmark contrib - though I know more than a few of us have
 hacked up pieces of Lucene benchmark to work with Solr - I think I've
 done
 it twice now ;)

 Would be nice to get things going. I was thinking the other day: I
 wonder
 how hard it would be to make Lucene Benchmark generic enough to accept
 Solr
 impls and Solr algs?

 It does a lot that would suck to duplicate.

 --
 --
 - Mark

 http://www.lucidimagination.com

>>>
>>> --
>>> Grant Ingersoll
>>> http://www.lucidimagination.com/
>>>
>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
>>> Solr/Lucene:
>>> http://www.lucidimagination.com/search
>>>
>>>
>>>
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
>
>


-- 
-- 
- Mark

http://www.lucidimagination.com


Re: support for Payload Feature of lucene in solr

2009-07-14 Thread Sumit Aggarwal
Hi Shalin,
Our requirement is rolling-window support for the popularity of catalog items
over, say, 3 months. What we do today is index term/value pairs as tokens,
where the term is a unique string for each day and the value, stored as a
payload, is the popularity count for that day. At query time we build a query
containing the term for each day (or 7 days, etc.) of the window, and we have
extended SpanTermQuery (and SpanScorer) to sum the payload values over that
duration, so results are ranked by this new score. Is it possible to do this
in Solr somehow?
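
For context, the TokenFilterFactory that Grant mentions below is a thin
wrapper; a minimal sketch (assuming Lucene's DelimitedPayloadTokenFilter with
a '|' delimiter and float payloads; the class name is illustrative):

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.payloads.DelimitedPayloadTokenFilter;
import org.apache.lucene.analysis.payloads.FloatEncoder;
import org.apache.solr.analysis.BaseTokenFilterFactory;

public class DelimitedPayloadTokenFilterFactory extends BaseTokenFilterFactory {
  // tokens arrive like "20090714|42.0"; the text after '|' is encoded
  // as a float payload on the term "20090714"
  public TokenStream create(TokenStream input) {
    return new DelimitedPayloadTokenFilter(input, '|', new FloatEncoder());
  }
}

Registered on the popularity field's analyzer chain in schema.xml, that covers
the indexing side; the query side (building the payload-summing span query
through the parser) is the part that still needs hooking in, as discussed below.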

Thanks,
Sumit

On Wed, Jul 15, 2009 at 3:25 AM, Grant Ingersoll wrote:

> The TokenFilterFactory side is trivial for the DelimitedPayloadTokenFilter.
>  That could be in for 1.4.  In fact, there is an automated way to generate
> the stubs that should be run in preparing for a release.  I'll see if I can
> find a minute or two to make that happen.
>
> For query support, I've never hooked into the query parser, so I have no
> clue.  Yonik seems to crank out new query capabilities pretty fast, so maybe
> it isn't too bad, even if it isn't done as fast as Yonik.  Bigger picture,
> it would be great to have spans support too.
>
>
>
> On Jul 14, 2009, at 3:44 PM, Shalin Shekhar Mangar wrote:
>
>  It may be nice to tell us why you need payloads? There may be other ways
>> of
>> solving your problem than adding payload support to Solr? Anyway, I don't
>> see payload support before 1.5
>>
>> On Tue, Jul 14, 2009 at 10:07 PM, Sumit Aggarwal
>> wrote:
>>
>>  Hey Nobel,
>>> Any comments on Grant suggestion.
>>>
>>> Thanks,
>>> -Sumit
>>>
>>> On Tue, Jul 14, 2009 at 8:40 PM, Sumit Aggarwal
>>> wrote:
>>>
>>>  Hi Walter,
 I do have a search server where i have implemented things using payload
 feature itself. These days i am evaluating solr to get rid of my own

>>> search
>>>
 server. For that i need payloads feature in solr itself. I raised a

>>> related
>>>
 question and got a message from *Grant* as
 * "**I added a new DelimitedPayloadTokenFilter to Lucene that should
 make
 it really easy to send in payloads "inline" in Solr XML, so what remains

>>> to
>>>
 be done, I think is:*
 *1. Create a new TokenFilterFactory for the TokenFilter
 **2. Hook in some syntax support for creating a BoostingTermQuery in the
 Query Parsers.**"*

 Now can any one provide any custom code to do what grant mentioned.

 Thanks,
 Sumit

 On Tue, Jul 14, 2009 at 8:24 PM, Walter Underwood <

>>> wunderw...@netflix.com>wrote:
>>>

  That doesn't require payloads. I was doing that with Solr 1.1. Define
>
 two
>>>
 fields, stemmed and exact, with different analyzer chains. Use copyfield
> to
> load the same info into both. With the dismax handler, search both
>
 fields
>>>
 with a higher boost on the exact field.
>
> wunder
>
> On 7/14/09 7:39 AM, "Toby Cole"  wrote:
>
>  We're looking into using payloads to improve relevance for stemmed
>> terms, using the payload to store the unstemmed term, boosting the
>> term if there's an exact match with the payloads.
>>
>
>
>

>>>
>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
>
>


Re: grouping and sorting by facet?

2009-07-14 Thread Chris Hostetter

: Is there a way to group and sort by facet count?  I have a large set of
: images, each of which is part of a different "collection."  I am performing
: a faceted search:
: 
: 
/solr/select/?q=my+term&max=30&version=2.2&rows=30&start=0&facet=true&facet.field=collection&facet.sort=true
: 
: I would like to group the results by collection count.
: 
: So all of the images in the collection with the most image "hits" comes
: first.
: 
: Not sure how to do that

there isn't really any way to do that.

you could make two requests: in the first get the facet counts, and then 
in the second augment the query to boost documents that match the various 
facet field values (where the boost is determined by the count)
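
For illustration (the collection names and counts here are made up), the first
request might be:

/solr/select?q=my+term&rows=0&facet=true&facet.field=collection

and if it reported, say, collectionA=120 and collectionB=40, the second request
could boost by those counts with dismax boost queries:

/solr/select?qt=dismax&q=my+term&bq=collection:collectionA^120+collection:collectionB^40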

bear in mind: this probably isn't going to be as useful a user 
experience as it sounds.  This will cause the results that have a lot in 
common with the other results to appear near the top -- but users rarely 
need UI assistance for finding the "common" stuff; the fact that it's 
common means it's already pretty prevalent.  

There's also no reason to think that docs matching the facet value with 
the highest count are more relevant to the original query.  If I'm 
searching products for "apple ipod", the category with the highest facet 
count is probably going to be something like "accessories" or "cases" and 
not "mp3 players", because there are a lot more chargers, cases, and 
headphones in the world that show up when you search for ipod than there 
are mp3 players -- that doesn't mean those accessory products should 
appear first in the list of matches.





-Hoss



Segments_2 and segments.gen under Index folder and spellchecker1, spellchecker2, spellcheckerFile folder

2009-07-14 Thread Francis Yakin

I just upgraded our Solr to 1.3.0.

After I deployed the Solr apps, I noticed there are:

segments_2 and segments.gen, and there are 3 folders: spellchecker1, spellchecker2 
and spellcheckerFile.

What are these for? When I delete them, I need to bounce the apps again and the 
files get regenerated.

Thanks

Francis



Re: Segments_2 and segments.gen under Index folder and spellchecker1, spellchecker2, spellcheckerFile folder

2009-07-14 Thread Shalin Shekhar Mangar
On Wed, Jul 15, 2009 at 8:46 AM, Francis Yakin  wrote:

>
> I just upgraded our Solr to 1.3.0.
>
> After I deployed the Solr apps, I noticed there are:
>
> segments_2 and segments.gen, and there are 3 folders: spellchecker1,
> spellchecker2 and spellcheckerFile.
>
> What are these for? When I delete them, I need to bounce the apps again and
> the files get regenerated.
>


segments_N (N=1,2,3...) and segments.gen are both normal parts of a Lucene
index. segments_N records the current set of index segments, while segments.gen
is a small helper file Lucene writes so that it can locate the latest segments_N
even on filesystems where directory listings can be stale (NFS, for example).
Both are re-written whenever the index changes, so there's no harm in them and
no point in deleting them; they will simply be re-created.

The spellchecker directories are created by the SpellCheckComponent. You can
comment out all the sections related to SpellCheckComponent from your
solrconfig.xml and delete these directories.
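
For reference, those directory names come from the spellcheckIndexDir settings
in the example solrconfig.xml's SpellCheckComponent section, roughly (trimmed
here; your config may differ):

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="spellcheckIndexDir">./spellchecker1</str>
    <!-- ... -->
  </lst>
  <lst name="spellchecker">
    <str name="name">jarowinkler</str>
    <str name="spellcheckIndexDir">./spellchecker2</str>
    <!-- ... -->
  </lst>
  <lst name="spellchecker">
    <str name="classname">solr.FileBasedSpellChecker</str>
    <str name="spellcheckIndexDir">./spellcheckerFile</str>
    <!-- ... -->
  </lst>
</searchComponent>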

-- 
Regards,
Shalin Shekhar Mangar.


DefaultSearchField ? "important"

2009-07-14 Thread Jörg Agatz
Hello users,
and good morning; it is morning in Germany :-)

I have a really important problem.

My field names are really bad, like
"CUPS_EBENE1_EBENE2_TASKS_CATEGORIE"

I have no content field or anything like that.
So when I want to search for something, I need to search in ALL fields, but when
I search "*:test" it doesn't work,
and when I put "*" in the defaultSearchField it doesn't work either.

How can I search in ALL fields?


spellcheck with misspelled words in index

2009-07-14 Thread Chris Williams
Hi,
I'm having some trouble getting the correct results from the
spellcheck component.  I'd like to use it to suggest correct product
titles on our site, however some of our products have misspellings in
them outside of our control.  For example, there's 2 products with the
misspelled word "cusine" (and 25k with the correct spelling
"cuisine").  So if someone searches for the word "cusine" on our site,
I would like to show the 2 misspelled products, and a suggestion with
"Did you mean cuisine?".

However, I can't seem to ever get any spelling suggestions when I
search by the word "cusine", and correctlySpelled is always true.
Misspelled words that don't appear in the index work fine.
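
For concreteness, the kind of request in play here (assuming the spellcheck
component is attached to the default request handler) looks roughly like:

/solr/select?q=cusine&spellcheck=true&spellcheck.count=5&spellcheck.collate=true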

I noticed that setting onlyMorePopular to true will return suggestions
for the misspelled word, but I've found that it doesn't work great for
other words and produces suggestions too often for correctly spelled
words.

I had incorrectly thought that by setting thresholdTokenFrequency
higher on my spelling dictionary these misspellings would not
appear in my spelling index and thus I would get suggestions for them,
but as I see now, the spellcheck doesn't quite work like that.

Is there any way to somehow get spelling suggestions to work for these
misspellings in my index if they have a low frequency?

Thanks in advance,
Chris