Text search within facets?

2010-02-12 Thread chasiubao

Hello,

Is it possible to do a text search within facets?  Something that will
return the words Solr used to gather my results and how many of those
results were found.

For example, if I have the following field:

<field name="dog" type="string" indexed="true" stored="true"/>

and it has docs that contain something like

english bulldog
french bulldog
bichon frise

If I search for "english bulldog" and facet on "dog", I will get the
following:

<int name="english bulldog">135</int>
<int name="french bulldog">23</int>
<int name="bichon frise">12</int>

But I really want only the ones that contain the words "english" and
"bulldog" like 

<int name="english bulldog">135</int>
<int name="french bulldog">23</int>

Thanks for your help!
-- 
View this message in context: 
http://old.nabble.com/Text-search-within-facets--tp27560090p27560090.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: How to reindex data without restarting server

2010-02-12 Thread Emad Mushtaq
Hi,

Thanks! This is very useful :) :)

On Fri, Feb 12, 2010 at 7:55 AM, Joe Calderon wrote:

> if you use the core model via solr.xml you can reload a core without having
> to restart the servlet container,
> http://wiki.apache.org/solr/CoreAdmin
>
> On 02/11/2010 02:40 PM, Emad Mushtaq wrote:
>
>> Hi,
>>
>> I would like to know if there is a way of reindexing data without
>> restarting the server. Let's say I make a change in the schema file.
>> That would require me to reindex the data. Is there a solution to this?
>>
>>
>>
>
>
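
For reference, a core reload through the CoreAdmin handler mentioned above is
a single HTTP call, along the lines of (assuming a core named core0 in
solr.xml):

http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0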


-- 
Muhammad Emad Mushtaq
http://www.emadmushtaq.com/


EmbeddedSolrServer vs CommonsHttpSolrServer

2010-02-12 Thread dcdmailbox-info
Hi all,

I am new to solr/solrj.

I correctly started up the example server given in the distribution
(apache-solr-1.4.0\example\solr), populated the index with the test data set, and
successfully tested it with an HTTP query string via the browser (e.g.
http://localhost:8983/solr/select/?indent=on&q=video&fl=name,id)

I am trying to set up solrj clients using both CommonsHttpSolrServer and 
EmbeddedSolrServer.

My examples are with single core configuration.

Below is the method used for CommonsHttpSolrServer initialization:

[code.1]
public SolrServer getCommonsHttpSolrServer() throws IOException,
        ParserConfigurationException, SAXException, SolrServerException {
    String url = "http://localhost:8983/solr";
    CommonsHttpSolrServer server = new CommonsHttpSolrServer(url);
    server.setSoTimeout(1000);  // socket read timeout
    server.setConnectionTimeout(100);
    server.setDefaultMaxConnectionsPerHost(100);
    server.setMaxTotalConnections(100);
    server.setFollowRedirects(false);  // defaults to false
    // allowCompression defaults to false.
    // Server side must support gzip or deflate for this to have any effect.
    server.setAllowCompression(true);
    server.setMaxRetries(1);  // defaults to 0. > 1 not recommended.
    return server;
}

Below is the method used for EmbeddedSolrServer initialization (taken from
the wiki):

[code.2]
public SolrServer getEmbeddedSolrServer() throws IOException,
        ParserConfigurationException, SAXException, SolrServerException {
    System.setProperty("solr.solr.home",
            "/WORKSPACE/bin/apache-solr-1.4.0/example/solr");
    CoreContainer.Initializer initializer = new CoreContainer.Initializer();
    CoreContainer coreContainer = initializer.initialize();
    EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, "");
    return server;
}

Below is the common code used to query the server:
[code.3]

SolrServer server = mintIdxMain.getEmbeddedSolrServer();
//SolrServer server = mintIdxMain.getCommonsHttpSolrServer();

SolrQuery query = new SolrQuery("video");
QueryResponse rsp = server.query(query);
SolrDocumentList docs = rsp.getResults();

System.out.println("Found: " + docs.getNumFound());
System.out.println("Start: " + docs.getStart());
System.out.println("Max Score: " + docs.getMaxScore());

 
CommonsHttpSolrServer gives correct results, whereas EmbeddedSolrServer always
gives no results.
What's wrong with the initialization and/or the configuration of the
EmbeddedSolrServer?
CoreContainer.Initializer() seems not to recognize the single core from
solrconfig.xml...

If I modify [code.2] with the following code, it seems to work.
I only added explicit core registration with the CoreContainer.
Is [code.4] the correct way?

[code.4]
public SolrServer getEmbeddedSolrServer() throws IOException,
        ParserConfigurationException, SAXException, SolrServerException {
    System.setProperty("solr.solr.home",
            "/WORKSPACE/bin/apache-solr-1.4.0/example/solr");

    CoreContainer.Initializer initializer = new CoreContainer.Initializer();
    CoreContainer coreContainer = initializer.initialize();

    /* > */
    SolrConfig solrConfig = new SolrConfig(
            "/WORKSPACE/bin/apache-solr-1.4.0/example/solr",
            "solrconfig.xml", null);
    IndexSchema indexSchema = new IndexSchema(solrConfig, "schema.xml", null);
    CoreDescriptor coreDescriptor = new CoreDescriptor(coreContainer, "",
            solrConfig.getResourceLoader().getInstanceDir());
    SolrCore core = new SolrCore(null,
            "/WORKSPACE/bin/apache-solr-1.4.0/example/solr/data",
            solrConfig, indexSchema, coreDescriptor);
    coreContainer.register("", core, false);
    /* < */

    EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, "");
    return server;
}

Many thanks in advance for the support, and for the great work on all the
Lucene/Solr projects.

Dino.
--


  

inconsistency between analysis.jsp and actual search

2010-02-12 Thread Lukas Kahwe Smith
Hi

I am indexing the name "FC St. Gallen" using the following type:

[fieldType definition lost in archiving]

Which according to analysis.jsp gets split into:
f | fc | s | st | g | ga | gal | gall | galle | gallen

So far so good.

Now if I search for "fc st.gallen" according to analysis.jsp it will search for:
fc | st | gallen

But when I do a dismax search using the following handler:
  <requestHandler name="search" class="solr.SearchHandler" default="true">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <str name="echoParams">explicit</str>
      <int name="rows">10</int>
      <str name="qf">name firstname email^0.5 telefon^0.5 city^0.6 street^0.6</str>
      <str name="fl">id,type,name,firstname,zipcode,city,street,urlizedname</str>
    </lst>
  </requestHandler>

I do not get a match.
Looking at the debug output of the query I can see that it's actually splitting
the query into "fc" and "st gallen":
<str name="rawquerystring">fc st.gallen</str>
<str name="querystring">fc st.gallen</str>

<str name="parsedquery">
+((DisjunctionMaxQuery((telefon:fc^0.5 | firstname:fc | email:fc^0.5 |
street:fc^0.6 | city:fc^0.6 | name:fc))
DisjunctionMaxQuery((telefon:"st gallen"^0.5 | firstname:"st gallen" |
email:"st gallen"^0.5 | street:"st gallen"^0.6 | city:"st gallen"^0.6 |
name:"st gallen")))~2) ()
</str>

<str name="parsedquery_toString">
+(((telefon:fc^0.5 | firstname:fc | email:fc^0.5 | street:fc^0.6 | city:fc^0.6 |
name:fc) (telefon:"st gallen"^0.5 | firstname:"st gallen" |
email:"st gallen"^0.5 | street:"st gallen"^0.6 | city:"st gallen"^0.6 |
name:"st gallen"))~2) ()
</str>


What's going on there?

regards,
Lukas Kahwe Smith
m...@pooteeweet.org





Re: inconsistency between analysis.jsp and actual search

2010-02-12 Thread Ahmet Arslan
> Which according to analysis.jsp gets split into:
> f | fc | s | st | g | ga | gal | gall | galle | gallen
> 
> So far so good.
> 
> Now if I search for "fc st.gallen" according to
> analysis.jsp it will search for:
> fc | st | gallen
> 
> But when I do a dismax search using the following handler:
> [...]
> I do not get a match.
> Looking at the debug of the query I can see that it's
> actually splitting the query into "fc" and "st gallen":
> [...]
> What's going on there?

analysis.jsp does not do actual query parsing; it just shows the tokens produced
step by step in the analysis (charfilter, tokenizer, tokenfilter) phase.
"admin/analysis.jsp page will show you how your field is processed while
indexing and while querying, and if a particular query matches." [1]

[1]http://wiki.apache.org/solr/FAQ#My_search_returns_too_many_.2BAC8_too_little_.2BAC8_unexpected_results.2C_how_to_debug.3F





Re: EmbeddedSolrServer vs CommonsHttpSolrServer

2010-02-12 Thread Ron Chan
I suspect this has something to do with the dataDir setting in the example's
solrconfig.xml:

<dataDir>${solr.data.dir:./solr/data}</dataDir>

we use the example's solrconfig.xml as the base for our deployments and always 
comment this out 

the default of having conf and data sitting under the solr home works well 
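
Commented out, the element in the example solrconfig.xml would look like this,
so the index falls back to data/ next to conf/ under Solr home:

<!-- <dataDir>${solr.data.dir:./solr/data}</dataDir> -->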


- Original Message - 
From: dcdmailbox-i...@yahoo.it 
To: solr-user@lucene.apache.org 
Sent: Friday, 12 February, 2010 8:30:57 AM 
Subject: EmbeddedSolrServer vs CommonsHttpSolrServer 

Hi all, 

I am new to solr/solrj. 

I correctly started up the server example given in the distribution 
(apache-solr-1.4.0\example\solr), populated the index with test data set, and 
successfully tested with http query string via browser (es. 
http://localhost:8983/solr/select/?indent=on&q=video&fl=name,id) 

I am trying to set up solrj clients using both CommonsHttpSolrServer and 
EmbeddedSolrServer. 

My examples are with single core configuration. 

Here below the method used for CommonsHttpSolrServer initialization: 

[code.1] 
public SolrServer getCommonsHttpSolrServer() throws IOException, 
ParserConfigurationException, SAXException, SolrServerException { 
String url = "http://localhost:8983/solr";; 
CommonsHttpSolrServer server = new CommonsHttpSolrServer(url); 
server.setSoTimeout(1000); // socket read timeout 
server.setConnectionTimeout(100); 
server.setDefaultMaxConnectionsPerHost(100); 
server.setMaxTotalConnections(100); 
server.setFollowRedirects(false); // defaults to false 
// allowCompression defaults to false. 
// Server side must support gzip or deflate for this to have any effect. 
server.setAllowCompression(true); 
server.setMaxRetries(1); // defaults to 0. > 1 not recommended. 
return server; 
} 

Here below the method used for EmbeddedSolrServer initialization (provided in 
the wiki section): 

[code.2] 
public SolrServer getEmbeddedSolrServer() throws IOException, 
ParserConfigurationException, SAXException, SolrServerException { 
System.setProperty("solr.solr.home", 
"/WORKSPACE/bin/apache-solr-1.4.0/example/solr"); 
CoreContainer.Initializer initializer = new CoreContainer.Initializer(); 
CoreContainer coreContainer = initializer.initialize(); 
EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, ""); 
return server; 
} 

Here below the common code used to query the server: 
[code.3] 

SolrServer server = mintIdxMain.getEmbeddedSolrServer(); 
//SolrServer server = mintIdxMain.getCommonsHttpSolrServer(); 

SolrQuery query = new SolrQuery("video"); 
QueryResponse rsp = server.query(query); 
SolrDocumentList docs = rsp.getResults(); 

System.out.println("Found : " + docs.getNumFound()); 
System.out.println("Start : " + docs.getStart()); 
System.out.println("Max Score: " + docs.getMaxScore()); 


CommonsHttpSolrServer gives correct results whereas EmbeddedSolrServer gives 
always no results. 
What's wrong with the initialization and/or the configuration of the 
EmbeddedSolrServer? 
CoreContainer.Initializer() seems to not recognize the single core from 
solrconfig.xml... 

If I modify [code.2] with the following code, it seems to work. 
I manually added only explicit Core Container registration. 
Is [code.4] the correct way? 

[code.4] 
public SolrServer getEmbeddedSolrServer() throws IOException, 
ParserConfigurationException, SAXException, SolrServerException { 
System.setProperty("solr.solr.home", 
"/WORKSPACE/bin/apache-solr-1.4.0/example/solr"); 

CoreContainer.Initializer initializer = new CoreContainer.Initializer(); 
CoreContainer coreContainer = initializer.initialize(); 

/* > */ 
SolrConfig solrConfig = new 
SolrConfig("/WORKSPACE/bin/apache-solr-1.4.0/example/solr", "solrconfig.xml", 
null); 
IndexSchema indexSchema = new IndexSchema(solrConfig, "schema.xml", null); 
CoreDescriptor coreDescriptor = new CoreDescriptor(coreContainer, "", 
solrConfig.getResourceLoader().getInstanceDir()); 
SolrCore core = new SolrCore(null, 
"/WORKSPACE/bin/apache-solr-1.4.0/example/solr/data", solrConfig, indexSchema, 
coreDescriptor); 
coreContainer.register("", core, false); 
/* < */ 

EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, ""); 
return server; 
} 

Many thanks in advance for the support and the great work realized with all the 
lucene/solr projects. 

Dino. 
-- 




Local Solr Inconsistent results for radius

2010-02-12 Thread Emad Mushtaq
Hello,

I have a question related to local solr. For certain locations (latitude,
longitude), the spatial search does not work. Here is the query I try to
make which gives me no results:

q=*&qt=geo&sort=geo_distance asc&lat=33.718151&long=73.060547&radius=450

However if I make the same query with radius=449, it gives me results.

Here is part of my solrconfig.xml containing startTier and endTier:

<updateRequestProcessorChain>
  <processor class="com.pjaol.search.solr.update.LocalUpdateProcessorFactory">
    <str name="latField">latitude</str>
    <str name="lngField">longitude</str>
    <int name="startTier">9</int>
    <int name="endTier">17</int>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
</updateRequestProcessorChain>

What do I need to do to fix this problem?


-- 
Muhammad Emad Mushtaq
http://www.emadmushtaq.com/


Re: inconsistency between analysis.jsp and actual search

2010-02-12 Thread Lukas Kahwe Smith

On 12.02.2010, at 11:17, Ahmet Arslan wrote:
> analysis.jsp does not do actual query parsing. just shows produced tokens 
> step by step in analysis (charfilter, tokenizer, tokenfilter) phase.
> "admin/analysis.jsp page will show you how your field is processed while 
> indexing and while querying, and if a particular query matches." [1]
> 
> [1]http://wiki.apache.org/solr/FAQ#My_search_returns_too_many_.2BAC8_too_little_.2BAC8_unexpected_results.2C_how_to_debug.3F


I see, that's good to know. Maybe even something that should be noted on the
analysis.jsp page itself.

Anyway, how can I get "st.gallen" split into two terms at query time?

[fieldType snippet lost in archiving]


It seems I should probably use the solr.StandardTokenizerFactory anyway, but
for this case it wouldn't help either.

regards,
Lukas Kahwe Smith
m...@pooteeweet.org





optimize is taking too much time

2010-02-12 Thread mklprasad

hi,
in my Solr I have 1,42,45,223 (about 14.2 million) records, taking up some 50GB.
Now when I am loading a new record and it tries to optimize the docs, it
takes too much memory and time.


Can anybody please tell me whether we have any property in Solr to get rid of this?

Thanks in advance

-- 
View this message in context: 
http://old.nabble.com/optimize-is-taking-too-much-time-tp27561570p27561570.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: EmbeddedSolrServer vs CommonsHttpSolrServer

2010-02-12 Thread dcdmailbox-info
Yes you are right.
[code.2] works fine after commenting out the following line in solrconfig.xml:

<dataDir>${solr.data.dir:./solr/data}</dataDir>

Is this different behaviour of EmbeddedSolrServer correct?
Or can it be considered a low-priority bug?
Thanks for your prompt reply!
Dino.
--





From: Ron Chan
To: solr-user@lucene.apache.org
Sent: Fri, 12 February 2010, 11:14:58
Subject: Re: EmbeddedSolrServer vs CommonsHttpSolrServer

I suspect this has something to do with the dataDir setting in the example's
solrconfig.xml

<dataDir>${solr.data.dir:./solr/data}</dataDir>

[rest of quoted message snipped]


  

Re: EmbeddedSolrServer vs CommonsHttpSolrServer

2010-02-12 Thread Erik Hatcher
When using EmbeddedSolrServer, you could simply set the solr.data.dir
system property, or launch your process from the same working directory
where you are launching the HTTP version of Solr. Either of those
should also work to alleviate this issue.
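
For instance, [code.2] above should only need one extra line before the
container is initialized — a sketch reusing Dino's example paths:

public SolrServer getEmbeddedSolrServer() throws IOException,
        ParserConfigurationException, SAXException, SolrServerException {
    System.setProperty("solr.solr.home",
            "/WORKSPACE/bin/apache-solr-1.4.0/example/solr");
    // Resolve the ${solr.data.dir:./solr/data} placeholder in the example
    // solrconfig.xml to the same index directory the HTTP server uses:
    System.setProperty("solr.data.dir",
            "/WORKSPACE/bin/apache-solr-1.4.0/example/solr/data");
    CoreContainer.Initializer initializer = new CoreContainer.Initializer();
    CoreContainer coreContainer = initializer.initialize();
    return new EmbeddedSolrServer(coreContainer, "");
}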


Erik

On Feb 12, 2010, at 5:36 AM, dcdmailbox-i...@yahoo.it wrote:


Yes you are right.
[code.2] works fine by commenting out the dataDir line in solrconfig.xml.
[rest of quoted message snipped]

Re: EmbeddedSolrServer vs CommonsHttpSolrServer

2010-02-12 Thread Ron Chan
I don't think this is a bug; the default behaviour is for /data to sit under
Solr home.

There should be no need to use this parameter unless it is a special case.

Not sure why it is like this in the example.


- Original Message - 
From: dcdmailbox-i...@yahoo.it 
To: solr-user@lucene.apache.org 
Sent: Friday, 12 February, 2010 10:36:41 AM 
Subject: Re: EmbeddedSolrServer vs CommonsHttpSolrServer 

Yes you are right.
[rest of quoted message snipped]

Good literature on search basics

2010-02-12 Thread javaxmlsoapdev

Does anyone know good literature (web resources, books, etc.) on the basics of
search? I do have the Solr 1.4 and Lucene books but wanted to go into more
detail on the basics.

Thanks,
-- 
View this message in context: 
http://old.nabble.com/Good-literature-on-search-basics-tp27562021p27562021.html
Sent from the Solr - User mailing list archive at Nabble.com.



persistent cache

2010-02-12 Thread Tim Terlegård
Does Solr use some sort of a persistent cache?

I do this 10 times in a loop:
  * start solr
  * create a core
  * execute warmup query
  * execute query with sort fields
  * stop solr

Executing the query with sort fields takes 5-20 times longer the first
iteration than the other 9 iterations. For instance I have a query
'hockey' with one date sort field. That takes 768 ms in the first
iteration of the loop. The next 9 iterations the query takes 52 ms.
The solr and jetty server really stops in each iteration so the RAM
must be emptied. So the only way I can think of why this happens is
because there is some persistent cache that survives the solr
restarts. Is this the case? Or why could this be?

/Tim


Re: persistent cache

2010-02-12 Thread Shalin Shekhar Mangar
2010/2/12 Tim Terlegård 

> Does Solr use some sort of a persistent cache?
>
> I do this 10 times in a loop:
>  * start solr
>  * create a core
>  * execute warmup query
>  * execute query with sort fields
>  * stop solr
>
> Executing the query with sort fields takes 5-20 times longer the first
> iteration than the other 9 iterations. For instance I have a query
> 'hockey' with one date sort field. That takes 768 ms in the first
> iteration of the loop. The next 9 iterations the query takes 52 ms.
> The solr and jetty server really stops in each iteration so the RAM
> must be emptied. So the only way I can think of why this happens is
> because there is some persistent cache that survives the solr
> restarts. Is this the case? Or why could this be?
>
>
Solr does not have a persistent cache. That is the operating system's file
cache at work.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Dismax phrase queries

2010-02-12 Thread Shalin Shekhar Mangar
On Fri, Feb 12, 2010 at 6:06 AM, Jason Rutherglen <
jason.rutherg...@gmail.com> wrote:

> I'd like to boost an exact phrase match such as q="video poker" over
> q=video poker.  How would I do this using dismax?
>
> I tried pre-processing video poker into, video poker "video poker"
> however that just gets munged by dismax into "video poker video
> poker"... Which is wrong.
>
>
Have you tried the pf parameter?
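
For example, something along these lines (assuming your qf fields include a
text field named "text"; names are illustrative):

q=video poker&defType=dismax&qf=text&pf=text^10

pf re-runs the parsed query terms as an implicit phrase query against the
listed fields, so documents containing "video poker" as an exact phrase get
the extra boost while ordinary term matches still match.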

-- 
Regards,
Shalin Shekhar Mangar.


Re: spellcheck

2010-02-12 Thread michaelnazaruk

I tried to configure spellcheck, but I still have this problem:
Config:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="classname">solr.FileBasedSpellChecker</str>
    <str name="name">file</str>
    <str name="sourceLocation">spellings.txt</str>
    <str name="characterEncoding">UTF-8</str>
    <str name="spellcheckIndexDir">./spellcheckerFile</str>
  </lst>
</searchComponent>

<requestHandler name="/spell" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">1</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.dictionary">file</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

Maybe I get this result because I work with a dictionary? For the request
'popular' I still get 'populars', but in the dictionary I have both popular
and populars!
-- 
View this message in context: 
http://old.nabble.com/spellcheck-tp27527425p27562959.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Local Solr Inconsistent results for radius

2010-02-12 Thread Mauricio Scheffer
Hi Emad,

I had the same issue (
http://old.nabble.com/Spatial---Local-Solr-radius-td26943608.html ), it
seems that this happens only on eastern areas of the world. Try inverting
the sign of all your longitudes, or translate all your longitudes to the
west.

Cheers,
Mauricio

On Fri, Feb 12, 2010 at 7:22 AM, Emad Mushtaq
wrote:

> Hello,
>
> I have a question related to local solr. For certain locations (latitude,
> longitude), the spatial search does not work. Here is the query I try to
> make which gives me no results:
>
> q=*&qt=geo&sort=geo_distance asc&lat=33.718151&long=73.060547&radius=450
>
> However if I make the same query with radius=449, it gives me results.
>
> Here is part of my solrconfig.xml containing startTier and endTier:
> [config snipped]
>
> What do I need to do to fix this problem?
>
>
> --
> Muhammad Emad Mushtaq
> http://www.emadmushtaq.com/
>


Re: inconsistency between analysis.jsp and actual search

2010-02-12 Thread Ahmet Arslan
> Anyways so how can I get "st.gallen" split into two terms
> at query time?

As you mentioned in your first mail, the query st.gallen is already broken into
two terms/words. But the query parser constructs a phrase query.

There was a discussion about this behaviour earlier.
http://www.lucidimagination.com/search/document/d41bc0ef422b9238/understanding_the_query_parser#85db37e69ef29dba





  


Fwd: indexing: issue with default values

2010-02-12 Thread nabil rabhi
in the schema.xml I have fields with int type and a default value,
e.g.: <field name="postal_code" type="tint" indexed="true" stored="true" default="0"/>
but when a document has no value for the field "postal_code"
at indexing, I get the following error:

Posting file Immo.xml to http://localhost:8983/solr/update/



Error 500

HTTP ERROR: 500 For input string: ""

java.lang.NumberFormatException: For input string: ""
at
java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Integer.parseInt(Integer.java:470)
at java.lang.Integer.parseInt(Integer.java:499)
at org.apache.solr.schema.TrieField.createField(TrieField.java:416)
at org.apache.solr.schema.SchemaField.createField(SchemaField.java:94)
at
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:246)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at
org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)







<int name="status">0</int><int name="QTime">4</int>


any help? thx


Re: persistent cache

2010-02-12 Thread Tim Terlegård
2010/2/12 Shalin Shekhar Mangar :
> 2010/2/12 Tim Terlegård 
>
>> Does Solr use some sort of a persistent cache?
>>
> Solr does not have a persistent cache. That is the operating system's file
> cache at work.

Aha, that's very interesting and seems to make sense.

So is the primary goal of warmup queries to allow the operating system
to cache all the files in the data/index directory? Because I think
the difference (768ms vs 52ms) is pretty big. I just do one warmup
query and get 52 ms response on a 40 million documents index. I think
that's pretty nice performance without tinkering with the caches at
all. The only tinkering that seems to be needed is this operating
system file caching. What's the best way to make sure that my warmup
queries have cached all the files? And does a file cache have the
complete file in memory? I guess it can get tough to get my 100GB
index into the 16GB memory.

/Tim


Re: Good literature on search basics

2010-02-12 Thread Jaco
See http://markmail.org/thread/z5sq2jr2a6eayth4


On 12 February 2010 12:14, javaxmlsoapdev  wrote:

>
> Does anyone know good literature(web resources, books etc) on basics of
> search? I do have Solr 1.4 and Lucene books but wanted to go in more
> details
> on basics.
>
> Thanks,
> --
> View this message in context:
> http://old.nabble.com/Good-literature-on-search-basics-tp27562021p27562021.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: indexing: issue with default values

2010-02-12 Thread Erik Hatcher
When a document has no value, are you still sending a postal_code  
field in your post to Solr?  Seems like you are.


Erik

On Feb 12, 2010, at 8:12 AM, nabil rabhi wrote:


in the schema.xml I have fields with int type and a default value,
e.g.: <field name="postal_code" type="tint" indexed="true" stored="true" default="0"/>
but when a document has no value for the field "postal_code"
at indexing, I get the following error:

java.lang.NumberFormatException: For input string: ""
[stack trace snipped]

any help? thx




Re: Dismax phrase queries

2010-02-12 Thread Jason Rutherglen
Was going to post that I more or less figured it out.  Dismax handles
this automatically with the ps parameter, which is different than the
bs parameter...
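
For example, combined with pf (field names illustrative):

q=video poker&defType=dismax&qf=text&pf=text^10&ps=0

ps is the slop applied to that implicit pf phrase query; ps=0 means only
documents with "video" and "poker" adjacent and in order get the phrase boost.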

On Fri, Feb 12, 2010 at 3:48 AM, Shalin Shekhar Mangar
 wrote:
> On Fri, Feb 12, 2010 at 6:06 AM, Jason Rutherglen <
> jason.rutherg...@gmail.com> wrote:
>
>> I'd like to boost an exact phrase match such as q="video poker" over
>> q=video poker.  How would I do this using dismax?
>>
>> I tried pre-processing video poker into, video poker "video poker"
>> however that just gets munged by dismax into "video poker video
>> poker"... Which is wrong.
>>
>>
> Have you tried the pf parameter?
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: indexing: issue with default values

2010-02-12 Thread nabil rabhi
Yes, sometimes the document has postal_code with no value; I still post it
to Solr.
2010/2/12 Erik Hatcher 

> When a document has no value, are you still sending a postal_code field in
> your post to Solr?  Seems like you are.
>
>Erik
>
>
> On Feb 12, 2010, at 8:12 AM, nabil rabhi wrote:
>
>  in the schema.xml I have fields with int type and a default value
>> e.g.: <field name="postal_code" type="tint" indexed="true" stored="true" default="0"/>
>> but when a document has no value for the field "postal_code"
>> at indexing, I get the following error:
>>
>> java.lang.NumberFormatException: For input string: ""
>> [stack trace snipped]
>>
>> any help? thx
>
>


Re: Collating results from multiple indexes

2010-02-12 Thread Jan Høydahl / Cominvent
Really? The last time I looked at AIE, I am pretty sure there were Solr core
msgs in the logs, so I assumed it used EmbeddedSolr or something. But I may be
mistaken. Anyone from Attivio here who can elaborate? Is the join stuff at the
Lucene level, or on top of multiple Solr cores, or what?

--
Jan Høydahl  - search architect
Cominvent AS - www.cominvent.com

On 11. feb. 2010, at 23.02, Otis Gospodnetic wrote:

> Minor correction re Attivio - their stuff runs on top of Lucene, not Solr.  I 
> *think* they are trying to patent this.
> 
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Hadoop ecosystem search :: http://search-hadoop.com/
> 
> 
> 
> - Original Message 
>> From: Jan Høydahl / Cominvent 
>> To: solr-user@lucene.apache.org
>> Sent: Mon, February 8, 2010 3:33:41 PM
>> Subject: Re: Collating results from multiple indexes
>> 
>> Hi,
>> 
>> There is no JOIN functionality in Solr. The common solution is either to 
>> accept 
>> the high volume update churn, or to add client side code to build a "join" 
>> layer 
>> on top of the two indices. I know that Attivio (www.attivio.com) have built 
>> some 
>> kind of JOIN functionality on top of Solr in their AIE product, but do not 
>> know 
>> the details or the actual performance.
>> 
>> Why not open a JIRA issue, if there is no such already, to request this as a 
>> feature?
>> 
>> --
>> Jan Høydahl  - search architect
>> Cominvent AS - www.cominvent.com
>> 
>> On 25. jan. 2010, at 22.01, Aaron McKee wrote:
>> 
>>> 
>>> Is there any somewhat convenient way to collate/integrate fields from 
>>> separate 
>> indices during result writing, if the indices use the same unique keys? 
>> Basically, some sort of cross-index JOIN?
>>> 
>>> As a bit of background, I have a rather heavyweight dataset of every US 
>> business (~25m records, an on-disk index footprint of ~30g, and 5-10 hours 
>> to 
>> fully index on a decent box). Given the size and relatively stability of the 
>> dataset, I generally only update this monthly. However, I have separate 
>> advertising-related datasets that need to be updated either hourly or daily 
>> (e.g. today's coupon, click revenue remaining, etc.) . These advertiser 
>> feeds 
>> reference the same keyspace that I use in the main index, but are otherwise 
>> significantly lighter weight. Importing and indexing them discretely only 
>> takes 
>> a couple minutes. Given that Solr/Lucene doesn't support field updating, 
>> without 
>> having to drop and re-add an entire document, it doesn't seem practical to 
>> integrate this data into the main index (the system would be under a 
>> constant 
>> state of churn, if we did document re-inserts, and the performance impact 
>> would 
>> probably be debilitating). It may be nice if this data could participate in 
>> filtering (e.g. only show advertisers), but it doesn't need to participate 
>> in 
>> scoring/ranking.
>>> 
>>> I'm guessing that someone else has had a similar need, at some point?  I 
>>> can 
>> have our front-end query the smaller indices separately, using the keys 
>> returned 
>> by the primary index, but would prefer to avoid the extra sequential 
>> roundtrips. 
>> I'm hoping to also avoid a coding solution, if only to avoid the maintenance 
>> overhead as we drop in new builds of Solr, but that's also feasible.
>>> 
>>> Thank you for your insight,
>>> Aaron
>>> 
> 



Re: indexing: issue with default values

2010-02-12 Thread Erik Hatcher
That would be the problem then, I believe.  Simply don't post a value  
to get the default value to work.
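
In other words, with an update document like this (only the id field name is
invented here):

<add>
  <doc>
    <field name="id">123</field>
    <!-- omit postal_code entirely and the schema default "0" is applied;
         sending <field name="postal_code"></field> with an empty value is
         what triggers the NumberFormatException -->
  </doc>
</add>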


Erik

On Feb 12, 2010, at 10:18 AM, nabil rabhi wrote:

yes, sometimes the document has postal_code with no value, I still post it
to solr

2010/2/12 Erik Hatcher

> When a document has no value, are you still sending a postal_code field in
> your post to Solr?  Seems like you are.
>
> [rest of quoted message snipped]








Re: indexing: issue with default values

2010-02-12 Thread nabil rabhi
Thanks Erik, that was very helpful!

2010/2/12 Erik Hatcher 

> That would be the problem then, I believe.  Simply don't post a value to
> get the default value to work.
>
>Erik
>
>
> On Feb 12, 2010, at 10:18 AM, nabil rabhi wrote:
>
>  yes, sometimes the document has postal_code with no value, I still post
>> it to solr
>> 2010/2/12 Erik Hatcher
>>
>> [rest of quoted message snipped]


Re: persistent cache

2010-02-12 Thread Tommy Chheng
 One solution is to add a persistent cache with memcache at the
application layer.
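
A minimal sketch of that idea with SolrJ and the spymemcached client — the
hostnames, ports, class name, and keying scheme here are all illustrative:

import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class CachedSearch {
    public static void main(String[] args) throws Exception {
        // memcached keeps entries across Solr (and JVM) restarts, as long
        // as the memcached daemon itself stays up.
        MemcachedClient cache =
                new MemcachedClient(new InetSocketAddress("localhost", 11211));
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

        String q = "hockey";
        String key = "solr:" + q;  // real keys need escaping: no spaces allowed
        String results = (String) cache.get(key);
        if (results == null) {
            // cache miss: ask Solr and remember the answer for an hour
            results = solr.query(new SolrQuery(q)).getResults().toString();
            cache.set(key, 3600, results);
        }
        System.out.println(results);
        cache.shutdown();
    }
}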


--
Tommy Chheng

Programmer and UC Irvine Graduate Student
Twitter @tommychheng
http://tommy.chheng.com



On 2/12/10 5:19 AM, Tim Terlegård wrote:

2010/2/12 Shalin Shekhar Mangar:

2010/2/12 Tim Terlegård


Does Solr use some sort of a persistent cache?


Solr does not have a persistent cache. That is the operating system's file
cache at work.

Aha, that's very interesting and seems to make sense.

So is the primary goal of warmup queries to allow the operating system
to cache all the files in the data/index directory? Because I think
the difference (768ms vs 52ms) is pretty big. I just do one warmup
query and get 52 ms response on a 40 million documents index. I think
that's pretty nice performance without tinkering with the caches at
all. The only tinkering that seems to be needed is this operating
system file caching. What's the best way to make sure that my warmup
queries have cached all the files? And does a file cache have the
complete file in memory? I guess it can get tough to get my 100GB
index into the 16GB memory.

/Tim






Re: Text search within facets?

2010-02-12 Thread Ahmet Arslan
> For example, if I have the following field:
>
> <field name="dog" type="string" indexed="true" stored="true"/>
>
> and it has docs that contain something like
> 
> english bulldog
> french bulldog
> bichon frise
> 
> If I search for "english bulldog" and facet on "dog", I
> will get the
> following:
> 
> <int name="english bulldog">135</int>
> <int name="french bulldog">23</int>
> <int name="bichon frise">12</int>

That's strange. The query "english bulldog" should return only
english bulldog, since the type of dog is string, which is not
tokenized.
What is your default search field defined in schema.xml? Can you try 
&q=dog:"english bulldog"&facet=true&facet.field=dog&facet.mincount=1



  


expire/delete documents

2010-02-12 Thread Matthieu Labour
Hi, Is there a way for solr or lucene to expire documents based on a field in a
document? Let's say that I have a createTime field whose type is date, can i
set a policy in schema.xml for solr to delete the documents older than X
days? Thank you


  

Re: Local Solr Inconsistent results for radius

2010-02-12 Thread Emad Mushtaq
Hello Mauricio,

Do you know why such a problem occurs? Does it have to do with certain latitudes
and longitudes? If so, why is it happening? Is it a bug in local solr?

On Fri, Feb 12, 2010 at 5:50 PM, Mauricio Scheffer <
mauricioschef...@gmail.com> wrote:

> Hi Emad,
>
> I had the same issue (
> http://old.nabble.com/Spatial---Local-Solr-radius-td26943608.html ), it
> seems that this happens only on eastern areas of the world. Try inverting
> the sign of all your longitudes, or translate all your longitudes to the
> west.
>
> Cheers,
> Mauricio
>
> On Fri, Feb 12, 2010 at 7:22 AM, Emad Mushtaq
> wrote:
>
> > Hello,
> >
> > I have a question related to local solr. For certain locations (latitude,
> > longitude), the spatial search does not work. Here is the query I try to
> > make which gives me no results:
> >
> > q=*&qt=geo&sort=geo_distance asc&lat=33.718151&long=73.060547&radius=450
> >
> > However if I make the same query with radius=449, it gives me results.
> >
> > Here is part of my solrconfig.xml containing startTier and endTier:
> >
> > <updateRequestProcessorChain>
> >   <processor class="com.pjaol.search.solr.update.LocalUpdateProcessorFactory">
> >     <str name="latField">latitude</str>
> >     <str name="lngField">longitude</str>
> >     <int name="startTier">9</int>
> >     <int name="endTier">17</int>
> >   </processor>
> >   <processor class="solr.RunUpdateProcessorFactory"/>
> >   <processor class="solr.LogUpdateProcessorFactory"/>
> > </updateRequestProcessorChain>
> >
> > What do I need to do to fix this problem?
> >
> >
> > --
> > Muhammad Emad Mushtaq
> > http://www.emadmushtaq.com/
> >
>



-- 
Muhammad Emad Mushtaq
http://www.emadmushtaq.com/


Re: expire/delete documents

2010-02-12 Thread Mat Brown
You could easily have a scheduled job that ran delete by query to
remove posts older than a certain date...

On Fri, Feb 12, 2010 at 13:00, Matthieu Labour
 wrote:
> Hi, Is there a way for solr or lucene to expire documents based on a field in a
> document? Let's say that I have a createTime field whose type is date, can i
> set a policy in schema.xml for solr to delete the documents older than X
> days? Thank you
>
>
>


Re: Deleting spelll checker index

2010-02-12 Thread darniz

HI Guys 
Opening this thread again.
I need to get around this issue.
i have a spellcheck field defined and i am copying two fields, make and model,
to this field:

<copyField source="make" dest="spellText"/>
<copyField source="model" dest="spellText"/>

i have buildOnCommit and buildOnOptimize set to true, hence when i index data
and try to search for the word "accod" i get back the suggestion "accord",
since model is also being copied.
I stopped the solr server, removed the copyField for model, so now i only copy
make to the spellText field, and started the solr server again.
i refreshed the dictionary by issuing the following command:
spellcheck.build=true&spellcheck.dictionary=default
So i hoped it would rebuild my dictionary, but the strange thing is that it
still gives a suggestion for accrd.
I have to reindex the data again, and only then does it stop offering the
suggestion, which is the correct behaviour.

How can i recreate the dictionary just by changing my schema and issuing the
command
spellcheck.build=true&spellcheck.dictionary=default

i can't afford to reindex data every time.

Any answer ASAP will be appreciated

Thanks
darniz









darniz wrote:
> 
> Then i assume the easiest way is to delete the directory itself.
> 
> darniz
> 
> 
> hossman wrote:
>> 
>> 
>> : We are using Index based spell checker.
>> : i was wondering with the help of any url parameters can we delete the
>> spell
>> : check index directory.
>> 
>> I don't think so.
>> 
>> You might be able to configure two different spell check components that
>> point at the same directory -- one that builds off of a real field, and
>> one
>> that builds off of an (empty) text field (using FileBasedSpellChecker) ..
>> then you could trigger a rebuild of an empty spell checking index using 
>> the second component.
>> 
>> But i've never tried it so i have no idea if it would work.
>> 
>> 
>> -Hoss
>> 
>> 
>> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Deleting-spelll-checker-index-tp27376823p27567465.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Local Solr Inconsistent results for radius

2010-02-12 Thread Mauricio Scheffer
Yes, it seems to be a bug, at least with the code you and I are using. If
you don't need to search across the whole globe, try translating your
longitudes as I suggested.

On Fri, Feb 12, 2010 at 3:04 PM, Emad Mushtaq
wrote:

> Hello Mauricio,
>
> Do you know why such a problem occurs? Does it have to do with certain latitudes
> and longitudes? If so, why is it happening? Is it a bug in local solr?
>
> On Fri, Feb 12, 2010 at 5:50 PM, Mauricio Scheffer <
> mauricioschef...@gmail.com> wrote:
>
> > Hi Emad,
> >
> > I had the same issue (
> > http://old.nabble.com/Spatial---Local-Solr-radius-td26943608.html ), it
> > seems that this happens only on eastern areas of the world. Try inverting
> > the sign of all your longitudes, or translate all your longitudes to the
> > west.
> >
> > Cheers,
> > Mauricio
> >
> > On Fri, Feb 12, 2010 at 7:22 AM, Emad Mushtaq
> > wrote:
> >
> > > Hello,
> > >
> > > I have a question related to local solr. For certain locations
> (latitude,
> > > longitude), the spatial search does not work. Here is the query I try
> to
> > > make which gives me no results:
> > >
> > > q=*&qt=geo&sort=geo_distance asc&lat=33.718151&long=73.060547&radius=450
> > >
> > > However if I make the same query with radius=449, it gives me results.
> > >
> > > Here is part of my solrconfig.xml containing startTier and endTier:
> > >
> > > <updateRequestProcessorChain>
> > >   <processor class="com.pjaol.search.solr.update.LocalUpdateProcessorFactory">
> > >     <str name="latField">latitude</str>
> > >     <str name="lngField">longitude</str>
> > >     <int name="startTier">9</int>
> > >     <int name="endTier">17</int>
> > >   </processor>
> > >   <processor class="solr.RunUpdateProcessorFactory"/>
> > >   <processor class="solr.LogUpdateProcessorFactory"/>
> > > </updateRequestProcessorChain>
> > >
> > > What do I need to do to fix this problem?
> > >
> > >
> > > --
> > > Muhammad Emad Mushtaq
> > > http://www.emadmushtaq.com/
> > >
> >
>
>
>
> --
> Muhammad Emad Mushtaq
> http://www.emadmushtaq.com/
>


Re: persistent cache

2010-02-12 Thread Tom Burton-West

Hi Tim,

We generally run about 1600 cache-warming queries to warm up the OS disk
cache and the Solr caches when we mount a new index.

Do you have/expect phrase queries?   If you don't, then you don't need to
get any position information into your OS disk cache.  Our position
information takes about 85% of the total index size (*prx files).  So with a
100GB index, your *frq files might only be 15-20GB and you could probably
get more than half of that in 16GB of memory.

If you have limited memory and a large index, then you need to choose cache
warming queries carefully as once the cache is full, further queries will
start evicting older data from the cache.  The tradeoff is to populate the
cache with data that would require the most disk access if the data was not
in the cache versus populating the cache based on your best guess of what
queries your users will execute.  A good overview of the issues is the paper
by Baeza-Yates ( http://doi.acm.org/10.1145/1277741.125 The Impact of
Caching on Search Engines )
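
For reference, such warming queries are typically registered in solrconfig.xml
like this (a minimal sketch; the query strings are placeholders):

<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">frequent query one</str><str name="rows">10</str></lst>
    <lst><str name="q">frequent query two</str><str name="rows">10</str></lst>
  </arr>
</listener>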


Tom Burton-West
Digital Library Production Service
University of Michigan Library
-- 
View this message in context: 
http://old.nabble.com/persistent-cache-tp27562126p27567840.html
Sent from the Solr - User mailing list archive at Nabble.com.



Has anyone prepared a general purpose synonyms.txt for search engines

2010-02-12 Thread Emad Mushtaq
Hi,

I was wondering if anyone has prepared a synonyms.txt for general purpose
search engines that can be shared. If not, could you refer me to places
where such a synonym list or thesaurus can be found? Synonyms for search
engines are different from a regular thesaurus. Any help would be highly
appreciated. Thanks.

-- 
Muhammad Emad Mushtaq
http://www.emadmushtaq.com/


Re: Has anyone prepared a general purpose synonyms.txt for search engines

2010-02-12 Thread Julian Hille
Hi,

At openthesaurus.org (or .com) you can find a MySQL version of synonyms; you just
have to join it to fit the synonym format of Solr yourself.
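
For reference, the target is Solr's SynonymFilterFactory format: a plain text
file with one rule per line (these entries are illustrative):

# comma-separated terms are treated as mutual synonyms
sofa,couch,divan
# "=>" maps the left-hand terms onto the right-hand terms
colour => color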


Am 12.02.2010 um 20:03 schrieb Emad Mushtaq:

> Hi,
> 
> I was wondering if anyone has prepared a synonyms.txt for general purpose
> search engines,  that can be shared. If not could you refer me to places
> where such a synonym list or thesaurus can be found. Synonyms for search
> engines are different from the regular thesaurus. Any help would be highly
> appreciated. Thanks.
> 
> -- 
> Muhammad Emad Mushtaq
> http://www.emadmushtaq.com/

Kind regards,
Julian Hille




Re: Encountering a roadblock with my Solr schema design...use dedupe?

2010-02-12 Thread Amit Nithian
Hi all,

I am the author of the article referenced in this thread and after reading
it again, I can understand where there might have been confusion and my
apologies on that. I have edited the article to indicate that a
deduplication component is in the works and referenced SOLR-236. The article
can still be found at
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Solr-and-RDBMS-design-basics

My only question after reading this thread is what does a user purchase? A
product identified by a SKU? If that's the case then certainly indexing by
SKU is the way to go and then using the field collapse (the query time
deduplication) should work.

Also keep in mind that in my example, I was talking about the *exact* same
product located in different locations which could yield a bad user
experience if they were all shown on the same search result page. In your
case, each SKU is a unique (purchasable) product so collapsing by product id
is nice but would not doing so degrade the user experience? If I searched
for a green shirt and got S,M,L (all product ID 3) is that bad?

Hope that helps some
Amit

On Sat, Jan 16, 2010 at 3:43 PM, David MARTIN  wrote:

> I'm really interested in reading the answer to this thread as my problem is
> rather the same. Maybe my main difference is the huge SKU number per
> product
> I may have.
>
>
> David
>
> On Thu, Jan 14, 2010 at 2:35 AM, Kelly Taylor 
> wrote:
>
> >
> > Hoss,
> >
> > Would you suggest using dedup for my use case; and if so, do you know of
> a
> > working example I can reference?
> >
> > I don't have an issue using the patched version of Solr, but I'd much
> > rather
> > use the GA version.
> >
> > -Kelly
> >
> >
> >
> > hossman wrote:
> > >
> > >
> > > : Dedupe is completely the wrong word. Deduping is something else
> > > : entirely - it is about trying not to index the same document twice.
> > >
> > > Dedup can also certainly be used with field collapsing -- that was one
> of
> > > the initial use cases identified for the
> SignatureUpdateProcessorFactory
> > > ... you can compute an 'expensive' signature when adding a document,
> > index
> > > it, and then FieldCollapse on that signature field.
> > >
> > > This gives you "query time deduplication" based on a value computed
> when
> > > indexing (the canonical example is multiple urls refrenceing the "same"
> > > content but with slightly differnet boilerplate markup.  You can use a
> > > Signature class that recognizes the boilerplate and computes an
> identical
> > > signature value for each URL whose content is "the same" but still
> index
> > > all of the URLs and their content as distinct documents ... so use
> cases
> > > where people only "distinct" URLs work using field collapse but by
> > default
> > > all matching documents can still be returned and searches on text in
> the
> > > boilerplate markup also still work.
> > >
> > >
> > > -Hoss
> > >
> > >
> > >
> >
> > --
> > View this message in context:
> >
> http://old.nabble.com/Encountering-a-roadblock-with-my-Solr-schema-design...use-dedupe--tp27118977p27155115.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
> >
>
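
For reference, the SignatureUpdateProcessorFactory that Hoss describes is wired
into solrconfig.xml roughly like this (a sketch based on the Solr deduplication
docs; the signature field must exist in the schema, and the input fields are
placeholders):

<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">false</bool>
    <str name="fields">name,features</str>
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>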


Re: Has anyone prepared a general purpose synonyms.txt for search engines

2010-02-12 Thread Emad Mushtaq
Wow thanks!! You all are awesome! :D :D

On Sat, Feb 13, 2010 at 12:32 AM, Julian Hille  wrote:

> Hi,
>
> at openthesaurus.org or .com you can find a mysql version of synonyms you
> just have to join it to fit the synonym schema of solr yourself.
>
>
> Am 12.02.2010 um 20:03 schrieb Emad Mushtaq:
>
> > Hi,
> >
> > I was wondering if anyone has prepared a synonyms.txt for general purpose
> > search engines,  that can be shared. If not could you refer me to places
> > where such a synonym list or thesaurus can be found. Synonyms for search
> > engines are different from the regular thesaurus. Any help would be
> highly
> > appreciated. Thanks.
> >
> > --
> > Muhammad Emad Mushtaq
> > http://www.emadmushtaq.com/
>
> Kind regards,
> Julian Hille
>
>
>


-- 
Muhammad Emad Mushtaq
http://www.emadmushtaq.com/


Re: Has anyone prepared a general purpose synonyms.txt for search engines

2010-02-12 Thread Julian Hille
Hi,

You're welcome. That's something google came up with some weeks ago :)


Am 12.02.2010 um 20:42 schrieb Emad Mushtaq:

> Wow thanks!! You all are awesome! :D :D
> 
> On Sat, Feb 13, 2010 at 12:32 AM, Julian Hille  wrote:
> 
>> Hi,
>> 
>> at openthesaurus.org or .com you can find a mysql version of synonyms you
>> just have to join it to fit the synonym schema of solr yourself.
>> 
>> 
>> Am 12.02.2010 um 20:03 schrieb Emad Mushtaq:
>> 
>>> Hi,
>>> 
>>> I was wondering if anyone has prepared a synonyms.txt for general purpose
>>> search engines,  that can be shared. If not could you refer me to places
>>> where such a synonym list or thesaurus can be found. Synonyms for search
>>> engines are different from the regular thesaurus. Any help would be
>> highly
>>> appreciated. Thanks.
>>> 
>>> --
>>> Muhammad Emad Mushtaq
>>> http://www.emadmushtaq.com/
>> 
>> Kind regards,
>> Julian Hille
>> 
>> 
>> 
> 
> 
> -- 
> Muhammad Emad Mushtaq
> http://www.emadmushtaq.com/

Kind regards,
Julian Hille


---
NetImpact KG
Altonaer Straße 8
20357 Hamburg

Tel: 040 / 6738363 2
Mail: jul...@netimpact.de

Managing Director: Tarek Müller



Re: implementing profanity detector

2010-02-12 Thread Mike Perham
On Thu, Feb 11, 2010 at 10:49 AM, Grant Ingersoll  wrote:
>
> Otherwise, I'd do it via copy fields.  Your first field is your main field 
> and is analyzed as before.  Your second field does the profanity detection 
> and simply outputs a single token at the end, safe/unsafe.
>
> How long are your documents?  The extra copy field is extra work, but in this 
> case it should be fast as you should be able to create a pretty streamlined 
> analyzer chain for the second task.
>

The documents are web page text, so they shouldn't be more than 10-20k
generally.  Would something like this do the trick?

  private boolean done = false;

  @Override
  public boolean incrementToken() throws IOException {
    if (done) return false;   // this filter emits exactly one token per stream
    done = true;
    while (input.incrementToken()) {
      if (profanities.contains(termAtt.termBuffer(), 0, termAtt.termLength())) {
        termAtt.setTermBuffer("y", 0, 1);   // profanity found
        return true;   // return true so the "y" token is actually emitted
      }
    }
    termAtt.setTermBuffer("n", 0, 1);   // no profanity seen
    return true;
  }

mike


For caches, any reason to not set initialSize and size to the same value?

2010-02-12 Thread Jay Hill
If I've done a lot of research and have a very good idea of what my cache
sizes are, having monitored the stats right before commits, is there any
reason why I wouldn't just set the initialSize and size counts to the same
values? Is there any reason to set a smaller initialSize if I know reliably
where my limit will almost always be?

-Jay


Re: For caches, any reason to not set initialSize and size to the same value?

2010-02-12 Thread Yonik Seeley
On Fri, Feb 12, 2010 at 5:23 PM, Jay Hill  wrote:
> If I've done a lot of research and have a very good idea of where my cache
> sizes are having monitored the stats right before commits, is there any
> reason why I wouldn't just set the initialSize and size counts to the same
> values? Is there any reason to set a smaller initialSize if I know reliably
> that where my limit will almost always be?

Probably not much...
The only savings will be the 8 bytes (on a 64 bit proc) per unused
array slot (in the HashMap).
Maybe we should consider removing the initialSize param from the
example config to reduce the amount of stuff a user needs to think
about.

-Yonik
http://www.lucidimagination.com


reloading sharedlib folder

2010-02-12 Thread Joe Calderon
when using solr.xml, you can specify a sharedlib directory to share
among cores, is it possible to reload the classes in this dir without
having to restart the servlet container? it would be useful to be able
to make changes to those classes on the fly or be able to drop in new
plugins


RE: For caches, any reason to not set initialSize and size to the same value?

2010-02-12 Thread Fuad Efendi
I always use initial size = max size,
just to avoid Arrays.copyOf()...

The initial (default) capacity for HashMap is 16; when it is not enough, the
table is copied into a new 32-element array, then 64, ...
- too much wasted work! (same for ConcurrentHashMap)
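
In solrconfig.xml terms that just means giving both attributes the same value,
e.g. (cache class and sizes are illustrative):

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128"/>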

Excuse me if I didn't understand the question...

-Fuad
http://www.tokenizer.ca



> -Original Message-
> From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik
> Seeley
> Sent: February-12-10 6:30 PM
> To: solr-user@lucene.apache.org
> Subject: Re: For caches, any reason to not set initialSize and size to
> the same value?
> 
> On Fri, Feb 12, 2010 at 5:23 PM, Jay Hill 
> wrote:
> > If I've done a lot of research and have a very good idea of where my
> cache
> > sizes are having monitored the stats right before commits, is there
> any
> > reason why I wouldn't just set the initialSize and size counts to the
> same
> > values? Is there any reason to set a smaller initialSize if I know
> reliably
> > that where my limit will almost always be?
> 
> Probably not much...
> The only savings will be the 8 bytes (on a 64 bit proc) per unused
> array slot (in the HashMap).
> Maybe we should consider removing the initialSize param from the
> example config to reduce the amount of stuff a user needs to think
> about.
> 
> -Yonik
> http://www.lucidimagination.com




RE: For caches, any reason to not set initialSize and size to the same value?

2010-02-12 Thread Fuad Efendi
Funny, Arrays.copy() for HashMap... but something similar...

Anyway, I use the same values for initial size and max size, to be safe... and
so that any OOM happens at startup (rather than later) :)



> -Original Message-
> From: Fuad Efendi [mailto:f...@efendi.ca]
> Sent: February-12-10 6:55 PM
> To: solr-user@lucene.apache.org; yo...@lucidimagination.com
> Subject: RE: For caches, any reason to not set initialSize and size to
> the same value?
> 
> I always use initial size = max size,
> just to avoid Arrays.copyOf()...
> 
> Initial (default) capacity for HashMap is 16, when it is not enough -
> array
> copy to new 32-element array, then to 64, ...
> - too much wasted space! (same for ConcurrentHashMap)
> 
> Excuse me if I didn't understand the question...
> 
> -Fuad
> http://www.tokenizer.ca
> 
> 
> 
> > -Original Message-
> > From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik
> > Seeley
> > Sent: February-12-10 6:30 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: For caches, any reason to not set initialSize and size to
> > the same value?
> >
> > On Fri, Feb 12, 2010 at 5:23 PM, Jay Hill 
> > wrote:
> > > If I've done a lot of research and have a very good idea of where my
> > cache
> > > sizes are having monitored the stats right before commits, is there
> > any
> > > reason why I wouldn't just set the initialSize and size counts to
> the
> > same
> > > values? Is there any reason to set a smaller initialSize if I know
> > reliably
> > > that where my limit will almost always be?
> >
> > Probably not much...
> > The only savings will be the 8 bytes (on a 64 bit proc) per unused
> > array slot (in the HashMap).
> > Maybe we should consider removing the initialSize param from the
> > example config to reduce the amount of stuff a user needs to think
> > about.
> >
> > -Yonik
> > http://www.lucidimagination.com
> 





Re: Deleting spelll checker index

2010-02-12 Thread darniz

Any update on this?
Do you want me to rephrase my question, if it's not clear?

Thanks
darniz


darniz wrote:
> 
> HI Guys 
> Opening this thread again.
> I need to get around this issue.
> i have a spellcheck field defined and i am copying two fields, make and
> model, to this field:
>
> <copyField source="make" dest="spellText"/>
> <copyField source="model" dest="spellText"/>
>
> i have buildOnCommit and buildOnOptimize set to true, hence when i index
> data and try to search for the word "accod" i get back the suggestion
> "accord", since model is also being copied.
> I stopped the solr server, removed the copyField for model, so now i only copy
> make to the spellText field, and started the solr server again.
> i refreshed the dictionary by issuing the following command:
> spellcheck.build=true&spellcheck.dictionary=default
> So i hoped it would rebuild my dictionary, but the strange thing is that it
> still gives a suggestion for accrd.
> I have to reindex the data again, and only then does it stop offering the
> suggestion, which is the correct behaviour.
>
> How can i recreate the dictionary just by changing my schema and issuing the
> command
> spellcheck.build=true&spellcheck.dictionary=default
>
> i can't afford to reindex data every time.
> 
> Any answer ASAP will be appreciated
> 
> Thanks
> darniz
> 
> 
> 
> 
> 
> 
> 
> 
> 
> darniz wrote:
>> 
>> Then i assume the easiest way is to delete the directory itself.
>> 
>> darniz
>> 
>> 
>> hossman wrote:
>>> 
>>> 
>>> : We are using Index based spell checker.
>>> : i was wondering with the help of any url parameters can we delete the
>>> spell
>>> : check index directory.
>>> 
>>> I don't think so.
>>> 
>>> You might be able to configure two different spell check components that
>>> point at the same directory -- one that builds off of a real field, and
>>> one
>>> that builds off of an (empty) text field (using FileBasedSpellChecker)
>>> ..
>>> then you could trigger a rebuild of an empty spell checking index using 
>>> the second component.
>>> 
>>> But i've never tried it so i have no idea if it would work.
>>> 
>>> 
>>> -Hoss
>>> 
>>> 
>>> 
>> 
>> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Deleting-spelll-checker-index-tp27376823p27570613.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: implementing profanity detector

2010-02-12 Thread Chris Hostetter

: Otherwise, I'd do it via copy fields.  Your first field is your main 
: field and is analyzed as before.  Your second field does the profanity 
: detection and simply outputs a single token at the end, safe/unsafe.

you don't even need custom code for this ... copyField all your text into
an 'is_profane' field where you use a suitable Tokenizer followed by the
KeepWordFilter that only keeps profane words and then a
PatternReplaceFilter that matches .* and replaces it with "HELL_YEA"
... now a search for "is_profane:HELL_YEA" finds all profane docs, with
the added bonus that the scores are based on how many profane words occur
in the doc.

it could be used as a filter query (probably negated) as needed.
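
A sketch of that field type, assuming a hypothetical profanities.txt word list:

<fieldType name="profanity" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeepWordFilterFactory" words="profanities.txt" ignoreCase="true"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern=".*" replacement="HELL_YEA" replace="all"/>
  </analyzer>
</fieldType>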



-Hoss



Re: expire/delete documents

2010-02-12 Thread Chris Hostetter

: You could easily have a scheduled job that ran delete by query to
: remove posts older than a certain date...

or since you specifically asked about deleting anything older
than X days (in this example i'm assuming x=7)...

  createTime:[NOW-7DAYS TO *]



-Hoss



migrating from solr 1.3 to 1.4

2010-02-12 Thread Sachin Sebastian

Hi there,

   I'm trying to migrate from solr 1.3 to solr 1.4 and I've hit a few
issues. Initially my localsolr was throwing a NullPointerException, and I
fixed it by changing the type of lat and lng to 'tdouble'. But now I'm not
able to update the index. When I try to update the index it throws an error
saying -


Feb 12, 2010 2:14:11 PM org.apache.solr.update.processor.LogUpdateProcessor finish

INFO: {} 0 0
Feb 12, 2010 2:14:11 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NoSuchFieldError: log
at com.pjaol.search.solr.update.LocalUpdaterProcessor.processAdd(LocalUpdateProcessorFactory.java:138)



I tried searching on the net, but none of the posts regarding this issue are
answered. Has anyone come across this issue?


Thanks,
Sachin.


cannot match on phrase queries

2010-02-12 Thread Kevin Osborn
I am seeing this in several of my fields. I have something like "Samsung 
X150" or "Nokia BH-212". And my query will not match on X150 or BH-212.

So, my query is something like +model:(Samsung X150). Through debugQuery, I see 
that this gets converted to +(model:samsung model:"x 150"). It 
matches on Samsung, but not X150. A simple query like model:BH-212 
simply fails. model:BH212 also fails. The only query that seems to work 
is model:(BH 212).

Here is the schema for that field:

[fieldType definition stripped by the list archive]

Any ideas? According to the analyzer, I would expect the phrase "BH-212" to
match on "bh" and "212". Or am I missing something?

Also, is there any way to tell the parser not to convert "X150" into a phrase
query? I have some cases where it would be more useful to turn it into +(X 150).



  

Re: Solr 1.4: Full import FileNotFoundException

2010-02-12 Thread Chris Hostetter

: I have noticed that when I run concurrent full-imports using DIH in Solr
: 1.4, the index ends up getting corrupted. I see the following in the log

I'm fairly confident that concurrent imports won't work -- but it
shouldn't corrupt your index -- even if the DIH didn't actively check for
this type of situation, the underlying Lucene LockFactory should ensure
that one of the imports "wins" ... you'll need to tell us what kind of
filesystem you are using, and show us the relevant settings from your
solrconfig (lock type, merge policy, indexDefaults, mainIndex, DIH,
etc...)

At worst you should get a lock time out exception.

: But I looked at:
: 
http://old.nabble.com/dataimporthandler-and-multiple-delta-import-td19160129.html
: 
: and was under the impression that this issue was fixed in Solr 1.4.

...right, attempting to run two concurrent imports with DIH should cause
the second one to abort immediately.




-Hoss



Re: Cannot get like exact searching to work

2010-02-12 Thread Chris Hostetter

: > Can your query consist of more than one words?
: 
: Yes, and I expect it almost always will (the query string is coming
: from a search box on a website).
...
: Actually it won't. The data I am indexing has extra spaces in front
: and is capitalized. I really need to be able to filter it through the
: lowercase and trim filter without tokenizing it.
...
: >> The idea is that a "phrase" match would be boosted over the
: >> normal
: >> token matches and would show up first in the listing. Let

This is starting to smell like an XY Problem...
http://people.apache.org/~hossman/#xyproblem

...you mentioned wanting prefix type queries to work, but that seems to be 
based on your initial approach of using an "exact" (ie: untokenized) field 
for your matches -- all of your examples seem to want matching at a "word" 
level, not partial words.

If your ultimate goal is just that "exact" matches score higher than
documents containing all of the same words in a different order (which
should score higher than docs only containing a few of the words) then i
think you are just making things harder for yourself than you really need
... "defType=dismax" should be able to solve all of your problems -- just
specify the field(s) you want to search in the qf and pf params and
documents with all the "words" in a phrase will appear first.
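
A hedged illustration of that suggestion (the field names are placeholders):

q=black leather couch&defType=dismax&qf=name^2 description&pf=name^10 description^5

With pf (phrase fields), documents where all the words occur together as a
phrase get an extra boost on top of the per-word qf matches, so exact matches
naturally rise to the top.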



-Hoss



Interesting stuff; Solr as a syslog store.

2010-02-12 Thread Antonio Lobato
Hey everyone, I don't actually have a question, but I just thought I'd 
share something really cool that I did with Solr for our company.


We run a good amount of servers, well into the several hundreds, and 
naturally we need a way to centralize all of the system logs.  For a 
while we used a commercial solution to centralize and search our logs, 
but they wanted to charge us tens of thousands of dollars for just one 
gigabyte/day more of indexed data.  So I said forget it, I'll write my 
own solution!


We already use Solr for some of our other backend searching systems, so 
I came up with an idea to index all of our logs to Solr.  I wrote a 
daemon in perl that listens on the syslog port, and pointed every single 
system's syslog to forward to this single server.  From there, this 
daemon will write to a Solr indexing server after parsing them into 
fields, such as date/time, host, program, pid, text, etc.  I then wrote 
a cool javascript/ajax web front end for Solr searching, and bam.  Real 
time searching of all of our syslogs from a web interface, for no cost!


Just thought this would be a neat story to share with you all.  I've 
really grown to love Solr, it's something else!


Thanks,
-Antonio


Re: sorting

2010-02-12 Thread Chris Hostetter

:title^1.2 contentEN^0.8 contentIT^0.8 contentDE^0.8
:title^1.2 contentEN^0.8 contentIT^0.8 contentDE^0.8

FWIW: I don't think you understand what the "bf" param is for ... it's not 
analogous to qf and pf, it's for expressing a list of boost functions -- a 
function can be a simple field name, but that typically only makes sense 
if it's numeric.

that *may* be causing your problem, if the function parser is attempting 
to generate the FieldCache for your content fields.

: now, solr is complaining about some sorting issues on content* as they

"solr is complaining" is relaly vauge... please explain *exactly* what the 
error message is, where you see it, what the full stack trace looks like 
if there is one, and what you did to trigger te error (ie: did it happen 
on startup?  did it happen when you executed a query? what was the full 
URL of hte query?



-Hoss



Re: sorting

2010-02-12 Thread Chris Hostetter

: that *may* be causing your problem, if the function parser is attempting 
: to generate the FieldCache for your content fields.

Yep ... that's it ... if you use a bare field name as a function, and that
field name is not numeric, the result is an OrdFieldSource which uses the
FieldCache.

I opened a bug to improve the error message...

https://issues.apache.org/jira/browse/SOLR-1771


-Hoss



RE: expire/delete documents

2010-02-12 Thread Fuad Efendi
> or since you specificly asked about delteing anything older
> then X days (in this example i'm assuming x=7)...
> 
>   createTime:[NOW-7DAYS TO *]

createTime:[* TO NOW-7DAYS]
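
For completeness, a sketch of the full round trip against the example server
(URL and field name are illustrative):

curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' \
  --data-binary '<delete><query>createTime:[* TO NOW-7DAYS]</query></delete>'
curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' \
  --data-binary '<commit/>'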






Re: How to reindex data without restarting server

2010-02-12 Thread Chris Hostetter

: if you use the core model via solr.xml you can reload a core without having to
: to restart the servlet container,
: http://wiki.apache.org/solr/CoreAdmin

For making a schema change, the steps would be:
  - create a "new_core" with the new schema
  - reindex all the docs into "new_core"
  - "SWAP" "old_core" and "new_core" so all the old URLs now point at the 
new core with the new schema.
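
The create and swap steps are plain CoreAdmin HTTP calls, e.g. (core names and
instanceDir are placeholders):

http://localhost:8983/solr/admin/cores?action=CREATE&name=new_core&instanceDir=new_core
http://localhost:8983/solr/admin/cores?action=SWAP&core=old_core&other=new_core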

-Hoss



Re: Deleting spelll checker index

2010-02-12 Thread Chris Hostetter

: Any update on this

Patience my friend ... 5 hours after you send an email isn't long enough 
to wait before asking for "any update on this" -- it's just increasing the 
volume of mail everyone gets and distracting people from actual 
bugs/issues.

FWIW: this doesn't really seem directly related to the thread you
initially started about Deleting the spell checker index -- what you're
asking about now is rebuilding the spellchecker index...

: > I stop the sorl server removed the copy filed for model. now i only copy
: > make to the spellText field and started solr server.
: > i refreshed the dictiaonry by issuring the following command.
: > spellcheck.build=true&spellcheck.dictionary=default
: > So i hope it should rebuild by dictionary, bu the strange thing is that it
: > still gives a suggestion for accrd.

that's because removing the copyField declaration doesn't change anything
about the values that have already been copied to the "spellText" field
-- rebuilding your spellchecker index is just re-reading the same
indexed values from that field.

: > How can i create the dictionary again by changing my schema and issuing a
: > command 
: > spellcheck.build=true&spellcheck.dictionary=default

it's just not possible.  a schema change like that doesn't magically
undo all of the values that were already copied.



-Hoss



Re: cannot match on phrase queries

2010-02-12 Thread Kevin Osborn
It appears that omitTermFreqAndPositions is indeed the culprit. I assume it has
to do with the fact that the index parsing of BH-212 puts multiple terms in the
same position.





From: Kevin Osborn 
To: Solr 
Sent: Fri, February 12, 2010 5:28:08 PM
Subject: cannot match on phrase queries


I am seeing this in several of my fields. I have something like "Samsung 
X150" or "Nokia BH-212". And my query will not match on X150 or BH-212.

So, my query is something like +model:(Samsung X150). Through debugQuery, I see 
that this gets converted to +(model:samsung model:"x 150"). It 
matches on Samsung, but not X150. A simple query like model:BH-212 
simply fails. model:BH212 also fails. The only query that seems to work 
is model:(BH 212).

Here is the schema for that field:

[fieldType definition stripped by the list archive]

Any ideas? According to the analyzer, I would expect the phrase "BH-212" to
match on "bh" and "212". Or am I missing something?

Also, is there any way to tell the parser not to convert "X150" into a phrase
query? I have some cases where it would be more useful to turn it into +(X 150).


  

Re: Solr 1.4: Full import FileNotFoundException

2010-02-12 Thread Noble Paul നോബിള്‍ नोब्ळ्
concurrent imports are not allowed in DIH, unless you set up multiple DIH instances

On Sat, Feb 13, 2010 at 7:05 AM, Chris Hostetter
 wrote:
>
> : I have noticed that when I run concurrent full-imports using DIH in Solr
> : 1.4, the index ends up getting corrupted. I see the following in the log
>
> I'm fairly confident that concurrent imports won't work -- but it
> shouldn't corrupt your index -- even if the DIH didn't actively check for
> this type of situation, the underlying Lucene LockFactory should ensure
> that one of the inports "wins" ... you'll need to tell us what kind of
> Filesystem you are using, and show us the relevent settings from your
> solrconfig (lock type, merge policy, indexDefaults, mainIndex, DIH,
> etc...)
>
> At worst you should get a lock time out exception.
>
> : But I looked at:
> : 
> http://old.nabble.com/dataimporthandler-and-multiple-delta-import-td19160129.html
> :
> : and was under the impression that this issue was fixed in Solr 1.4.
>
> ...right, attempting to run two concurrent imports with DIH should cause
> the second one to abort immediatley.
>
>
>
>
> -Hoss
>
>



-- 
-
Noble Paul | Systems Architect| AOL | http://aol.com


Re: Solr 1.4: Full import FileNotFoundException

2010-02-12 Thread Chris Hostetter

: concurrent imports are not allowed in DIH, unless you set up multiple DIH
instances

Right, but that's not the issue -- the question is whether attempting
to do so might be causing index corruption (either because of a bug or
because of some possibly really odd config we currently know nothing about)


: > : I have noticed that when I run concurrent full-imports using DIH in Solr
: > : 1.4, the index ends up getting corrupted. I see the following in the log
: >
: > I'm fairly confident that concurrent imports won't work -- but it
: > shouldn't corrupt your index -- even if the DIH didn't actively check for
: > this type of situation, the underlying Lucene LockFactory should ensure
: > that one of the imports "wins" ... you'll need to tell us what kind of
: > filesystem you are using, and show us the relevant settings from your
: > solrconfig (lock type, merge policy, indexDefaults, mainIndex, DIH,
: > etc...)
: >
: > At worst you should get a lock time out exception.
: >
: > : But I looked at:
: > : 
http://old.nabble.com/dataimporthandler-and-multiple-delta-import-td19160129.html
: > :
: > : and was under the impression that this issue was fixed in Solr 1.4.
: >
: > ...right, attempting to run two concurrent imports with DIH should cause
: > the second one to abort immediately.
: >
: >
: >
: >
: > -Hoss
: >
: >
: 
: 
: 
: -- 
: -
: Noble Paul | Systems Architect| AOL | http://aol.com
: 



-Hoss



parsing strings into phrase queries

2010-02-12 Thread Kevin Osborn
Right now if I have the query model:(Nokia BH-212V), the parser turns this into 
+(model:nokia model:"bh 212 v"). The problem is that I might have a model 
called Nokia BH-212, so this is completely missed. In my case, I would like my 
query to be +(model:nokia model:bh model:212 model:v).
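
One query-time workaround from the Solr 1.4 era (a hedged suggestion, not from
this thread) is to append PositionFilterFactory to the query analyzer; it
flattens token positions, so the query parser builds a boolean query instead of
a phrase query:

<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.PositionFilterFactory"/>
</analyzer>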

This is my schema for the field:

[fieldType definition stripped by the list archive]


Re: Interesting stuff; Solr as a syslog store.

2010-02-12 Thread Olivier Dobberkau

Am 13.02.2010 um 03:02 schrieb Antonio Lobato:

> Just thought this would be a neat story to share with you all.  I've really 
> grown to love Solr, it's something else!

Hi Antonio,

Great.

Would you also share the source code somewhere?
May the Source be with you. 

Thanks.

Olivier