How to use Solr in my project
Hello,

First off, I apologize if this was sent twice. I was having issues subscribing to the list.

I'm a complete noob in Solr (and indexing), so I'm hoping someone can help me figure out how to implement Solr in my project. I have gone through some tutorials online and I was able to import and query text in some Arabic PDF documents.

We have some scans of historical handwritten Arabic documents that will have their text extracted into a database (or PDF). We would like the user to be able to search a document for text, then have the scanned image show up in a viewer with the text highlighted.

I would like to use Solr to index the text in the documents, but I'm unsure how to store and retrieve the "word location" in Solr (the area of text that needs to be highlighted). Do I index and store the full document in Solr? How do I link the "search term" to the "word location" on the page? The only way I can figure out how to do this involves querying the database for the "word" and "location" after querying Solr for the search term, but doesn't that defeat the purpose of using Solr?

I would really appreciate help figuring this out.

Thank you,
Fatima
Re: How to use Solr in my project
On 26 December 2013 10:54, Fatima Issawi wrote: > Hello, > > First off, I apologize if this was sent twice. I was having issues > subscribing to the list. > > I'm a complete noob in Solr (and indexing), so I'm hoping someone can help me > figure out how to implement Solr in my project. I have gone through some > tutorials online and I was able to import and query text in some Arabic PDF > documents. > > We have some scans of Historical Handwritten Arabic documents that will have > text extracted into a database (or PDF). We would like the user to be able to > search the document for text, then have the scanned image show up in a viewer > with the text highlighted. This will not work for scanned images which do not actually contain the text. If you have the text of the documents, the best that you can do is break the text into pages corresponding to the scanned images, and index into Solr the text from the pages and the scanned image that should be linked to the text. For a user search, you will need to show the scanned image for the entire page: Highlighting of the search term in an image is not possible without optical character recognition (OCR). Similarly, if you are indexing from PDFs, you will need to ensure that they contain text, and not just images. Regards, Gora
Solr update document issue
Hello all,

In our last project we use Solr as the search engine to search for assets. We have a feature to search for a product by its summary text. The product itself is a "container for a set of products" (a parent), so each time we add a new product under it, the summary of the parent product should be updated to append the new text. So each time we add a new child product, the parent product's summary text should be updated. Sometimes the added summary text list is empty, sometimes not, but in the case of an empty list all fields of the document are deleted except _version_ and id. To avoid this problem we skip the update in the case of an empty list.

*A. in case of update with empty list:*
1. added document is: 121112 hehe go go goool ollay hehedoc11455476967916699648
2. after update: 1211121455476967916699659

*B. in case of a non-empty list in the update request:*
1. same as in A.1.
2. 121112 hehe go go goool ollay hehe go go 12312312312312312 123123123 ollay 1232131231231231313doc11455476967916699648

I use SolrJ and Solr 4.4.0.

My schema document:

My Java code to test this scenario is as follows:

//TestingSolrUpdateDoc.java
package org.solr.test;

import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.common.SolrInputDocument;

public class TestingSolrUpdateDoc {

    public static void main(String[] args) {
        try {
            addDoc(121112, false);
        } catch (SolrServerException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void addDoc(long id, boolean emptyListUpdate)
            throws SolrServerException, IOException {
        // index the initial document with four "text" values
        SolrInputDocument solrInputDocument = new SolrInputDocument();
        solrInputDocument.setField("id", new Long(id));
        solrInputDocument.setField("text", generateRandomTextList());
        solrInputDocument.setField("name", "doc1");
        SolrConnection connection = SolrConnection.getConnection();
        connection.addDocument(solrInputDocument);

        if (emptyListUpdate) {
            // atomic update with an empty list for "add" -- the case that
            // leaves only id and _version_ in the document
            solrInputDocument = new SolrInputDocument();
            solrInputDocument.setField("id", new Long(id));
            Map update = new HashMap();
            update.put("add", new ArrayList());
            solrInputDocument.addField("text", update);
            connection.updateDocument(solrInputDocument);
        } else {
            // atomic update that adds more values to the "text" field
            solrInputDocument = new SolrInputDocument();
            solrInputDocument.setField("id", new Long(id));
            Map update = new HashMap();
            update.put("add", generateRandomUpdateTextList());
            solrInputDocument.addField("text", update);
            connection.updateDocument(solrInputDocument);
        }
    }

    private static List generateRandomTextList() {
        List texts = new ArrayList();
        texts.add("hehe");
        texts.add("go go ");
        texts.add("goool");
        texts.add("ollay");
        return texts;
    }

    private static List generateRandomUpdateTextList() {
        List texts = new ArrayList();
        texts.add("hehe");
        texts.add("go go ");
        texts.add("12312312312312312 123123123");
        texts.add("ollay 1232131231231231313");
        return texts;
    }
}

//SolrConnection.java
package org.solr.test;

import java.io.IOException;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class SolrConnection {

    private SolrServer solrServer = new HttpSolrServer("http://localhost:8983/solr/test");
    private static SolrConnection solrConnection = new SolrConnection();

    private SolrConnection() {
    }

    public static SolrConnection getConnection() {
        if (solrConnection != null) {
            return solrConnection;
        }
        synchronized (SolrConnection.class) {
            if (solrConnection != null) {
                return solrConnection;
            }
            solrConnection = new SolrConnection();
            return solrConnection;
        }
    }

    public void addDocument(SolrInputDocument doc) throws SolrServerException, IOException {
        solrServer.add(doc);
        solrServer.commit();
    }

    public
Re: Solr update document issue
Hello all,

In our last project we use Solr as the search engine to search for assets. We have a feature to search for a product by its summary text. The product itself is a "container for a set of products" (a parent), so each time we add a new product under it, the summary of the parent product should be updated to append the new text. So each time we add a new child product, the parent product's summary text should be updated. Sometimes the added summary text list is empty, sometimes not, but in the case of an empty list all fields of the document are deleted except _version_ and id. To avoid this problem we ignore the update in the case of an empty list.

*A. in case of update with empty list:*
1. added document is :
2. after update

*B. in case of not empty list in update request:*
1. same as in a.1.
2.

I use SolrJ and Solr 4.4.0.

My schema document :

My Java code to test this scenario is as follows:
//TestingSolrUpdateDoc.java
//SolrConnection.java

Best Thanks,
Mohammad yaseen

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-update-document-issue-tp4108214p4108215.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: How to use Solr in my project
Hi, I should clarify. We have another application extracting the text from the document. The full text from each document will be stored in a database either at the document level or page level (this hasn't been decided yet). We will also be storing word location of each word on the page in the database. What I'm having problems with is deciding on the schema. We want a user to be able to search for a word in the database, have a list of documents that word is located in, and location in the document that word is located it. When he selects the search results, we want the scanned picture to have that word highlighted on the page. I want to index the document using Solr, but I'm having trouble figuring out how to design the schema to return that "word location" of a search term on the scanned picture in order to highlight it. Does this make more sense? Fatima -Original Message- From: Gora Mohanty [mailto:g...@mimirtech.com] Sent: Thursday, December 26, 2013 1:00 PM To: solr-user@lucene.apache.org Subject: Re: How to use Solr in my project On 26 December 2013 10:54, Fatima Issawi wrote: > Hello, > > First off, I apologize if this was sent twice. I was having issues > subscribing to the list. > > I'm a complete noob in Solr (and indexing), so I'm hoping someone can help me > figure out how to implement Solr in my project. I have gone through some > tutorials online and I was able to import and query text in some Arabic PDF > documents. > > We have some scans of Historical Handwritten Arabic documents that will have > text extracted into a database (or PDF). We would like the user to be able to > search the document for text, then have the scanned image show up in a viewer > with the text highlighted. This will not work for scanned images which do not actually contain the text. If you have the text of the documents, the best that you can do is break the text into pages corresponding to the scanned images, and index into Solr the text from the pages and the scanned image that should be linked to the text. For a user search, you will need to show the scanned image for the entire page: Highlighting of the search term in an image is not possible without optical character recognition (OCR). Similarly, if you are indexing from PDFs, you will need to ensure that they contain text, and not just images. Regards, Gora
Re: How to use Solr in my project
On 26 December 2013 15:44, Fatima Issawi wrote: > Hi, > > I should clarify. We have another application extracting the text from the > document. The full text from each document will be stored in a database > either at the document level or page level (this hasn't been decided yet). We > will also be storing word location of each word on the page in the database. What do you mean by "word location"? The number on the page? What purpose would this serve? > What I'm having problems with is deciding on the schema. We want a user to be > able to search for a word in the database, have a list of documents that word > is located in, and location in the document that word is located it. When he > selects the search results, we want the scanned picture to have that word > highlighted on the page. [...] I think that you might be confusing things: * If you have the full-text, you can highlight where the word was found. Solr highlighting handles this for you, and there is no need to store word location * You can have different images (presumably, individual scanned pages) linked to different sections of text, and show the entire image. Highlighting in the image is not possible, unless by "word location" you mean the (x, y) coordinates of the word on the page. Even then: - It will be prohibitively expensive to store the location of every word in every image for a large number of documents - Some image processing will be required to handle the highlighting after the scanned image is retrieved Regards, Gora
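A rough SolrJ sketch of this page-per-document approach (the field names, id scheme, and core URL below are illustrative, not taken from the thread): one Solr document per scanned page carries the page text plus a pointer to the image, Solr's highlighter marks the matched terms in the text, and the pixel coordinates for drawing on the scan still come from the external word-location table, keyed by document and page.

// Illustrative sketch: one Solr document per scanned page (SolrJ 4.x).
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrInputDocument;

public class PageIndexingSketch {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/manuscripts");

        // Index a page: its extracted text plus a pointer to the scanned image.
        SolrInputDocument page = new SolrInputDocument();
        page.addField("id", "doc42_page7");           // hypothetical id scheme
        page.addField("doc_id", "doc42");
        page.addField("page_no", 7);
        page.addField("image_url", "/scans/doc42/page7.png");
        page.addField("page_text", "...full extracted text of the page...");
        server.add(page);
        server.commit();

        // Search the page text and let Solr highlight the matched terms.
        SolrQuery q = new SolrQuery("page_text:someword");
        q.setHighlight(true);
        q.addHighlightField("page_text");
        q.setFields("id", "doc_id", "page_no", "image_url");
        QueryResponse rsp = server.query(q);

        // getHighlighting() shows which pages matched and where in the text;
        // the pixel coordinates for drawing on the image would still be looked
        // up in the external database by doc_id and page_no.
        System.out.println(rsp.getHighlighting());
    }
}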
Solr Query Slowness
Hi all,

I have multiple python scripts querying Solr with the sunburnt module.

Solr was hosted on an Amazon EC2 m1.large (2 vCPU with 4 ECU, 7.5 GB memory & 840 GB storage) and contained several cores for different usage.

When I manually executed a query through Solr Admin (a query containing 10~15 terms, with some of them having boosts over one field, limited to one result, without any sorting or faceting etc.) it took around 700 ms, and the core contained 7 million documents.

When the scripts are executed things get slower: my query takes 7~10s.

Then what I did was turn to SolrCloud, expecting a huge performance increase. I installed it on a cluster of 5 Amazon EC2 c3.2xlarge instances (8 vCPU with 28 ECU, 15 GB memory & 160 GB SSD storage), then I created one collection to contain the core I was querying. I sharded it into 25 shards (each node holding 5 shards without replication); each shard took 54 MB of storage.

Tested my query on the new SolrCloud: it takes 70 ms! A huge increase, which is very good!

Tested my scripts again (I have 30 scripts running at the same time), and as a surprise, things run fast for 5 seconds, then it turns really slow again (query time ).

I updated the solrconfig.xml to remove the query caches (I don't need them since the queries are very different, one-time queries) and changed the index memory to 1 GB, but only got a small improvement (3~4s for each query?!)

Any ideas?

PS: My index size will not stay at 7m documents; it will grow to +100m, and that may make things worse.
Re: Solr Query Slowness
Hello! Could you tell us more about your scripts? What they do? If the queries are the same? How many results you fetch with your scripts and so on. -- Regards, Rafał Kuć Performance Monitoring * Log Analytics * Search Analytics Solr & Elasticsearch Support * http://sematext.com/ > Hi all, > I have multiple python scripts querying solr with the sunburnt module. > Solr was hosted on an Amazon ec2 m1.large (2 vCPU with 4 ECU, 7.5 GB memory > & 840 GB storage) and contained several cores for different usage. > When I manually executed a query through Solr Admin (a query containing > 10~15 terms, with some of them having boosts over one field and limited to > one result without any sorting or faceting etc ) it takes around 700 > ms, and the Core contained 7 million documents. > When the scripts are executed things get slower, my query takes 7~10s. > Then what I did is to turn to SolrCloud expecting huge performance increase. > I installed it on a cluster of 5 Amazon ec2 c3.2xlarge instances (8 vCPU > with 28 ECU, 15 GB memory & 160 SSD storage), then I created one collection > to contain the core I was querying, I sharded it to 25 shards (each node > containing 5 shards without replication), each shards took 54 MB of storage. > Tested my query on the new SolrCloud, it takes 70 ms ! huge increase wich > is very good ! > Tested my scripts again (I have 30 scripts running at the same time), and > as a surprise, things run fast for 5 seconds then it turns realy slow again > (query time ). > I updated the solrconfig.xml to remove the query caches (I don't need them > since queries are very different and only 1 time queries) and changes the > index memory to 1 GB, but only got a small increase (3~4s for each query ?!) > Any ideas ? > PS: My index size will not stay with 7m documents, it will grow to +100m > and that may get things worse
Re: Solr Query Slowness
Thanks Rafal for your reply, My scripts are running on other independent machines so they does not affect Solr, I did mention that the queries are not the same (that is why I removed the query cache from solrconfig.xml), and I only get 1 result from Solr (which is the top scored one so no sorting since it is by default ordred by score) 2013/12/26 Rafał Kuć > Hello! > > Could you tell us more about your scripts? What they do? If the > queries are the same? How many results you fetch with your scripts and > so on. > > -- > Regards, > Rafał Kuć > Performance Monitoring * Log Analytics * Search Analytics > Solr & Elasticsearch Support * http://sematext.com/ > > > > Hi all, > > > I have multiple python scripts querying solr with the sunburnt module. > > > Solr was hosted on an Amazon ec2 m1.large (2 vCPU with 4 ECU, 7.5 GB > memory > > & 840 GB storage) and contained several cores for different usage. > > > When I manually executed a query through Solr Admin (a query containing > > 10~15 terms, with some of them having boosts over one field and limited > to > > one result without any sorting or faceting etc ) it takes around 700 > > ms, and the Core contained 7 million documents. > > > When the scripts are executed things get slower, my query takes 7~10s. > > > Then what I did is to turn to SolrCloud expecting huge performance > increase. > > > I installed it on a cluster of 5 Amazon ec2 c3.2xlarge instances (8 vCPU > > with 28 ECU, 15 GB memory & 160 SSD storage), then I created one > collection > > to contain the core I was querying, I sharded it to 25 shards (each node > > containing 5 shards without replication), each shards took 54 MB of > storage. > > > Tested my query on the new SolrCloud, it takes 70 ms ! huge increase wich > > is very good ! > > > Tested my scripts again (I have 30 scripts running at the same time), and > > as a surprise, things run fast for 5 seconds then it turns realy slow > again > > (query time ). > > > I updated the solrconfig.xml to remove the query caches (I don't need > them > > since queries are very different and only 1 time queries) and changes the > > index memory to 1 GB, but only got a small increase (3~4s for each query > ?!) > > > Any ideas ? > > > PS: My index size will not stay with 7m documents, it will grow to +100m > > and that may get things worse > >
Re: Solr Query Slowness
Hello! Different queries can have different execution time, that's why I asked about the details. When running the scripts, is Solr CPU fully utilized? To tell more I would like to see what queries are run against Solr from scripts. Do you have any information on network throughput between the server you are running scripts on and the Solr cluster? You wrote that the scripts are fine for 5 seconds and than they get slow. If your Solr cluster is not fully utilized I would take a look at the queries and what they return (ie. using faceting with facet.limit=-1) and seeing if the network is able to process those. -- Regards, Rafał Kuć Performance Monitoring * Log Analytics * Search Analytics Solr & Elasticsearch Support * http://sematext.com/ > Thanks Rafal for your reply, > My scripts are running on other independent machines so they does not > affect Solr, I did mention that the queries are not the same (that is why I > removed the query cache from solrconfig.xml), and I only get 1 result from > Solr (which is the top scored one so no sorting since it is by default > ordred by score) > 2013/12/26 Rafał Kuć >> Hello! >> >> Could you tell us more about your scripts? What they do? If the >> queries are the same? How many results you fetch with your scripts and >> so on. >> >> -- >> Regards, >> Rafał Kuć >> Performance Monitoring * Log Analytics * Search Analytics >> Solr & Elasticsearch Support * http://sematext.com/ >> >> >> > Hi all, >> >> > I have multiple python scripts querying solr with the sunburnt module. >> >> > Solr was hosted on an Amazon ec2 m1.large (2 vCPU with 4 ECU, 7.5 GB >> memory >> > & 840 GB storage) and contained several cores for different usage. >> >> > When I manually executed a query through Solr Admin (a query containing >> > 10~15 terms, with some of them having boosts over one field and limited >> to >> > one result without any sorting or faceting etc ) it takes around 700 >> > ms, and the Core contained 7 million documents. >> >> > When the scripts are executed things get slower, my query takes 7~10s. >> >> > Then what I did is to turn to SolrCloud expecting huge performance >> increase. >> >> > I installed it on a cluster of 5 Amazon ec2 c3.2xlarge instances (8 vCPU >> > with 28 ECU, 15 GB memory & 160 SSD storage), then I created one >> collection >> > to contain the core I was querying, I sharded it to 25 shards (each node >> > containing 5 shards without replication), each shards took 54 MB of >> storage. >> >> > Tested my query on the new SolrCloud, it takes 70 ms ! huge increase wich >> > is very good ! >> >> > Tested my scripts again (I have 30 scripts running at the same time), and >> > as a surprise, things run fast for 5 seconds then it turns realy slow >> again >> > (query time ). >> >> > I updated the solrconfig.xml to remove the query caches (I don't need >> them >> > since queries are very different and only 1 time queries) and changes the >> > index memory to 1 GB, but only got a small increase (3~4s for each query >> ?!) >> >> > Any ideas ? >> >> > PS: My index size will not stay with 7m documents, it will grow to +100m >> > and that may get things worse >> >>
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query
I'm trying to setup Solr with a Wordpress database running on MySQL. But on trying a full import: `http://localhost:8983/solr/tv-wordpress/dataimport?command=full-import` The error is: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query **data-config.xml** I also tried including the database name in the SQL statement: SELECT * FROM wptalkman.wp_posts WHERE post_status='publish'; and change the connection url to `jdbc:mysql@localhost:3306` But I'm still unable to execute the query. **console output** 194278 [Thread-22] INFO org.apache.solr.update.UpdateHandler û start rollback{ } 194279 [Thread-22] INFO org.apache.solr.update.DefaultSolrCoreState û Creating new IndexWriter... 194279 [Thread-22] INFO org.apache.solr.update.DefaultSolrCoreState û Waiting until IndexWriter is unused... core=tv-wordpress 194280 [Thread-22] INFO org.apache.solr.update.DefaultSolrCoreState û Rollback old IndexWriter... core=tv-wordpress 194282 [Thread-22] INFO org.apache.solr.core.SolrCore û SolrDeletionPolicy.onI nit: commits:num=1 commit{dir=NRTCachingDirectory(org.apache.lucene.store.SimpleFSDirectory @C:\Dropbox\Databases\solr-4.3.1\example\example-DIH\solr\tv-wordpress\data\inde x lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@10ff234; maxCach eMB=48.0 maxMergeSizeMB=4.0),segFN=segments_3l,generation=129,filenames=[_3o.nvd , _3o_Lucene41_0.tim, _3o.fnm, _3o.nvm, _3o_Lucene41_0.tip, _3o.fdt, _3o_Lucene4 1_0.pos, segments_3l, _3o.fdx, _3o_Lucene41_0.doc, _3o.si] 194283 [Thread-22] INFO org.apache.solr.core.SolrCore û newest commit = 129[_3 o.nvd, _3o_Lucene41_0.tim, _3o.fnm, _3o.nvm, _3o_Lucene41_0.tip, _3o.fdt, _3o_Lu cene41_0.pos, segments_3l, _3o.fdx, _3o_Lucene41_0.doc, _3o.si] 194283 [Thread-22] INFO org.apache.solr.update.DefaultSolrCoreState û New Inde xWriter is ready to be used. 
194283 [Thread-22] INFO org.apache.solr.update.UpdateHandler û end_rollback 194669 [qtp32398134-13] INFO org.apache.solr.handler.dataimport.DataImporter û Loading DIH Configuration: wordpress-data-config.xml 194672 [qtp32398134-13] INFO org.apache.solr.handler.dataimport.DataImporter û Data Configuration loaded successfully 194676 [Thread-23] INFO org.apache.solr.handler.dataimport.DataImporter û Star ting Full Import 194676 [qtp32398134-13] INFO org.apache.solr.core.SolrCore û [tv-wordpress] we bapp=/solr path=/dataimport params={command=full-import} status=0 QTime=8 194680 [Thread-23] INFO org.apache.solr.handler.dataimport.SimplePropertiesWrit er û Read dataimport.properties 194681 [Thread-23] INFO org.apache.solr.core.SolrCore û [tv-wordpress] REMOVIN G ALL DOCUMENTS FROM INDEX 194686 [Thread-23] INFO org.apache.solr.handler.dataimport.JdbcDataSource û Cr eating a connection for entity article with URL: jdbc:mysql@localhost:3306/wptal kman 194686 [Thread-23] INFO org.apache.solr.handler.dataimport.JdbcDataSource û Ti me taken for getConnection(): 0 194687 [Thread-23] ERROR org.apache.solr.handler.dataimport.DocBuilder û Except ion while processing: article document : SolrInputDocument[]:org.apache.solr.han dler.dataimport.DataImportHandlerException: Unable to execute query: select * fr om wp_posts WHERE post_status='publish' Processing Document # 1 at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAnd Throw(DataImportHandlerException.java:71) at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.< init>(JdbcDataSource.java:253) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSou rce.java:210) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSou rce.java:38) at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEn tityProcessor.java:59) at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEnti tyProcessor.java:73) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(Ent ityProcessorWrapper.java:243) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilde r.java:465) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilde r.java:404) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.j ava:319) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java :227) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(
Re: Solr Query Slowness
This an example of a query: http://myip:8080/solr/TestCatMatch_shard12_replica1/select?q=Royal+Cashmere+RC+106+CS+Silk+Cashmere+V+Neck+Moss+Green+Men ^10+s+Sweater+Cashmere^3+Men^3+Sweaters^3+Clothing^3&rows=1&wt=json&indent=true in return : { "responseHeader":{ "status":0, "QTime":191}, "response":{"numFound":4539784,"start":0,"maxScore":2.0123534,"docs":[ { "Sections":"fashion", "IdsCategories":"11101911", "IdProduct":"ef6b8d7cf8340d0c8935727a07baebab", "Id":"11101911-ef6b8d7cf8340d0c8935727a07baebab", "Name":"Uniqlo Men Cashmere V Neck Sweater Men Clothing Sweaters Cashmere", "_version_":1455419757424541696}] }} This query was executed when no script is running so the QTime is only 191 ms, but it may take up to 3s when they are) Of course it can be smaller or bigger and of course that affects the execution time (the execution times I spoke of are the internal ones returned by solr, not calculated by me). And yes the CPU is fully used. 2013/12/26 Rafał Kuć > Hello! > > Different queries can have different execution time, that's why I > asked about the details. When running the scripts, is Solr CPU fully > utilized? To tell more I would like to see what queries are run > against Solr from scripts. > > Do you have any information on network throughput between the server > you are running scripts on and the Solr cluster? You wrote that the > scripts are fine for 5 seconds and than they get slow. If your Solr > cluster is not fully utilized I would take a look at the queries and > what they return (ie. using faceting with facet.limit=-1) and seeing > if the network is able to process those. > > -- > Regards, > Rafał Kuć > Performance Monitoring * Log Analytics * Search Analytics > Solr & Elasticsearch Support * http://sematext.com/ > > > > Thanks Rafal for your reply, > > > My scripts are running on other independent machines so they does not > > affect Solr, I did mention that the queries are not the same (that is > why I > > removed the query cache from solrconfig.xml), and I only get 1 result > from > > Solr (which is the top scored one so no sorting since it is by default > > ordred by score) > > > > > 2013/12/26 Rafał Kuć > > >> Hello! > >> > >> Could you tell us more about your scripts? What they do? If the > >> queries are the same? How many results you fetch with your scripts and > >> so on. > >> > >> -- > >> Regards, > >> Rafał Kuć > >> Performance Monitoring * Log Analytics * Search Analytics > >> Solr & Elasticsearch Support * http://sematext.com/ > >> > >> > >> > Hi all, > >> > >> > I have multiple python scripts querying solr with the sunburnt module. > >> > >> > Solr was hosted on an Amazon ec2 m1.large (2 vCPU with 4 ECU, 7.5 GB > >> memory > >> > & 840 GB storage) and contained several cores for different usage. > >> > >> > When I manually executed a query through Solr Admin (a query > containing > >> > 10~15 terms, with some of them having boosts over one field and > limited > >> to > >> > one result without any sorting or faceting etc ) it takes around > 700 > >> > ms, and the Core contained 7 million documents. > >> > >> > When the scripts are executed things get slower, my query takes 7~10s. > >> > >> > Then what I did is to turn to SolrCloud expecting huge performance > >> increase. 
> >> > >> > I installed it on a cluster of 5 Amazon ec2 c3.2xlarge instances (8 > vCPU > >> > with 28 ECU, 15 GB memory & 160 SSD storage), then I created one > >> collection > >> > to contain the core I was querying, I sharded it to 25 shards (each > node > >> > containing 5 shards without replication), each shards took 54 MB of > >> storage. > >> > >> > Tested my query on the new SolrCloud, it takes 70 ms ! huge increase > wich > >> > is very good ! > >> > >> > Tested my scripts again (I have 30 scripts running at the same time), > and > >> > as a surprise, things run fast for 5 seconds then it turns realy slow > >> again > >> > (query time ). > >> > >> > I updated the solrconfig.xml to remove the query caches (I don't need > >> them > >> > since queries are very different and only 1 time queries) and changes > the > >> > index memory to 1 GB, but only got a small increase (3~4s for each > query > >> ?!) > >> > >> > Any ideas ? > >> > >> > PS: My index size will not stay with 7m documents, it will grow to > +100m > >> > and that may get things worse > >> > >> > >
Re: Configurable collectors for custom ranking
In my case, the final function call looks something like this: sum(product($k1,score()),product($k2,field(x))) This means that all the scores would have to scaled and passed down, not just the top N because even a low score could be offset by a high value in 'field(x)'. Thanks, Peter On Mon, Dec 23, 2013 at 6:37 PM, Joel Bernstein wrote: > Peter, > > You actually only need the current score being collected to be in the > request context. So you don't need a map, you just need an object wrapper > around a mutable float. > > If you have a page size of X, only the top X scores need to be held onto, > because all the other scores wouldn't have made it into that page anyway so > they might as well be 0. Because the QueryResultCache caches's a larger > window then the page size you should keep enough scores so the cached > docList is correct. But if you're only dealing with 150K of results you > could just keep all the scores in a FloatArrayList and not worry about the > keeping the top X scores in a priority queue. > > During the collect hang onto the docIds and scores and build your scaling > info. > > During the finish iterate your docIds and scale the scores as you go. > > Set your scaled score into the object wrapper that is in the request > context before you collect each document. > > When you call collect on the delegate collectors they will call the custom > value source for each document to perform the sort. Your custom value > source will return whatever the float value is in the request context at > that time. > > If you're also going to run this postfilter when you're doing a standard > rank by score you'll also need to send down a dummy scorer to the delegate > collectors. Spend some time with the CollapsingQParserPlugin in trunk to > see how the dummy scorer works. > > I'll be adding value source collapse criteria to the > CollapsingQParserPlugin this week and it will have a similar interaction > between a PostFilter and value source. So you may want to watch SOLR-5536 > to see an example of this. > > Joel > > > > > > > > > > > > > Joel Bernstein > Search Engineer at Heliosearch > > > On Mon, Dec 23, 2013 at 4:03 PM, Peter Keegan >wrote: > > > Hi Joel, > > > > Could you clarify what would be in the key,value Map added to the > > SearchRequest context? It seems that all the docId/score tuples need to > be > > there, including the ones not in the 'top N ScoreDocs' PriorityQueue > > (score=0). If so would the Map be something like: > > "scaled_scores",Map ? > > > > Also, what is the reason for passing score=0 for documents that aren't in > > the PriorityQueue? Will these docs get filtered out before a normal sort > by > > score? > > > > Thanks, > > Peter > > > > > > On Thu, Dec 12, 2013 at 11:13 AM, Joel Bernstein > > wrote: > > > > > The sorting is going to happen in the lower level collectors. You need > a > > > value source that returns the score of the document being collected. > > > > > > Here is how you can make this happen: > > > > > > 1) Create an object in your PostFilter that simply holds the current > > score. > > > Place this object in the SearchRequest context map. Update object.score > > as > > > you pass the docs and scores to the lower collectors. > > > > > > 2) Create a values source that checks the SearchRequest context for the > > > object that's holding the current score. Use this object to return the > > > current score when called. 
For example if you give the value source a > > > handle called "score" a compound function call will look like this: > > > sum(score(), field(x)) > > > > > > Joel > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Thu, Dec 12, 2013 at 9:58 AM, Peter Keegan > > >wrote: > > > > > > > Regarding my original goal, which is to perform a math function using > > the > > > > scaled score and a field value, and sort on the result, how does this > > fit > > > > in? Must I implement another custom PostFilter with a higher cost > than > > > the > > > > scale PostFilter? > > > > > > > > Thanks, > > > > Peter > > > > > > > > > > > > On Wed, Dec 11, 2013 at 4:01 PM, Peter Keegan < > peterlkee...@gmail.com > > > > >wrote: > > > > > > > > > Thanks very much for the guidance. I'd be happy to donate a working > > > > > solution. > > > > > > > > > > Peter > > > > > > > > > > > > > > > On Wed, Dec 11, 2013 at 3:53 PM, Joel Bernstein < > joels...@gmail.com > > > > >wrote: > > > > > > > > > >> SOLR-5020 has the commit info, it's mainly changes to > > > SolrIndexSearcher > > > > I > > > > >> believe. They might apply to 4.3. > > > > >> I think as long you have the finish method that's all you'll need. > > If > > > > you > > > > >> can get this working it would be excellent if you could donate > back > > > the > > > > >> Scale PostFilter. > > > > >> > > > > >> > > > > >> On Wed, Dec 11, 2013 at 3:36 PM, Peter Keegan < > > peterlkee...@gmail.com > > > > >> >wrote: > > > > >> > > > > >> > This is what I was looking for, but the DelegatingCollector > > '
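A bare-bones sketch of the score-holder idea outlined in this thread, assuming Solr/Lucene 4.x APIs; the class names are invented, and the PostFilter and ValueSourceParser wiring is omitted. The PostFilter stores the current (scaled) score in the wrapper before delegating each collect() call, and the value source simply reads it back for the document being collected, so a function like sum(product($k1,score()),product($k2,field(x))) can be evaluated over the scaled scores.

import java.io.IOException;
import java.util.Map;

import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.queries.function.FunctionValues;
import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.queries.function.docvalues.FloatDocValues;

// Mutable wrapper the PostFilter puts into the request context and updates
// with the current (scaled) score before calling delegate.collect(doc).
class CurrentScore {
    volatile float value;
}

// ValueSource that just returns whatever score the PostFilter last stored.
class CurrentScoreValueSource extends ValueSource {
    private final CurrentScore holder;

    CurrentScoreValueSource(CurrentScore holder) {
        this.holder = holder;
    }

    @Override
    public FunctionValues getValues(Map context, AtomicReaderContext readerContext) throws IOException {
        return new FloatDocValues(this) {
            @Override
            public float floatVal(int doc) {
                // score set by the PostFilter for the doc currently being collected
                return holder.value;
            }
        };
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CurrentScoreValueSource && ((CurrentScoreValueSource) o).holder == holder;
    }

    @Override
    public int hashCode() {
        return System.identityHashCode(holder);
    }

    @Override
    public String description() {
        return "currentScore()";
    }
}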
Re: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query
Which version of Solr are you using? Is it possible that the query you ran returns 0 results? On Thu, Dec 26, 2013 at 5:44 PM, PeterKerk wrote: > I'm trying to setup Solr with a Wordpress database running on MySQL. > > But on trying a full import: > `http://localhost:8983/solr/tv-wordpress/dataimport?command=full-import` > > > The error is: > org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to > execute query > > > **data-config.xml** > > > url="jdbc:mysql@localhost:3306/wptalkman" user="root" password="" /> > > > > /> > column="post_author" /> > > > > > > I also tried including the database name in the SQL statement: > > SELECT * FROM wptalkman.wp_posts WHERE post_status='publish'; > > and change the connection url to `jdbc:mysql@localhost:3306` > > But I'm still unable to execute the query. > > > **console output** > > 194278 [Thread-22] INFO org.apache.solr.update.UpdateHandler û start > rollback{ > } > 194279 [Thread-22] INFO org.apache.solr.update.DefaultSolrCoreState û > Creating > new IndexWriter... > 194279 [Thread-22] INFO org.apache.solr.update.DefaultSolrCoreState û > Waiting > until IndexWriter is unused... core=tv-wordpress > 194280 [Thread-22] INFO org.apache.solr.update.DefaultSolrCoreState û > Rollback > old IndexWriter... core=tv-wordpress > 194282 [Thread-22] INFO org.apache.solr.core.SolrCore û > SolrDeletionPolicy.onI > nit: commits:num=1 > > commit{dir=NRTCachingDirectory(org.apache.lucene.store.SimpleFSDirectory > > @C:\Dropbox\Databases\solr-4.3.1\example\example-DIH\solr\tv-wordpress\data\inde > x lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@10ff234; > maxCach > eMB=48.0 > maxMergeSizeMB=4.0),segFN=segments_3l,generation=129,filenames=[_3o.nvd > , _3o_Lucene41_0.tim, _3o.fnm, _3o.nvm, _3o_Lucene41_0.tip, _3o.fdt, > _3o_Lucene4 > 1_0.pos, segments_3l, _3o.fdx, _3o_Lucene41_0.doc, _3o.si] > 194283 [Thread-22] INFO org.apache.solr.core.SolrCore û newest commit > = 129[_3 > o.nvd, _3o_Lucene41_0.tim, _3o.fnm, _3o.nvm, _3o_Lucene41_0.tip, > _3o.fdt, _3o_Lu > cene41_0.pos, segments_3l, _3o.fdx, _3o_Lucene41_0.doc, _3o.si] > 194283 [Thread-22] INFO org.apache.solr.update.DefaultSolrCoreState û > New Inde > xWriter is ready to be used. 
> 194283 [Thread-22] INFO org.apache.solr.update.UpdateHandler û > end_rollback > 194669 [qtp32398134-13] INFO > org.apache.solr.handler.dataimport.DataImporter û > Loading DIH Configuration: wordpress-data-config.xml > 194672 [qtp32398134-13] INFO > org.apache.solr.handler.dataimport.DataImporter û > Data Configuration loaded successfully > 194676 [Thread-23] INFO org.apache.solr.handler.dataimport.DataImporter > û Star > ting Full Import > 194676 [qtp32398134-13] INFO org.apache.solr.core.SolrCore û > [tv-wordpress] we > bapp=/solr path=/dataimport params={command=full-import} status=0 > QTime=8 > 194680 [Thread-23] INFO > org.apache.solr.handler.dataimport.SimplePropertiesWrit > er û Read dataimport.properties > 194681 [Thread-23] INFO org.apache.solr.core.SolrCore û [tv-wordpress] > REMOVIN > G ALL DOCUMENTS FROM INDEX > 194686 [Thread-23] INFO > org.apache.solr.handler.dataimport.JdbcDataSource û Cr > eating a connection for entity article with URL: > jdbc:mysql@localhost:3306/wptal > kman > 194686 [Thread-23] INFO > org.apache.solr.handler.dataimport.JdbcDataSource û Ti > me taken for getConnection(): 0 > 194687 [Thread-23] ERROR org.apache.solr.handler.dataimport.DocBuilder > û Except > ion while processing: article document : > SolrInputDocument[]:org.apache.solr.han > dler.dataimport.DataImportHandlerException: Unable to execute query: > select * fr > om wp_posts WHERE post_status='publish' Processing Document # 1 > at > org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAnd > Throw(DataImportHandlerException.java:71) > at > org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.< > init>(JdbcDataSource.java:253) > at > org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSou > rce.java:210) > at > org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSou > rce.java:38) > at > org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEn > tityProcessor.java:59) > at > org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEnti > tyProcessor.java:73) > at > org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(Ent > ityProcessorWrapper.java:243) > at > org.apache.solr.handler.dataimport.DocBuilder.buildD
Re: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query
Solr 4.3.1

When I run the statement in MySQL Workbench or the console, the statement executes successfully and returns 2 results.

FYI: I placed the mysql-connector-java-5.1.27-bin.jar in the \lib folder.

Also: it should not throw this error even when 0 results are returned, right?

--
View this message in context: http://lucene.472066.n3.nabble.com/org-apache-solr-handler-dataimport-DataImportHandlerException-Unable-to-execute-query-tp4108227p4108233.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query
I was reading the code and it looks like it could throw an NPE if java.sql.Statement#execute() returns false which can happen if there are no results (although most drivers return an empty resultset instead): if (stmt.execute(query)) { resultSet = stmt.getResultSet(); } LOG.trace("Time taken for sql :" + (System.currentTimeMillis() - start)); colNames = readFieldNames(resultSet.getMetaData()); Can you try using the debug mode and paste its response? On Thu, Dec 26, 2013 at 7:29 PM, PeterKerk wrote: > Solr 4.3.1 > > When I run the statement in MySQL Workbench or console the statement > executes successfully and returns 2 results. > > FYI: I placed the mysql-connector-java-5.1.27-bin.jar in the \lib folder. > > Also: it should not throw this error even when 0 results are returned right? > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/org-apache-solr-handler-dataimport-DataImportHandlerException-Unable-to-execute-query-tp4108227p4108233.html > Sent from the Solr - User mailing list archive at Nabble.com. -- Regards, Shalin Shekhar Mangar.
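A defensive variant of that block (a sketch only, not a tested patch) would avoid the NPE by only reading the metadata when a result set was actually produced:

// Hypothetical guard around the snippet quoted above: stmt.execute() returns
// false when the statement produced no ResultSet, so don't dereference it blindly.
if (stmt.execute(query)) {
    resultSet = stmt.getResultSet();
} else {
    resultSet = null;
}
LOG.trace("Time taken for sql :" + (System.currentTimeMillis() - start));
colNames = (resultSet != null) ? readFieldNames(resultSet.getMetaData()) : null;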
Chaining plugins
I would like to develop a search handler that does some logic and then just sends the query to the default search handler so the results will be generated there. It's like a transparent plugin; the data only passes through it.

How can this be achieved? Thanks ahead :)

--
View this message in context: http://lucene.472066.n3.nabble.com/Chaining-plugins-tp4108239.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query
Shalin Shekhar Mangar wrote
> Can you try using the debug mode and paste its response?

Ok, thanks. How do I enable and use the debug mode?

--
View this message in context: http://lucene.472066.n3.nabble.com/org-apache-solr-handler-dataimport-DataImportHandlerException-Unable-to-execute-query-tp4108227p4108248.html
Sent from the Solr - User mailing list archive at Nabble.com.
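For reference, the DIH debug mode is normally reached by adding debug parameters to the dataimport request, roughly as below (exact parameter spelling may vary slightly between Solr versions; in debug mode documents are typically not committed unless commit=true is also passed):

http://localhost:8983/solr/tv-wordpress/dataimport?command=full-import&debug=true&verbose=true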
Re: Questions about integrateing SolrCloud with HDFS
YouPeng, While I'm unable to help you with the issue that you're seeing I did want to comment here and say that I have previously brought up the same goal that you're trying to accomplish on this mailing list but received no feedback or input. I think it makes sense that Solr should not try to make its index directories distinct and redundant per shard/core while running on HDFS as data redundancy and locality is handled at a different layer in the software stack. +1 to this topic because I'd love to see Solr handle replication/redundancy more smartly on HDFS Thanks, Greg On Dec 24, 2013, at 1:57 AM, YouPeng Yang wrote: > Hi users > > Solr supports for writing and reading its index and transaction log files > to the HDFS distributed filesystem. > **I am curious about that there are any other futher improvement about > the integration with HDFS.* > **For the solr native replication will make multiple copies of the > master node's index. Because of the native replication of HDFS,there is no > need to do that.It just to need that multiple cores in solrcloud share the > same index directory in HDFS?* > > > The above supposition is what I want to achive when we are integrating > SolrCloud with HDFS (Solr 4.6). > To make sure of our application high available,we still have to take > the solr replication with some tricks. > > Firstly ,noting that solr's index directory is made up of > *collectionName/coreNodeName/data/index * > > *collectionName/coreNodeName/data/tlog* > So to achive this,we want to create multi cores that use the same hdfs > index directory . > > I have tested this within solr 4.4 by expilcitly indicating the same > coreNodeName. > > For example: > Step1, a core was created with the name=core1 and shard=core_shard1 and > collection=clollection1 and coreNodeName=*core1* > Step2. create another core with the name=core2 and shard=core_shard1 and > collection=clollection1 and coreNodeName= > *core1* > * T*he two core share the same shard ,collection and coreNodeName.As a > result,the two core will get the same index data which is stored in the > hdfs directory : > hdfs://myhdfs/*clollection1*/*core1*/data/index > hdfs://myhdfs/*clollection1*/*core1*/data/tlog > > Unfortunately*, *as the solr 4.6 was released,we upgraded . the above > goal failed. We could not create a core with both expilcit shard and > coreNodeName. > Exceptions are as [1]. 
> * Can some give some help?* > > > Regards > [1]-- > 64893635 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController > ?.publishing core=hdfstest3 state=down > 64893635 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController > ?.numShards not found on descriptor - reading it from system property > 64893698 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController > ?.look for our core node name > > > > 64951227 [http-bio-8080-exec-17] INFO org.apache.solr.core.SolrCore > ?.[reportCore_201208] webapp=/solr path=/replication > params={slave=false&command=details&wt=javabin&qt=/replication&version=2} > status=0 QTime=107 > > > 65213770 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController > ?.waiting to find shard id in clusterstate for hdfstest3 > 65533894 [http-bio-8080-exec-1] ERROR org.apache.solr.core.SolrCore > ?.org.apache.solr.common.SolrException: Error CREATEing SolrCore > 'hdfstest3': Could not get shard id for core: hdfstest3 >at > org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:535) >at > org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:152) >at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) >at > org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:662) >at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248) >at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197) >at > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) >at > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) >at > org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222) >at > org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123) >at > org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171) >at > org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99) >at > org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:947) >at > org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) >at > org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408) >at > org.apache.coyote.http11.
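For readers trying to reproduce the setup described in the quoted message, the two CoreAdmin CREATE calls sharing one shard and coreNodeName would look roughly like this (host, core, collection, and config names are placeholders, not copied from that environment):

http://host:8080/solr/admin/cores?action=CREATE&name=core1&collection=collection1&shard=core_shard1&coreNodeName=core1&collection.configName=myconf
http://host:8080/solr/admin/cores?action=CREATE&name=core2&collection=collection1&shard=core_shard1&coreNodeName=core1&collection.configName=myconf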
Re: Chaining plugins
I have subclassed the query component to do so. Using params, you can get at almost everything conceivable, though this is not well documented.

paul

On 26 déc. 2013, at 15:59, elmerfudd wrote:

> I would like to develope a search handler that is doing some logic and then
> just sends the query to the default search handler so the results will be
> generated there.
> It's like it is a transparent plugin and the data will only go through it.
>
> How can this be achieved .
> thanks ahead :)
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Chaining-plugins-tp4108239.html
> Sent from the Solr - User mailing list archive at Nabble.com.
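To illustrate what subclassing a component can look like, here is a minimal sketch of a custom SearchComponent that rewrites the request in prepare() and then lets the stock QueryComponent generate the results; the class name and the rewrite logic are illustrative only.

import java.io.IOException;

import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

// Runs before QueryComponent when listed in the handler's first-components.
public class QueryPreprocessComponent extends SearchComponent {

    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        ModifiableSolrParams params = new ModifiableSolrParams(rb.req.getParams());
        String q = params.get(CommonParams.Q);
        if (q != null) {
            params.set(CommonParams.Q, rewrite(q));  // custom logic goes here
        }
        rb.req.setParams(params);
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
        // Nothing to do: QueryComponent and the other default components
        // produce the actual results.
    }

    private String rewrite(String q) {
        return q;  // placeholder for the "some logic" mentioned above
    }

    @Override
    public String getDescription() {
        return "Rewrites the query before the default components run";
    }

    @Override
    public String getSource() {
        return null;
    }
}

Such a component would be declared with a searchComponent entry in solrconfig.xml and listed in the request handler's first-components so it executes ahead of the default chain.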
Does Solr fork child processes and result in zombies?
I have three CentOS machines running Solr 4.6.0 cloud without any replication. That is, numshards is 3 and there is only one Solr instance running on each of the boxes. Also, on the boxes I arm running ZooKeeper. This is a test environment and I would not normally run ZooKeeper on the same boxes. As I am inserting data into Solr the boxes get in a weird state. I will log in and enter my username and password and then nothing, it just sits there. I am connected through Putty. Never gets to a command prompt. I stop the data import and after a while I can log in. I do the following command on one of the boxes and I see this: ps -lf -C java F S UIDPID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD 0 S root 4772 1 99 80 0 - 1926607 futex_ 12:13 pts/0 213852-21:10:31 java -Dzookeeper.log.dir=. -Dzookeeper.root.logger=INFO,CONSOLE -cp /var/zookeeper/bin/../build/classes:/var/zookeeper/bin/../build/lib/*.jar:/var/zookeeper/bin/../lib/slf4j-log4j12-1.6.1.jar:/var/zookeeper/bin/../lib/slf4j-api-1.6.1.jar:/var/zookeeper/bin/../lib/netty-3.2.2.Final.jar:/var/zookeeper/bin/../lib/log4j-1.2.15.jar:/var/zookeeper/bin/../lib/jline-0.9.94.jar:/var/zookeeper/bin/../zookeeper-3.4.5.jar:/var/zookeeper/bin/../src/java/lib/*.jar:/var/zookeeper/bin/../conf: -Xms1G -Xmx4G -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=false org.apache.zookeeper.server.quorum.QuorumPeerMain /var/zookeeper/bin/../conf/zoo.cfg 0 S root 5009 1 99 80 0 - 46184325 futex_ 12:26 pts/0 219341-04:38:50 /usr/bin/java -Dbootstrap_confdir=./solr/mycore/conf -Xms6G -Xmx12G -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=3000 -Dcollection.configName=amtcacheconf -DzkHost=prdslslbgtmdb01:2181,prdslslbgtmdb03:2181,prdslslbgtmdb04:2181 -DnumShards=3 -jar start.jar 1 D root 7879 5009 99 80 0 - 46184325 sched_ 15:40 pts/0 208-11:14:20 /usr/bin/java -Dbootstrap_confdir=./solr/mycore/conf -Xms6G -Xmx12G -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=3000 -Dcollection.configName=amtcacheconf -DzkHost=prdslslbgtmdb01:2181,prdslslbgtmdb03:2181,prdslslbgtmdb04:2181 -DnumShards=3 -jar start.jar 1 D root 7949 5009 99 80 0 - 46184325 sched_ 15:44 pts/0 208-11:14:20 /usr/bin/java -Dbootstrap_confdir=./solr/mycore/conf -Xms6G -Xmx12G -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=3000 -Dcollection.configName=amtcacheconf -DzkHost=prdslslbgtmdb01:2181,prdslslbgtmdb03:2181,prdslslbgtmdb04:2181 -DnumShards=3 -jar start.jar How did I end up with two child processes of Solr running? Notice they are two PIDS, 7879 and 7949, that are children of 5009. The exact same command as well, with all of the parameters I used to launch Solr. I also notice the "F" state is "1" for those two processes, so I assume that means "forked but didn't exec". Also the WCHAN is sched_ on both of them. The "S" state is "D" which means uninterruptible sleep ( usually IO ). Where are these processes coming from? Do I have something configured incorrectly?
Re: Maybe a bug for Solr 4.6 when creating a new core
On 12/25/2013 11:29 PM, YouPeng Yang wrote: > After I fixed this prolem,I can create a core with the request: > http://10.7.23.122:8080/solr/admin/cores?action=CREATE&name=Test&; > *shard=Test* > &collection.configName=myconf&schema=schema.xml&config=solrconfigLocal_default.xml&collection=defaultcol& > *coreNodeName=Test* In simple terms, what are you trying to get done? This is sounding like an XY problem. http://people.apache.org/~hossman/#xyproblem If you take a few steps back and describe what you want to happen, what has come before, what you've tried, and what has actually happened, it will be easier to help you. Most of the people who work on Solr are part of the western world, and yesterday was Christmas. Some people are getting back to work today, but many of them will be unavailable until after the new year. Thanks, Shawn
Re: Maybe a bug for Solr 4.6 when creating a new core
If you are seeing an NPE there, sounds like you are on to something. Please file a JIRA issue. - Mark > On Dec 26, 2013, at 1:29 AM, YouPeng Yang wrote: > > Hi > Merry Christmas. > > Before this mail,I am in trouble with a weird problem for a few days > when to create a new core with both explicite shard and coreNodeName. And I > have posted a few mails in the mailist,no one ever gives any > suggestions,maybe they did not encounter the same problem. > I have to go through the srcs to check out the reason. Thanks god, I find > it. The reason to the problem,maybe be a bug, so I would like to report it > hoping to get your endorsement and confirmation. > > > In class org.apache.solr.cloud.Overseer the Line 360: > - > if (sliceName !=null && collectionExists && > !"true".equals(state.getCollection(collection).getStr("autoCreated"))) { >Slice slice = state.getSlice(collection, sliceName); >if (slice.getReplica(coreNodeName) == null) { > log.info("core_deleted . Just return"); > return state; >} > } > - > the slice needs to be checked null .because I create a new core with both > explicite shard and coreNodeName, the state.getSlice(collection, > sliceName) may return a null.So it needs to be checked ,or there will be > an NullpointException > - > if (sliceName !=null && collectionExists && > !"true".equals(state.getCollection(collection).getStr("autoCreated"))) { >Slice slice = state.getSlice(collection, sliceName); >if (*slice != null &&* slice.getReplica(coreNodeName) == null) { > log.info("core_deleted . Just return"); > return state; >} > } > - > > *Querstion 1*: Is this OK with the whole solr project,I have no aware > about the influences about the change,as right now ,it goes right. Please > make confirm about this. > > After I fixed this prolem,I can create a core with the request: > http://10.7.23.122:8080/solr/admin/cores?action=CREATE&name=Test&; > *shard=Test* > &collection.configName=myconf&schema=schema.xml&config=solrconfigLocal_default.xml&collection=defaultcol& > *coreNodeName=Test* > > However when I create a replica within the same shard Test: > http://10.7.23.122:8080/solr/admin/cores?action=CREATE&*name=Test1*&; > *shard=Test* > &collection.configName=myconf&schema=schema.xml&config=solrconfigLocal_default.xml&collection=defaultcol& > *coreNodeName=Test1* > > It response an error: > > >400 > 29 > > > Error CREATEing SolrCore 'Test1': Test1 is > removed > 400 > > > > I aslo find the reason the in the class org.apache.solr.cloud.ZkController > line 1369~ 1384[1] > As the src here,it needs to check the autoCreated within an existing > collection > when the coreNodeName and shard were assigned manully. the autoCreated > property of a collection is not equal with true, it throws an exeption. > > *Question2*: Why does it need to check the 'autoCreated', and how could > I go through this check, or Is this another bug? 
> > > > > [1]- >try { > if(cd.getCloudDescriptor().getCollectionName() !=null && > cd.getCloudDescriptor().getCoreNodeName() != null ) { >//we were already registered > > if(zkStateReader.getClusterState().hasCollection(cd.getCloudDescriptor().getCollectionName())){ >DocCollection coll = > zkStateReader.getClusterState().getCollection(cd.getCloudDescriptor().getCollectionName()); > if(!"true".equals(coll.getStr("autoCreated"))){ > Slice slice = > coll.getSlice(cd.getCloudDescriptor().getShardId()); > if(slice != null){ > if(slice.getReplica(cd.getCloudDescriptor().getCoreNodeName()) > == null) { > log.info("core_removed This core is removed from ZK"); > throw new SolrException(ErrorCode.NOT_FOUND,coreNodeName +" > is removed"); > } > } > } >} > } > --
Re: Questions about integrateing SolrCloud with HDFS
Can you file a JIRA issue? - Mark > On Dec 24, 2013, at 2:57 AM, YouPeng Yang wrote: > > Hi users > > Solr supports for writing and reading its index and transaction log files > to the HDFS distributed filesystem. > **I am curious about that there are any other futher improvement about > the integration with HDFS.* > **For the solr native replication will make multiple copies of the > master node's index. Because of the native replication of HDFS,there is no > need to do that.It just to need that multiple cores in solrcloud share the > same index directory in HDFS?* > > > The above supposition is what I want to achive when we are integrating > SolrCloud with HDFS (Solr 4.6). > To make sure of our application high available,we still have to take > the solr replication with some tricks. > > Firstly ,noting that solr's index directory is made up of > *collectionName/coreNodeName/data/index * > > *collectionName/coreNodeName/data/tlog* > So to achive this,we want to create multi cores that use the same hdfs > index directory . > > I have tested this within solr 4.4 by expilcitly indicating the same > coreNodeName. > > For example: > Step1, a core was created with the name=core1 and shard=core_shard1 and > collection=clollection1 and coreNodeName=*core1* > Step2. create another core with the name=core2 and shard=core_shard1 and > collection=clollection1 and coreNodeName= > *core1* > * T*he two core share the same shard ,collection and coreNodeName.As a > result,the two core will get the same index data which is stored in the > hdfs directory : > hdfs://myhdfs/*clollection1*/*core1*/data/index > hdfs://myhdfs/*clollection1*/*core1*/data/tlog > > Unfortunately*, *as the solr 4.6 was released,we upgraded . the above > goal failed. We could not create a core with both expilcit shard and > coreNodeName. > Exceptions are as [1]. 
> * Can some give some help?* > > > Regards > [1]-- > 64893635 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController > ?.publishing core=hdfstest3 state=down > 64893635 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController > ?.numShards not found on descriptor - reading it from system property > 64893698 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController > ?.look for our core node name > > > > 64951227 [http-bio-8080-exec-17] INFO org.apache.solr.core.SolrCore > ?.[reportCore_201208] webapp=/solr path=/replication > params={slave=false&command=details&wt=javabin&qt=/replication&version=2} > status=0 QTime=107 > > > 65213770 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController > ?.waiting to find shard id in clusterstate for hdfstest3 > 65533894 [http-bio-8080-exec-1] ERROR org.apache.solr.core.SolrCore > ?.org.apache.solr.common.SolrException: Error CREATEing SolrCore > 'hdfstest3': Could not get shard id for core: hdfstest3 >at > org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:535) >at > org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:152) >at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) >at > org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:662) >at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248) >at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197) >at > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) >at > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) >at > org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222) >at > org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123) >at > org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171) >at > org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99) >at > org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:947) >at > org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) >at > org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408) >at > org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1009) >at > org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589) >at > org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310) >at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) >at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) >at java.lang.Thread.run(Thread.java:722) > Caused by: org.apache.solr.common.SolrException: Could not
Re: Questions about integrating SolrCloud with HDFS
Cloudera has plans here. I'll be working on further hdfs / Solrcloud options in the near future. - Mark > On Dec 26, 2013, at 11:33 AM, Greg Walters wrote: > > YouPeng, > > While I'm unable to help you with the issue that you're seeing I did want to > comment here and say that I have previously brought up the same goal that > you're trying to accomplish on this mailing list but received no feedback or > input. I think it makes sense that Solr should not try to make its index > directories distinct and redundant per shard/core while running on HDFS as > data redundancy and locality is handled at a different layer in the software > stack. > > +1 to this topic because I'd love to see Solr handle replication/redundancy > more smartly on HDFS > > Thanks, > Greg > > >> On Dec 24, 2013, at 1:57 AM, YouPeng Yang wrote: >> >> Hi users >> >> Solr supports for writing and reading its index and transaction log files >> to the HDFS distributed filesystem. >> **I am curious about that there are any other futher improvement about >> the integration with HDFS.* >> **For the solr native replication will make multiple copies of the >> master node's index. Because of the native replication of HDFS,there is no >> need to do that.It just to need that multiple cores in solrcloud share the >> same index directory in HDFS?* >> >> >> The above supposition is what I want to achive when we are integrating >> SolrCloud with HDFS (Solr 4.6). >> To make sure of our application high available,we still have to take >> the solr replication with some tricks. >> >> Firstly ,noting that solr's index directory is made up of >> *collectionName/coreNodeName/data/index * >> >> *collectionName/coreNodeName/data/tlog* >> So to achive this,we want to create multi cores that use the same hdfs >> index directory . >> >> I have tested this within solr 4.4 by expilcitly indicating the same >> coreNodeName. >> >> For example: >> Step1, a core was created with the name=core1 and shard=core_shard1 and >> collection=clollection1 and coreNodeName=*core1* >> Step2. create another core with the name=core2 and shard=core_shard1 and >> collection=clollection1 and coreNodeName= >> *core1* >> * T*he two core share the same shard ,collection and coreNodeName.As a >> result,the two core will get the same index data which is stored in the >> hdfs directory : >> hdfs://myhdfs/*clollection1*/*core1*/data/index >> hdfs://myhdfs/*clollection1*/*core1*/data/tlog >> >> Unfortunately*, *as the solr 4.6 was released,we upgraded . the above >> goal failed. We could not create a core with both expilcit shard and >> coreNodeName. >> Exceptions are as [1]. 
>> * Can some give some help?* >> >> >> Regards >> [1]-- >> 64893635 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController >> ?.publishing core=hdfstest3 state=down >> 64893635 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController >> ?.numShards not found on descriptor - reading it from system property >> 64893698 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController >> ?.look for our core node name >> >> >> >> 64951227 [http-bio-8080-exec-17] INFO org.apache.solr.core.SolrCore >> ?.[reportCore_201208] webapp=/solr path=/replication >> params={slave=false&command=details&wt=javabin&qt=/replication&version=2} >> status=0 QTime=107 >> >> >> 65213770 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController >> ?.waiting to find shard id in clusterstate for hdfstest3 >> 65533894 [http-bio-8080-exec-1] ERROR org.apache.solr.core.SolrCore >> ?.org.apache.solr.common.SolrException: Error CREATEing SolrCore >> 'hdfstest3': Could not get shard id for core: hdfstest3 >> at >> org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:535) >> at >> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:152) >> at >> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) >> at >> org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:662) >> at >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248) >> at >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197) >> at >> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) >> at >> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) >> at >> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222) >> at >> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123) >> at >> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171) >> at >> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99) >> at >> org.apache.cat
Re: Bad fieldNorm when using morphologic synonyms
Attached patch into the JIRA issue. Reviews are welcome. On Thu, Dec 19, 2013 at 7:24 PM, Isaac Hebsh wrote: > Roman, do you have any results? > > created SOLR-5561 > > Robert, if I'm wrong, you are welcome to close that issue. > > > On Mon, Dec 9, 2013 at 10:50 PM, Isaac Hebsh wrote: > >> You can see the norm value, in the "explain" text, when setting >> debugQuery=true. >> If the same item gets different norm before/after, that's it. >> >> Note that this configuration is in schema.xml (not solrconfig.xml...) >> >> On Monday, December 9, 2013, Roman Chyla wrote: >> >>> Isaac, is there an easy way to recognize this problem? We also index >>> synonym tokens in the same position (like you do, and I'm sure that our >>> positions are set correctly). I could test whether the default similarity >>> factory in solrconfig.xml had any effect (before/after reindexing). >>> >>> --roman >>> >>> >>> On Mon, Dec 9, 2013 at 2:42 PM, Isaac Hebsh >>> wrote: >>> >>> > Hi Robert and Manuel. >>> > >>> > The DefaultSimilarity indeed sets discountOverlap to true by default. >>> > BUT, the *factory*, aka DefaultSimilarityFactory, when called by >>> > IndexSchema (the getSimilarity method), explicitly sets this value to >>> the >>> > value of its corresponding class member. >>> > This class member is initialized to be FALSE when the instance is >>> created >>> > (like every boolean variable in the world). It should be set when >>> "init" >>> > method is called. If the parameter is not set in schema.xml, the >>> default is >>> > true. >>> > >>> > Everything seems to be alright, but the issue is that "init" method is >>> NOT >>> > called, if the similarity is not *explicitly* declared in schema.xml. >>> In >>> > that case, init method is not called, the discountOverlaps member (of >>> the >>> > factory class) remains FALSE, and getSimilarity explicitly calls >>> > setDiscountOverlaps with value of FALSE. >>> > >>> > This is very easy to reproduce and debug. >>> > >>> > >>> > On Mon, Dec 9, 2013 at 9:19 PM, Robert Muir wrote: >>> > >>> > > no, its turned on by default in the default similarity. >>> > > >>> > > as i said, all that is necessary is to fix your analyzer to emit the >>> > > proper position increments. >>> > > >>> > > On Mon, Dec 9, 2013 at 12:27 PM, Manuel Le Normand >>> > > wrote: >>> > > > In order to set discountOverlaps to true you must have added the >>> > > > to the >>> schema.xml, >>> > > which >>> > > > is commented out by default! >>> > > > >>> > > > As by default this param is false, the above situation is expected >>> with >>> > > > correct positioning, as said. >>> > > > >>> > > > In order to fix the field norms you'd have to reindex with the >>> > similarity >>> > > > class which initializes the param to true. >>> > > > >>> > > > Cheers, >>> > > > Manu >>> > > >>> > >>> >> >
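For anyone hitting this before the SOLR-5561 patch lands, a minimal workaround sketch based on the thread above: declare the similarity factory explicitly in schema.xml so that its init() method actually runs and discountOverlaps is honored (a reindex is still needed before existing field norms change).

  <similarity class="solr.DefaultSimilarityFactory">
    <bool name="discountOverlaps">true</bool>
  </similarity>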
Re: Questions about integrating SolrCloud with HDFS
Mark, I'd be happy to but some clarification first; should this issue be about creating cores with overlapping names and the stack trace that YouPeng initially described, Solr's behavior when storing data on HDFS or YouPeng's other thread (Maybe a bug for solr 4.6 when create a new core) that looks like it might be a near duplicate of this one? Thanks, Greg On Dec 26, 2013, at 12:40 PM, Mark Miller wrote: > Can you file a JIRA issue? > > - Mark > >> On Dec 24, 2013, at 2:57 AM, YouPeng Yang wrote: >> >> Hi users >> >> Solr supports for writing and reading its index and transaction log files >> to the HDFS distributed filesystem. >> **I am curious about that there are any other futher improvement about >> the integration with HDFS.* >> **For the solr native replication will make multiple copies of the >> master node's index. Because of the native replication of HDFS,there is no >> need to do that.It just to need that multiple cores in solrcloud share the >> same index directory in HDFS?* >> >> >> The above supposition is what I want to achive when we are integrating >> SolrCloud with HDFS (Solr 4.6). >> To make sure of our application high available,we still have to take >> the solr replication with some tricks. >> >> Firstly ,noting that solr's index directory is made up of >> *collectionName/coreNodeName/data/index * >> >> *collectionName/coreNodeName/data/tlog* >> So to achive this,we want to create multi cores that use the same hdfs >> index directory . >> >> I have tested this within solr 4.4 by expilcitly indicating the same >> coreNodeName. >> >> For example: >> Step1, a core was created with the name=core1 and shard=core_shard1 and >> collection=clollection1 and coreNodeName=*core1* >> Step2. create another core with the name=core2 and shard=core_shard1 and >> collection=clollection1 and coreNodeName= >> *core1* >> * T*he two core share the same shard ,collection and coreNodeName.As a >> result,the two core will get the same index data which is stored in the >> hdfs directory : >> hdfs://myhdfs/*clollection1*/*core1*/data/index >> hdfs://myhdfs/*clollection1*/*core1*/data/tlog >> >> Unfortunately*, *as the solr 4.6 was released,we upgraded . the above >> goal failed. We could not create a core with both expilcit shard and >> coreNodeName. >> Exceptions are as [1]. 
>> * Can some give some help?* >> >> >> Regards >> [1]-- >> 64893635 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController >> ?.publishing core=hdfstest3 state=down >> 64893635 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController >> ?.numShards not found on descriptor - reading it from system property >> 64893698 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController >> ?.look for our core node name >> >> >> >> 64951227 [http-bio-8080-exec-17] INFO org.apache.solr.core.SolrCore >> ?.[reportCore_201208] webapp=/solr path=/replication >> params={slave=false&command=details&wt=javabin&qt=/replication&version=2} >> status=0 QTime=107 >> >> >> 65213770 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController >> ?.waiting to find shard id in clusterstate for hdfstest3 >> 65533894 [http-bio-8080-exec-1] ERROR org.apache.solr.core.SolrCore >> ?.org.apache.solr.common.SolrException: Error CREATEing SolrCore >> 'hdfstest3': Could not get shard id for core: hdfstest3 >> at >> org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:535) >> at >> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:152) >> at >> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) >> at >> org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:662) >> at >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248) >> at >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197) >> at >> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) >> at >> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) >> at >> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222) >> at >> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123) >> at >> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171) >> at >> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99) >> at >> org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:947) >> at >> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) >> at >> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408) >> at >> org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Proce
Re: Does Solr fork child processes and result in zombies?
On 12/26/2013 9:56 AM, Sir Gilligan wrote: > I have three CentOS machines running Solr 4.6.0 cloud without any > replication. That is, numshards is 3 and there is only one Solr instance > running on each of the boxes. > > Also, on the boxes I arm running ZooKeeper. This is a test environment > and I would not normally run ZooKeeper on the same boxes. > > As I am inserting data into Solr the boxes get in a weird state. I will > log in and enter my username and password and then nothing, it just sits > there. I am connected through Putty. Never gets to a command prompt. I > stop the data import and after a while I can log in. > > I do the following command on one of the boxes and I see this: > > ps -lf -C java > How did I end up with two child processes of Solr running? Notice they > are two PIDS, 7879 and 7949, that are children of 5009. The exact same > command as well, with all of the parameters I used to launch Solr. > > I also notice the "F" state is "1" for those two processes, so I assume > that means "forked but didn't exec". > > Also the WCHAN is sched_ on both of them. > > The "S" state is "D" which means uninterruptible sleep ( usually IO ). > > Where are these processes coming from? Do I have something configured > incorrectly? Solr itself should not fork processes, or at least I have never seen it do so. It does appear that you are using 'start.jar' which suggests that you're using the Jetty that comes bundled with Solr, although I cannot tell that for sure. If you are using some other container (including another version/copy of Jetty), then I have no idea what it might do. I ran the same ps command on one of my CentOS 6 SolrCloud (4.2.1) machines and I get exactly two entries - one for zookeeper and one for Solr (running the included Jetty). If on the other hand I run a ps command that shows threads, I see a LOT of entries for both zookeeper and java, because these are highly threaded applications. I have a much larger Solr install that's not using SolrCloud, and I have never seen it fork processes either. My dev install (running 4.6.0 in non-cloud mode) also doesn't fork processes. Side notes: As long as the machine has enough resources available, running zookeeper on the same boxes as Solr shouldn't pose a problem. If the machine becomes heavily I/O bound and zookeeper data is not on separate spindles, it might be a problem. The bootstrap options are not meant to run on every startup. They should not be used except when first converting a non-cloud install to a cloud install. If you want to upload a new configuration to zookeeper, you can use the zkCli script in cloud-scripts and then reload your collection. Also, I think it's generally not a good idea to use the numShards startup parameter. You can indicate the number of shards for a collection when you create the collection. With a 12GB heap, you're definitely going to want to tune your garbage collection. I don't see an tuning parameters on your commandline. I'd like to avoid a religious garbage collection flame-war, so I will give you the settings that work for me and allow you to decide for yourself what to do: http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning Here's some more generic information about performance problems with Solr: http://wiki.apache.org/solr/SolrPerformanceProblems Thanks, Shawn
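As a concrete illustration of the zkCli approach Shawn mentions (hostnames, paths, and names here are placeholders, not taken from the original post):

  cloud-scripts/zkcli.sh -zkhost zk1:2181,zk2:2181,zk3:2181 \
    -cmd upconfig -confdir /path/to/myconfig/conf -confname myconf

  curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=mycollection"

The first command pushes the updated config set to ZooKeeper; the RELOAD call makes the collection pick it up without a restart, and without ever using the bootstrap options again.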
Re: adding a node to SolrCloud
On 12/23/2013 05:43 PM, Greg Preston wrote: I believe you can just define multiple cores: ... (this is the old-style solr.xml. I don't know how to do it in the newer style) Yes, that is exactly what I did, but somehow the link between shards and collections gets lost and everything gets very confused. I guess I should have read more carefully about the valid parameters on the <core> element. My problem was a missing attribute: @collection="collection-name" So the complete core definition that survives tomcat restarts is along the lines of the sketch below. David
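The actual XML did not survive the list archive, so here is a hedged reconstruction of what such an old-style solr.xml core definition (with the missing collection attribute added) might look like; core and collection names are placeholders:

  <cores adminPath="/admin/cores">
    <core name="mycollection_shard1" instanceDir="mycollection_shard1"
          shard="shard1" collection="mycollection"/>
    <core name="mycollection_shard2" instanceDir="mycollection_shard2"
          shard="shard2" collection="mycollection"/>
  </cores>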
Re: Questions about integrating SolrCloud with HDFS
1. The exception and change in experience on the move to 4.6 seems like it could be a bug we want to investigate. 2. Solr storing data on hdfs in other ways seems like a different issue / improvement. 3. You shouldn't try and force more than one core to use the same index on hdfs. This would be bad. 4. You really want to use the solr.hdfs.home setting described in the documentation IMO. - Mark > On Dec 26, 2013, at 1:56 PM, Greg Walters wrote: > > Mark, > > I'd be happy to but some clarification first; should this issue be about > creating cores with overlapping names and the stack trace that YouPeng > initially described, Solr's behavior when storing data on HDFS or YouPeng's > other thread (Maybe a bug for solr 4.6 when create a new core) that looks > like it might be a near duplicate of this one? > > Thanks, > Greg > >> On Dec 26, 2013, at 12:40 PM, Mark Miller wrote: >> >> Can you file a JIRA issue? >> >> - Mark >> >>> On Dec 24, 2013, at 2:57 AM, YouPeng Yang wrote: >>> >>> Hi users >>> >>> Solr supports for writing and reading its index and transaction log files >>> to the HDFS distributed filesystem. >>> **I am curious about that there are any other futher improvement about >>> the integration with HDFS.* >>> **For the solr native replication will make multiple copies of the >>> master node's index. Because of the native replication of HDFS,there is no >>> need to do that.It just to need that multiple cores in solrcloud share the >>> same index directory in HDFS?* >>> >>> >>> The above supposition is what I want to achive when we are integrating >>> SolrCloud with HDFS (Solr 4.6). >>> To make sure of our application high available,we still have to take >>> the solr replication with some tricks. >>> >>> Firstly ,noting that solr's index directory is made up of >>> *collectionName/coreNodeName/data/index * >>> >>> *collectionName/coreNodeName/data/tlog* >>> So to achive this,we want to create multi cores that use the same hdfs >>> index directory . >>> >>> I have tested this within solr 4.4 by expilcitly indicating the same >>> coreNodeName. >>> >>> For example: >>> Step1, a core was created with the name=core1 and shard=core_shard1 and >>> collection=clollection1 and coreNodeName=*core1* >>> Step2. create another core with the name=core2 and shard=core_shard1 and >>> collection=clollection1 and coreNodeName= >>> *core1* >>> * T*he two core share the same shard ,collection and coreNodeName.As a >>> result,the two core will get the same index data which is stored in the >>> hdfs directory : >>> hdfs://myhdfs/*clollection1*/*core1*/data/index >>> hdfs://myhdfs/*clollection1*/*core1*/data/tlog >>> >>> Unfortunately*, *as the solr 4.6 was released,we upgraded . the above >>> goal failed. We could not create a core with both expilcit shard and >>> coreNodeName. >>> Exceptions are as [1]. 
>>> * Can some give some help?* >>> >>> >>> Regards >>> [1]-- >>> 64893635 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController >>> ?.publishing core=hdfstest3 state=down >>> 64893635 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController >>> ?.numShards not found on descriptor - reading it from system property >>> 64893698 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController >>> ?.look for our core node name >>> >>> >>> >>> 64951227 [http-bio-8080-exec-17] INFO org.apache.solr.core.SolrCore >>> ?.[reportCore_201208] webapp=/solr path=/replication >>> params={slave=false&command=details&wt=javabin&qt=/replication&version=2} >>> status=0 QTime=107 >>> >>> >>> 65213770 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController >>> ?.waiting to find shard id in clusterstate for hdfstest3 >>> 65533894 [http-bio-8080-exec-1] ERROR org.apache.solr.core.SolrCore >>> ?.org.apache.solr.common.SolrException: Error CREATEing SolrCore >>> 'hdfstest3': Could not get shard id for core: hdfstest3 >>> at >>> org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:535) >>> at >>> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:152) >>> at >>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) >>> at >>> org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:662) >>> at >>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248) >>> at >>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197) >>> at >>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) >>> at >>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) >>> at >>> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222) >>> at >>> org.apache.catalina.core.StandardContext
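Regarding point 4 in Mark's reply above, a hedged solrconfig.xml sketch of the solr.hdfs.home style of setup he refers to (the HDFS URI and Hadoop config path are placeholders):

  <directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
    <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
    <str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
  </directoryFactory>

With solr.hdfs.home set, each core's data directory is created under that single HDFS location, so there is no need to point multiple cores at the same index directory by hand.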
Re: adding a node to SolrCloud
On 12/24/2013 8:35 AM, David Santamauro wrote: >> You may have one or more of the SolrCloud 'bootstrap' options on the >> startup commandline. The bootstrap options are intended to be used >> once, in order to bootstrap from a non-SolrCloud setup to a SolrCloud >> setup. > > No, no unnecessary options. I manually bootstrapped a common config. I have no idea what might be wrong here. >> Between the Collections API and the CoreAdmin API, you should never need >> to edit solr.xml (if using the pre-4.4 format) or core.properties files >> (if using core discovery, available 4.4 and later) directly. > > Now this I don't understand. If I have created cores through the > CoreAdmin API, how is solr.xml affected? If I don't edit it, how does > SOLR know what cores it has to expose to a distributed collection? If you are using the old-style solr.xml (which will be supported through all future 4.x versions, but not 5.0), then core definitions are stored in solr.xml and the contents of the file are changed by many of the CoreAdmin API actions. The Collections API calls the CoreAdmin API on servers throughout the cloud. http://wiki.apache.org/solr/Solr.xml%20%28supported%20through%204.x%29 If you are using the core discovery format, which was made available in working form in version 4.4, then solr.xml does NOT contain core definitions. The main example in 4.4 and later uses the new format. Cores are discovered at Solr startup by crawling the filesystem from a root starting point looking for core.properties files. In this mode, solr.xml is fairly static. http://wiki.apache.org/solr/Solr.xml%204.4%20and%20beyond http://wiki.apache.org/solr/Core%20Discovery%20%284.4%20and%20beyond%29 Thanks, Shawn
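For illustration, a hedged example of the kind of core.properties file that core discovery picks up (all values are placeholders):

  # <solr home>/mycollection_shard1_replica1/core.properties
  name=mycollection_shard1_replica1
  collection=mycollection
  shard=shard1
  coreNodeName=core_node1

An empty core.properties is also valid, in which case the core name defaults to the name of the directory the file sits in.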
Re: Boosting results on value of field different from query
Hi, Puneet I think you can try of provided advice from there : http://wiki.apache.org/solr/SolrRelevancyFAQ Like this one : http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_increase_the_score_for_specific_documents : set index time boos "per document", so set big boos for documents with type:compact and type:sedan Or this one http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_change_the_score_of_a_document_based_on_the_.2Avalue.2A_of_a_field_.28say.2C_.22popularity.22.29 and http://wiki.apache.org/solr/ExtendedDisMax#bf_.28Boost_Function.2C_additive.29 : use query time function for boosting, you can implement your own function query, called for example "typeBoosting", which will convert "type" value per document from string into boost number and use it like "typeBoosting(type)". 26.12.2013, 06:28, "Puneet Pawaia" : > Hi Manju > Would this query not be searching for and thus restricting results to type > sedan and compact? > I would like the results to include other types but only show up lower down > the list. > Regards > Puneet > On 26 Dec 2013 07:15, "manju16832003" wrote: > >> Hi Puneet, >> if you type field is pre-determined text field ex type [compact, sedan, >> hatchback], I think you have to boost with query type field (q) to >> get more accurate boosting. >> >> Ex: http://localhost:8983/solr/my/select?q=type:sedan^100 type:compact^10 >> >> >> (:*)^1&wt=json&indent=true&fl=,score&debug=results&bf=recip(rord(publish_date),1,2,3)^1.5&sort=score >> desc >> >> For publish_date, replace with the date you use for getting latest >> resultes. >> >> In the above query, things to note is that >> - fl=,score -> The result set would display score value for each document >> - sort by score as first sort field that will give you the documents with >> the highest boost value (score) on top >> >> Play around with the boosting values ^100 ^10 (perhaps 5,10,20 ) and >> observe how the score value will change the documents. >> >> I'm not really sure how solr calculation works, however the above query >> must give you the accurate boosted documents. >> >> -- >> View this message in context: >> >> http://lucene.472066.n3.nabble.com/Boosting-results-on-value-of-field-different-from-query-tp4108180p4108190.html >> Sent from the Solr - User mailing list archive at Nabble.com.
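To address Puneet's concern about restricting results: with (e)dismax, boost clauses can be added via bq, so documents of other types still match the main query; they simply score lower. A hedged sketch (query terms, field names, and boost values are illustrative only):

  http://localhost:8983/solr/my/select?defType=edismax&q=family+car
    &bq=type:sedan^10+type:compact^5
    &bf=recip(rord(publish_date),1,2,3)
    &fl=*,score&sort=score+desc

Unlike putting type:sedan into q itself, the bq clauses do not filter the result set; they only add to the score of the documents that match them.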
Re: adding a node to SolrCloud
On 12/26/2013 02:29 PM, Shawn Heisey wrote: On 12/24/2013 8:35 AM, David Santamauro wrote: You may have one or more of the SolrCloud 'bootstrap' options on the startup commandline. The bootstrap options are intended to be used once, in order to bootstrap from a non-SolrCloud setup to a SolrCloud setup. No, no unnecessary options. I manually bootstrapped a common config. I have no idea what might be wrong here. Between the Collections API and the CoreAdmin API, you should never need to edit solr.xml (if using the pre-4.4 format) or core.properties files (if using core discovery, available 4.4 and later) directly. Now this I don't understand. If I have created cores through the CoreAdmin API, how is solr.xml affected? If I don't edit it, how does SOLR know what cores it has to expose to a distributed collection? If you are using the old-style solr.xml (which will be supported through all future 4.x versions, but not 5.0), then core definitions are stored in solr.xml and the contents of the file are changed by many of the CoreAdmin API actions. The Collections API calls the CoreAdmin API on servers throughout the cloud. I have never experienced tomcat or the SOLR webapp create, modify or otherwise touch in anyway the solr.xml file. I have always had to add the necessary core definition manually. http://wiki.apache.org/solr/Solr.xml%20%28supported%20through%204.x%29 If you are using the core discovery format, which was made available in working form in version 4.4, then solr.xml does NOT contain core definitions. The main example in 4.4 and later uses the new format. Cores are discovered at Solr startup by crawling the filesystem from a root starting point looking for core.properties files. In this mode, solr.xml is fairly static. http://wiki.apache.org/solr/Solr.xml%204.4%20and%20beyond http://wiki.apache.org/solr/Core%20Discovery%20%284.4%20and%20beyond%29 I'll begin exploring this new format, thanks for the help and links. David
Excluding terms in grouping results
Hello there, The question is: how to group results by some field (text terms), but exclude a particular term from being grouped on. For example, there are a few documents with the field 'tags': 1. tags: term1 term2 term3 2. tags: term2 term3 term4 3. tags: term1 term2 term4 4. tags: term2 term3 term4 I want to group by 'tags', so the result would usually be 4 groups, but, for example, I need to exclude 'term4' as a group while still being able to see the documents where 'term4' is present. Is there a way to do this? I can't use another field, because there are a lot of these terms, and any term can be excluded, or even two at once. Another example, maybe it helps: when I make a request to find ?q=tags:term4, I need to group by tags but exclude term4 from being a group, as I am already searching by that term. Thank you for your time. -- View this message in context: http://lucene.472066.n3.nabble.com/Excluding-terms-in-grouping-results-tp4108280.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Chaining plugins
If I get elmer fudd's question correct, he needs something like creating his own component which will extends SearchComponent and do some logic in prepare method - change input request params probably. Then register this component in solrconfig and set it's for default search handler just before query component like: newComponentName query facet mlt highlight debug Lucene's search in query component will be executed with modified parameters. 26.12.2013, 20:55, "Paul Libbrecht" : > I have subclassed the query component to do so. > Using params, you can get almost everything thinkable that is not too much > documented. > > paul > > On 26 déc. 2013, at 15:59, elmerfudd wrote: > >> I would like to develope a search handler that is doing some logic and then >> just sends the query to the default search handler so the results will be >> generated there. >> It's like it is a transparent plugin and the data will only go through it. >> >> How can this be achieved . >> thanks ahead :) >> >> -- >> View this message in context: >> http://lucene.472066.n3.nabble.com/Chaining-plugins-tp4108239.html >> Sent from the Solr - User mailing list archive at Nabble.com.
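A hedged sketch of the kind of component described above; the class name, package, and the parameter being rewritten are made up for illustration, and the logic in prepare() is just an example of changing the incoming request params before QueryComponent runs:

  import java.io.IOException;
  import org.apache.solr.common.params.CommonParams;
  import org.apache.solr.common.params.ModifiableSolrParams;
  import org.apache.solr.handler.component.ResponseBuilder;
  import org.apache.solr.handler.component.SearchComponent;

  // Hypothetical component: rewrites request params before the query component runs.
  public class ParamRewriteComponent extends SearchComponent {

    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
      // Copy the incoming params, apply custom logic, and put them back on the request.
      ModifiableSolrParams params = new ModifiableSolrParams(rb.req.getParams());
      params.set(CommonParams.ROWS, 20); // purely illustrative rewrite
      rb.req.setParams(params);
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
      // Nothing to do here; QueryComponent produces the results.
    }

    @Override
    public String getDescription() {
      return "Rewrites request parameters before the query component";
    }

    @Override
    public String getSource() {
      return "";
    }
  }

It can then be registered in solrconfig.xml and placed ahead of the standard chain with first-components, which has the same effect as listing it just before query in an explicit components list as described above:

  <searchComponent name="paramRewrite" class="com.example.ParamRewriteComponent"/>
  <requestHandler name="/select" class="solr.SearchHandler">
    <arr name="first-components">
      <str>paramRewrite</str>
    </arr>
  </requestHandler>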
Re: Solr Query Slowness
Hello! It seems that the number of queries per second generated by your scripts may be too much for your Solr cluster to handle with the latency you want. Try launching your scripts one by one and see what is the bottle neck with your instance. I assume that for some number of scripts running at the same time you will have good performance and it will start to degrade after you start adding even more. If you don't have high commit rate and you don't need NRT, disabling the caches shouldn't be needed and they can help with query performance. Also there are tools our there that can help you diagnose what the actual problem is, for example (http://sematext.com/spm/index.html). -- Regards, Rafał Kuć Performance Monitoring * Log Analytics * Search Analytics Solr & Elasticsearch Support * http://sematext.com/ > This an example of a query: > http://myip:8080/solr/TestCatMatch_shard12_replica1/select?q=Royal+Cashmere+RC+106+CS+Silk+Cashmere+V+Neck+Moss+Green+Men > ^10+s+Sweater+Cashmere^3+Men^3+Sweaters^3+Clothing^3&rows=1&wt=json&indent=true > in return : > { > "responseHeader":{ > "status":0, > "QTime":191}, > > "response":{"numFound":4539784,"start":0,"maxScore":2.0123534,"docs":[ > { > "Sections":"fashion", > "IdsCategories":"11101911", > "IdProduct":"ef6b8d7cf8340d0c8935727a07baebab", > "Id":"11101911-ef6b8d7cf8340d0c8935727a07baebab", > "Name":"Uniqlo Men Cashmere V Neck Sweater Men Clothing > Sweaters Cashmere", > "_version_":1455419757424541696}] > }} > This query was executed when no script is running so the QTime is only > 191 ms, but it may take up to 3s when they are) > Of course it can be smaller or bigger and of course that affects the > execution time (the execution times I spoke of are the internal ones > returned by solr, not calculated by me). > And yes the CPU is fully used. > 2013/12/26 Rafał Kuć >> Hello! >> >> Different queries can have different execution time, that's why I >> asked about the details. When running the scripts, is Solr CPU fully >> utilized? To tell more I would like to see what queries are run >> against Solr from scripts. >> >> Do you have any information on network throughput between the server >> you are running scripts on and the Solr cluster? You wrote that the >> scripts are fine for 5 seconds and than they get slow. If your Solr >> cluster is not fully utilized I would take a look at the queries and >> what they return (ie. using faceting with facet.limit=-1) and seeing >> if the network is able to process those. >> >> -- >> Regards, >> Rafał Kuć >> Performance Monitoring * Log Analytics * Search Analytics >> Solr & Elasticsearch Support * http://sematext.com/ >> >> >> > Thanks Rafal for your reply, >> >> > My scripts are running on other independent machines so they does not >> > affect Solr, I did mention that the queries are not the same (that is >> why I >> > removed the query cache from solrconfig.xml), and I only get 1 result >> from >> > Solr (which is the top scored one so no sorting since it is by default >> > ordred by score) >> >> >> >> > 2013/12/26 Rafał Kuć >> >> >> Hello! >> >> >> >> Could you tell us more about your scripts? What they do? If the >> >> queries are the same? How many results you fetch with your scripts and >> >> so on. >> >> >> >> -- >> >> Regards, >> >> Rafał Kuć >> >> Performance Monitoring * Log Analytics * Search Analytics >> >> Solr & Elasticsearch Support * http://sematext.com/ >> >> >> >> >> >> > Hi all, >> >> >> >> > I have multiple python scripts querying solr with the sunburnt module. 
>> >> >> >> > Solr was hosted on an Amazon ec2 m1.large (2 vCPU with 4 ECU, 7.5 GB >> >> memory >> >> > & 840 GB storage) and contained several cores for different usage. >> >> >> >> > When I manually executed a query through Solr Admin (a query >> containing >> >> > 10~15 terms, with some of them having boosts over one field and >> limited >> >> to >> >> > one result without any sorting or faceting etc ) it takes around >> 700 >> >> > ms, and the Core contained 7 million documents. >> >> >> >> > When the scripts are executed things get slower, my query takes 7~10s. >> >> >> >> > Then what I did is to turn to SolrCloud expecting huge performance >> >> increase. >> >> >> >> > I installed it on a cluster of 5 Amazon ec2 c3.2xlarge instances (8 >> vCPU >> >> > with 28 ECU, 15 GB memory & 160 SSD storage), then I created one >> >> collection >> >> > to contain the core I was querying, I sharded it to 25 shards (each >> node >> >> > containing 5 shards without replication), each shards took 54 MB of >> >> storage. >> >> >> >> > Tested my query on the new SolrCloud, it takes 70 ms ! huge increase >> wich >> >> > is very good ! >> >> >> >> > Tested my scripts again (I have 30 scripts running at the same time), >> and >> >> > as a surprise, things run fast for 5 seconds then it turns realy slow >> >> again >> >> > (query time ). >> >> >> >> > I updated the solrc
RE: Unable to check Solr 4.6 SPLITSHARD command progress
We ran into this exact scenario and resolved it by applying SOLR-5214 (https://issues.apache.org/jira/browse/SOLR-5214). From: binit [b.initth...@gmail.com] Sent: Friday, December 13, 2013 10:45 PM To: solr-user@lucene.apache.org Subject: Re: Unable to check Solr 4.6 SPLITSHARD command progress Yes, and my clusterstate.json is still: == "shards":{ "shard1":{ "range":"8000-7fff", "state":"active", "replicas":{"core_node1":{ "state":"active", "base_url":"http://./solr", "core":".._shard1_replica1", "node_name":":8080_solr", "leader":"true"}}}, "shard1_1":{ "range":"0-7fff", "state":"construction", "parent":"shard1", "replicas":{"core_node2":{ "state":"active", "base_url":"http://:8080/solr", "core":".._shard1_1_replica1", "node_name":":8080_solr", "leader":"true"}}}, "shard1_0":{ "range":"8000-", "state":"construction", "parent":"shard1", "replicas":{"core_node3":{ "state":"active", "base_url":"http://:8080/solr", "core":".._shard1_0_replica1", "node_name":":8080_solr", "leader":"true", "maxShardsPerNode":"1", "router":{"name":"compositeId"}, "replicationFactor":"1"}} == But it finally failed with an out-of-memory error, and it is definitely not progressing because the thread is stopped. Probably SPLITSHARD is not mature enough to use yet. Now I have no choice but to do it from SolrJ, indexing manually. -- View this message in context: http://lucene.472066.n3.nabble.com/Unable-to-check-Solr-4-6-SPLITSHARD-command-progress-tp4106520p4106699.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Possible memory leak after segment merge? (related to DocValues?)
Does anybody with knowledge of solr internals know why I'm seeing instances of Lucene42DocValuesProducer when I don't have any fields that are using DocValues? Or am I misunderstanding what this class is for? -Greg On Mon, Dec 23, 2013 at 12:07 PM, Greg Preston wrote: > Hello, > > I'm loading up our solr cloud with data (from a solrj client) and > running into a weird memory issue. I can reliably reproduce the > problem. > > - Using Solr Cloud 4.4.0 (also replicated with 4.6.0) > - 24 solr nodes (one shard each), spread across 3 physical hosts, each > host has 256G of memory > - index and tlogs on ssd > - Xmx=7G, G1GC > - Java 1.7.0_25 > - schema and solrconfig.xml attached > > I'm using composite routing to route documents with the same clientId > to the same shard. After several hours of indexing, I occasionally > see an IndexWriter go OOM. I think that's a symptom. When that > happens, indexing continues, and that node's tlog starts to grow. > When I notice this, I stop indexing, and bounce the problem node. > That's where it gets interesting. > > Upon bouncing, the tlog replays, and then segments merge. Once the > merging is complete, the heap is fairly full, and forced full GC only > helps a little. But if I then bounce the node again, the heap usage > goes way down, and stays low until the next segment merge. I believe > segment merges are also what causes the original OOM. > > More details: > > Index on disk for this node is ~13G, tlog is ~2.5G. > See attached mem1.png. This is a jconsole view of the heap during the > following: > > (Solr cloud node started at the left edge of this graph) > > A) One CPU core pegged at 100%. Thread dump shows: > "Lucene Merge Thread #0" daemon prio=10 tid=0x7f5a3c064800 > nid=0x7a74 runnable [0x7f5a41c5f000] >java.lang.Thread.State: RUNNABLE > at org.apache.lucene.util.fst.Builder.add(Builder.java:397) > at > org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter.finishTerm(BlockTreeTermsWriter.java:1000) > at > org.apache.lucene.codecs.TermsConsumer.merge(TermsConsumer.java:112) > at > org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:72) > at > org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:365) > at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:98) > at > org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772) > at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376) > at > org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405) > at > org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482) > > B) One CPU core pegged at 100%. Manually triggered GC. Lots of > memory freed. 
Thread dump shows: > "Lucene Merge Thread #0" daemon prio=10 tid=0x7f5a3c064800 > nid=0x7a74 runnable [0x7f5a41c5f000] >java.lang.Thread.State: RUNNABLE > at > org.apache.lucene.codecs.DocValuesConsumer$1$1.hasNext(DocValuesConsumer.java:127) > at > org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:144) > at > org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:92) > at > org.apache.lucene.codecs.DocValuesConsumer.mergeNumericField(DocValuesConsumer.java:112) > at > org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:221) > at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:119) > at > org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772) > at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376) > at > org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405) > at > org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482) > > C) One CPU core pegged at 100%. Manually triggered GC. No memory > freed. Thread dump shows: > "Lucene Merge Thread #0" daemon prio=10 tid=0x7f5a3c064800 > nid=0x7a74 runnable [0x7f5a41c5f000] >java.lang.Thread.State: RUNNABLE > at > org.apache.lucene.codecs.DocValuesConsumer$1$1.hasNext(DocValuesConsumer.java:127) > at > org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:108) > at > org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:92) > at > org.apache.lucene.codecs.DocValuesConsumer.mergeNumericField(DocValuesConsumer.java:112) > at > org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:221) > at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:119) > at > org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772) > at o
Re: Solr Query Slowness
On 12/26/2013 3:38 AM, Jilal Oussama wrote: > Solr was hosted on an Amazon ec2 m1.large (2 vCPU with 4 ECU, 7.5 GB memory > & 840 GB storage) and contained several cores for different usage. > > When I manually executed a query through Solr Admin (a query containing > 10~15 terms, with some of them having boosts over one field and limited to > one result without any sorting or faceting etc ) it takes around 700 > ms, and the Core contained 7 million documents. > > When the scripts are executed things get slower, my query takes 7~10s. > > Then what I did is to turn to SolrCloud expecting huge performance increase. > > I installed it on a cluster of 5 Amazon ec2 c3.2xlarge instances (8 vCPU > with 28 ECU, 15 GB memory & 160 SSD storage), then I created one collection > to contain the core I was querying, I sharded it to 25 shards (each node > containing 5 shards without replication), each shards took 54 MB of storage. > > Tested my query on the new SolrCloud, it takes 70 ms ! huge increase wich > is very good ! > > Tested my scripts again (I have 30 scripts running at the same time), and > as a surprise, things run fast for 5 seconds then it turns realy slow again > (query time ). > > I updated the solrconfig.xml to remove the query caches (I don't need them > since queries are very different and only 1 time queries) and changes the > index memory to 1 GB, but only got a small increase (3~4s for each query ?!) Your SolrCloud setup has 35 times as much CPU power (just basing this on the ECU numbers) as your single-server setup, ten times as much memory, and a lot more IOPS because you moved to SSD. A 10X increase in single query performance is not surprising. You have not indicated how much memory is assigned to the java heap on each server. I think that there are three possible problems happening here, with a strong possibility that the third one is happening at the same time as one of the other two: 1) Full garbage collections are too frequent because the heap is too small. 2) Garbage collections take too long because the heap is very large and GC is not tuned. 3) Extremely high disk I/O because the OS disk cache is too small for the index size. Some information on these that might be helpful: http://wiki.apache.org/solr/SolrPerformanceProblems The general solution for good Solr performance is to throw hardware, especially memory, at the problem. It's worth pointing out that any level of hardware investment has an upper limit on the total query volume it can support. Running 30 test scripts at the same time will be difficult for all but the most powerful and expensive hardware to deal with, especially if every query is different. A five-server cloud where each server has 8 CPU cores and 15GB of memory is pretty small, all things considered. Thanks, Shawn
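Purely for illustration (these are generic CMS-style flags, not the specific settings from Shawn's wiki page, and the heap size is a placeholder), GC tuning of the kind discussed above usually means adding options like these to the Solr startup command:

  java -Xms12g -Xmx12g \
       -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
       -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly \
       -jar start.jar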
Re: Maybe a bug for solr 4.6 when create a new core
Hi Mark. Thanks for your reply. I will file a JIRA issue about the NPE. By the way,would you look through the Question 2. After I create a new core with explicite shard and coreNodeName successfully,I can not create a replica for above new core also with explicite coreNodeName and the same shard and collection Request url as following: http://10.7.23.122:8080/solr/admin/cores?action=CREATE&name=Test1&shard=Test&collection.configName=myconf&schema=schema.xml&config=solrconfigLocal_default.xml&collection=defaultcol&coreNodeName=Test1 It responses an error: 400 29 Error CREATEing SolrCore 'Test1': Test1 is removed 400 I find out that in the src class in org.apache.solr.cloud. ZkController line 1369~ 1384: As the code says,when I indicate a coreNodeName and collection explicitly,it goes to check a 'autoCreated' property of the Collection which I have already created. My question :Why does it need to check the 'autoCreated' property,any jira about this 'autoCreated' property? How can I make through the check? [1]- try { if(cd.getCloudDescriptor().getCollectionName() !=null && cd.getCloudDescriptor().getCoreNodeName() != null ) { //we were already registered if(zkStateReader.getClusterState().hasCollection(cd.getCloudDescriptor().getCollectionName())){ DocCollection coll = zkStateReader.getClusterState().getCollection(cd.getCloudDescriptor().getCollectionName()); if(!"true".equals(coll.getStr("autoCreated"))){ Slice slice = coll.getSlice(cd.getCloudDescriptor().getShardId()); if(slice != null){ if(slice.getReplica(cd.getCloudDescriptor().getCoreNodeName()) == null) { log.info("core_removed This core is removed from ZK"); throw new SolrException(ErrorCode.NOT_FOUND,coreNodeName +" is removed"); } } } } } -- Regards 2013/12/27 Mark Miller > If you are seeing an NPE there, sounds like you are on to something. > Please file a JIRA issue. > > - Mark > > > On Dec 26, 2013, at 1:29 AM, YouPeng Yang > wrote: > > > > Hi > > Merry Christmas. > > > > Before this mail,I am in trouble with a weird problem for a few days > > when to create a new core with both explicite shard and coreNodeName. > And I > > have posted a few mails in the mailist,no one ever gives any > > suggestions,maybe they did not encounter the same problem. > > I have to go through the srcs to check out the reason. Thanks god, I > find > > it. The reason to the problem,maybe be a bug, so I would like to report > it > > hoping to get your endorsement and confirmation. > > > > > > In class org.apache.solr.cloud.Overseer the Line 360: > > - > > if (sliceName !=null && collectionExists && > > !"true".equals(state.getCollection(collection).getStr("autoCreated"))) { > >Slice slice = state.getSlice(collection, sliceName); > >if (slice.getReplica(coreNodeName) == null) { > > log.info("core_deleted . Just return"); > > return state; > >} > > } > > - > > the slice needs to be checked null .because I create a new core with both > > explicite shard and coreNodeName, the state.getSlice(collection, > > sliceName) may return a null.So it needs to be checked ,or there will be > > an NullpointException > > - > > if (sliceName !=null && collectionExists && > > !"true".equals(state.getCollection(collection).getStr("autoCreated"))) { > >Slice slice = state.getSlice(collection, sliceName); > >if (*slice != null &&* slice.getReplica(coreNodeName) == > null) { > > log.info("core_deleted . 
Just return"); > > return state; > >} > > } > > - > > > > *Querstion 1*: Is this OK with the whole solr project,I have no aware > > about the influences about the change,as right now ,it goes right. Please > > make confirm about this. > > > > After I fixed this prolem,I can create a core with the request: > > http://10.7.23.122:8080/solr/admin/cores?action=CREATE&name=Test&; > > *shard=Test* > > > &collection.configName=myconf&schema=schema.xml&config=solrconfigLocal_default.xml&collection=defaultcol& > > *coreNodeName=Test* > > > > However when I create a replica within the same shard Test: > > http://10.7.23.122:8080/solr/admin/cores?action=CREATE&*name=Test1*&; > > *shard=Test* > > > &collection
Re: Solr - Match whole word only in text fields
Hi everybody! Ahmet, do I get it correct - if I use this text_char_norm field type, for input "myName=aaa bbb" I'll index terms "myName", "aaa", "bbb"? So I'll match with query like "myName" or query like "bbb", but not match with "myName aaa". I can use this type for query value, so split "myName aaa" into ( "myName" && "aaa") - and it will work. But this approach will give false positive match with "myName bbb". What do you think, how I can handle this? One of the approaches is to use in this field type KeywordTokenizer+ShingleFilter instead of WhitespaceTokenizerFactory, so tokens like "myName", "myName aaa", "myName aaa bbb", "aaa", "aaa bbb", "bbb" will be indexed, but it significantly increased index size in case of long values. 26.12.2013, 03:20, "Ahmet Arslan" : > Hi Haya, > > With MappingCharFilter you can have full control over character set that you > want to split. > > in mappings.txt you will have > > ":" => " " > "=" => " " > > Use the following type and see if it suits for your needs. Update > mappings.txt according to your needs. > > positionIncrementGap="100" > > > mapping="mappings.txt"/> > > > > > > On Sunday, December 22, 2013 9:19 PM, haya.axelrod > wrote: > I have a text field that can contain very long values (like text files). I > want to create field type for it (text, not string), in order to have > something like "Match whole word only" in notepad++, but the delimiter > should not be only white spaces. If i have: > > myName=aaa bbb > > I would like to get it for the following search strings "aaa", "bbb", "aaa > bbb", "myName=aaa bbb", "myName", but not for "aa" or "ame=a" or "a bb". > Another example is: > > aaa bbb > Can i do this somehow? > > What should be my field type definition? > > The text can contain any character. Before search i'm escaping the search > string using > http://lucene.apache.org/solr/4_2_1/solr-solrj/org/apache/solr/client/solrj/util/ClientUtils.html > > Thanks > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Solr-Match-whole-word-only-in-text-fields-tp4107795.html > Sent from the Solr - User mailing list archive at Nabble.com.
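The field type definition in Ahmet's quoted reply lost its XML tags in the archive. Based on the classes he names, it was presumably along these lines (a hedged reconstruction, not the exact original):

  <fieldType name="text_char_norm" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <charFilter class="solr.MappingCharFilterFactory" mapping="mappings.txt"/>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    </analyzer>
  </fieldType>

with mappings.txt mapping ":" and "=" to a space, as described in the quoted message.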
REYPLAY_ERR: IOException reading log
Hi users, I have built a SolrCloud on Tomcat. The cloud contains 22 shards with no replicas, and the SolrCloud is integrated with HDFS. After importing data from Oracle into the SolrCloud, I restarted Tomcat and it does not come alive again; it always throws the exceptions below. I am really at a loss about this exception, because my schema does not contain a BigDecimal-type field. Could you give any tips? 746635 [recoveryExecutor-44-thread-1] WARN org.apache.solr.update.UpdateLog – REYPLAY_ERR: IOException reading log org.apache.solr.common.SolrException: Invalid Number: java.math.BigDecimal:238088174 at org.apache.solr.schema.TrieField.readableToIndexed(TrieField.java:396) at org.apache.solr.update.AddUpdateCommand.getIndexedId(AddUpdateCommand.java:98) at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:582) at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:435) at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100) at org.apache.solr.update.UpdateLog$LogReplayer.doReplay(UpdateLog.java:1313) at org.apache.solr.update.UpdateLog$LogReplayer.run(UpdateLog.java:1202) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:662) 746681 [recoveryExecutor-44-thread-1] WARN org.apache.solr.update.UpdateLog – REYPLAY_ERR: IOException reading log org.apache.solr.common.SolrException: Invalid Number: java.math.BigDecimal:238088175
Re: Maybe a bug for solr 4.6 when create a new core
Hi Mark I have filed a jira about the NPE: https://issues.apache.org/jira/browse/SOLR-5580 2013/12/27 YouPeng Yang > Hi Mark. > >Thanks for your reply. > > I will file a JIRA issue about the NPE. > >By the way,would you look through the Question 2. After I create a new > core with explicite shard and coreNodeName successfully,I can not create a > replica for above new core also with explicite coreNodeName and the same > shard and collection > Request url as following: > > http://10.7.23.122:8080/solr/admin/cores?action=CREATE&name=Test1&shard=Test&collection.configName=myconf&schema=schema.xml&config=solrconfigLocal_default.xml&collection=defaultcol&coreNodeName=Test1 > > It responses an error: > > > >400 > 29 > > > Error CREATEing SolrCore 'Test1': Test1 is > removed > 400 > > > >I find out that in the src class in org.apache.solr.cloud. > ZkController line 1369~ 1384: >As the code says,when I indicate a coreNodeName and collection > explicitly,it goes to check a 'autoCreated' property of the Collection > which I have already created. > > My question :Why does it need to check the 'autoCreated' property,any > jira about this 'autoCreated' property? How can I make through the check? > > > [1]- > try { > if(cd.getCloudDescriptor().getCollectionName() !=null && > cd.getCloudDescriptor().getCoreNodeName() != null ) { > //we were already registered > > > > if(zkStateReader.getClusterState().hasCollection(cd.getCloudDescriptor().getCollectionName())){ > DocCollection coll = > > > zkStateReader.getClusterState().getCollection(cd.getCloudDescriptor().getCollectionName()); > if(!"true".equals(coll.getStr("autoCreated"))){ >Slice slice = > coll.getSlice(cd.getCloudDescriptor().getShardId()); >if(slice != null){ > if(slice.getReplica(cd.getCloudDescriptor().getCoreNodeName()) > == null) { >log.info("core_removed This core is removed from ZK"); >throw new SolrException(ErrorCode.NOT_FOUND,coreNodeName +" > is removed"); > } >} > } > } > } > > > -- > > > Regards > > > 2013/12/27 Mark Miller > >> If you are seeing an NPE there, sounds like you are on to something. >> Please file a JIRA issue. >> >> - Mark >> >> > On Dec 26, 2013, at 1:29 AM, YouPeng Yang >> wrote: >> > >> > Hi >> > Merry Christmas. >> > >> > Before this mail,I am in trouble with a weird problem for a few days >> > when to create a new core with both explicite shard and coreNodeName. >> And I >> > have posted a few mails in the mailist,no one ever gives any >> > suggestions,maybe they did not encounter the same problem. >> > I have to go through the srcs to check out the reason. Thanks god, I >> find >> > it. The reason to the problem,maybe be a bug, so I would like to report >> it >> > hoping to get your endorsement and confirmation. >> > >> > >> > In class org.apache.solr.cloud.Overseer the Line 360: >> > - >> > if (sliceName !=null && collectionExists && >> > !"true".equals(state.getCollection(collection).getStr("autoCreated"))) { >> >Slice slice = state.getSlice(collection, sliceName); >> >if (slice.getReplica(coreNodeName) == null) { >> > log.info("core_deleted . 
Just return"); >> > return state; >> >} >> > } >> > - >> > the slice needs to be checked null .because I create a new core with >> both >> > explicite shard and coreNodeName, the state.getSlice(collection, >> > sliceName) may return a null.So it needs to be checked ,or there will >> be >> > an NullpointException >> > - >> > if (sliceName !=null && collectionExists && >> > !"true".equals(state.getCollection(collection).getStr("autoCreated"))) { >> >Slice slice = state.getSlice(collection, sliceName); >> >if (*slice != null &&* slice.getReplica(coreNodeName) == >> null) { >> > log.info("core_deleted . Just return"); >> > return state; >> >} >> > } >> > - >> > >> > *Querstion 1*: Is this OK with the whole solr project,I have no aware >> > about the influences about the change,as right now ,it goes right. >> Please >> > make confirm about this. >> > >> > After I fixed this prolem,I can create a core with the request: >> > http://10.7.23.122:8080/solr/admin/cores?action=CREATE&name=Test&; >> >