How to use Solr in my project
Hello,

First off, I apologize if this was sent twice. I was having issues subscribing to the list.

I'm a complete noob in Solr (and indexing), so I'm hoping someone can help me figure out how to implement Solr in my project. I have gone through some tutorials online and I was able to import and query text in some Arabic PDF documents.

We have some scans of historical handwritten Arabic documents that will have their text extracted into a database (or PDF). We would like the user to be able to search a document for text, then have the scanned image show up in a viewer with the text highlighted.

I would like to use Solr to index the text in the documents, but I'm unsure how to store and retrieve the "word location" in Solr (the area of text that needs to be highlighted). Do I index and store the full document in Solr? How do I link the "search term" to the "word location" on the page? The only way I can figure out how to do this involves querying the database for the "word" and "location" after querying Solr for the search term, but doesn't that defeat the purpose of using Solr?

I would really appreciate help figuring this out.

Thank you,
Fatima
Re: How to use Solr in my project
On 26 December 2013 10:54, Fatima Issawi wrote: > Hello, > > First off, I apologize if this was sent twice. I was having issues > subscribing to the list. > > I'm a complete noob in Solr (and indexing), so I'm hoping someone can help me > figure out how to implement Solr in my project. I have gone through some > tutorials online and I was able to import and query text in some Arabic PDF > documents. > > We have some scans of Historical Handwritten Arabic documents that will have > text extracted into a database (or PDF). We would like the user to be able to > search the document for text, then have the scanned image show up in a viewer > with the text highlighted. This will not work for scanned images which do not actually contain the text. If you have the text of the documents, the best that you can do is break the text into pages corresponding to the scanned images, and index into Solr the text from the pages and the scanned image that should be linked to the text. For a user search, you will need to show the scanned image for the entire page: Highlighting of the search term in an image is not possible without optical character recognition (OCR). Similarly, if you are indexing from PDFs, you will need to ensure that they contain text, and not just images. Regards, Gora
Solr update document issue
Hello all,

In our last project we use Solr as the search engine to search for assets. We have a feature to search for a product by its summary text. The product itself is a "container for a set of products" (a parent), so each time we add a new product under it, the summary of the parent product should be updated to append the new text. So each time we add a new child product, the parent product's summary text should be updated. Sometimes the added summary text list is empty, sometimes not, but in the case of an empty list all fields of the document are deleted except _version_ and id. To avoid this problem we skip the update in the case of an empty list.

*A. in case of update with empty list:*
1. added document is: 121112 hehe go go goool ollay hehedoc11455476967916699648
2. after update: 1211121455476967916699659

*B. in case of a non-empty list in the update request:*
1. same as in A.1.
2. 121112 hehe go go goool ollay hehe go go 12312312312312312 123123123 ollay 1232131231231231313doc11455476967916699648

I use SolrJ and Solr 4.4.0.

My schema document:

My Java code to test this scenario is as follows:

//TestingSolrUpdateDoc.java
package org.solr.test;

import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.common.SolrInputDocument;

public class TestingSolrUpdateDoc {

    public static void main(String[] args) {
        try {
            addDoc(121112, false);
        } catch (SolrServerException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void addDoc(long id, boolean emptyListUpdate)
            throws SolrServerException, IOException {
        // index the initial document with four "text" values
        SolrInputDocument solrInputDocument = new SolrInputDocument();
        solrInputDocument.setField("id", new Long(id));
        solrInputDocument.setField("text", generateRandomTextList());
        solrInputDocument.setField("name", "doc1");
        SolrConnection connection = SolrConnection.getConnection();
        connection.addDocument(solrInputDocument);

        if (emptyListUpdate) {
            // atomic update with an empty list for "add" -- the case that
            // leaves only id and _version_ in the document
            solrInputDocument = new SolrInputDocument();
            solrInputDocument.setField("id", new Long(id));
            Map update = new HashMap();
            update.put("add", new ArrayList());
            solrInputDocument.addField("text", update);
            connection.updateDocument(solrInputDocument);
        } else {
            // atomic update that adds more values to the "text" field
            solrInputDocument = new SolrInputDocument();
            solrInputDocument.setField("id", new Long(id));
            Map update = new HashMap();
            update.put("add", generateRandomUpdateTextList());
            solrInputDocument.addField("text", update);
            connection.updateDocument(solrInputDocument);
        }
    }

    private static List generateRandomTextList() {
        List texts = new ArrayList();
        texts.add("hehe");
        texts.add("go go ");
        texts.add("goool");
        texts.add("ollay");
        return texts;
    }

    private static List generateRandomUpdateTextList() {
        List texts = new ArrayList();
        texts.add("hehe");
        texts.add("go go ");
        texts.add("12312312312312312 123123123");
        texts.add("ollay 1232131231231231313");
        return texts;
    }
}

//SolrConnection.java
package org.solr.test;

import java.io.IOException;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class SolrConnection {

    private SolrServer solrServer = new HttpSolrServer("http://localhost:8983/solr/test");
    private static SolrConnection solrConnection = new SolrConnection();

    private SolrConnection() {
    }

    public static SolrConnection getConnection() {
        if (solrConnection != null) {
            return solrConnection;
        }
        synchronized (SolrConnection.class) {
            if (solrConnection != null) {
                return solrConnection;
            }
            solrConnection = new SolrConnection();
            return solrConnection;
        }
    }

    public void addDocument(SolrInputDocument doc) throws SolrServerException, IOException {
        solrServer.add(doc);
        solrServer.commit();
    }

    public
Re: Solr update document issue
Hello all,

In our last project we use Solr as the search engine to search for assets. We have a feature to search for a product by its summary text. The product itself is a "container for a set of products" (a parent), so each time we add a new product under it, the summary of the parent product should be updated to append the new text. So each time we add a new child product, the parent product's summary text should be updated. Sometimes the added summary text list is empty, sometimes not, but in the case of an empty list all fields of the document are deleted except _version_ and id. To avoid this problem we ignore the update in the case of an empty list.

*A. in case of update with empty list:*
1. added document is :
2. after update

*B. in case of not empty list in update request:*
1. same as in a.1.
2.

I use SolrJ and Solr 4.4.0.

My schema document :

My Java code to test this scenario is as follows:
//TestingSolrUpdateDoc.java
//SolrConnection.java

Best Thanks,
Mohammad yaseen

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-update-document-issue-tp4108214p4108215.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: How to use Solr in my project
Hi, I should clarify. We have another application extracting the text from the document. The full text from each document will be stored in a database either at the document level or page level (this hasn't been decided yet). We will also be storing word location of each word on the page in the database. What I'm having problems with is deciding on the schema. We want a user to be able to search for a word in the database, have a list of documents that word is located in, and location in the document that word is located it. When he selects the search results, we want the scanned picture to have that word highlighted on the page. I want to index the document using Solr, but I'm having trouble figuring out how to design the schema to return that "word location" of a search term on the scanned picture in order to highlight it. Does this make more sense? Fatima -Original Message- From: Gora Mohanty [mailto:g...@mimirtech.com] Sent: Thursday, December 26, 2013 1:00 PM To: solr-user@lucene.apache.org Subject: Re: How to use Solr in my project On 26 December 2013 10:54, Fatima Issawi wrote: > Hello, > > First off, I apologize if this was sent twice. I was having issues > subscribing to the list. > > I'm a complete noob in Solr (and indexing), so I'm hoping someone can help me > figure out how to implement Solr in my project. I have gone through some > tutorials online and I was able to import and query text in some Arabic PDF > documents. > > We have some scans of Historical Handwritten Arabic documents that will have > text extracted into a database (or PDF). We would like the user to be able to > search the document for text, then have the scanned image show up in a viewer > with the text highlighted. This will not work for scanned images which do not actually contain the text. If you have the text of the documents, the best that you can do is break the text into pages corresponding to the scanned images, and index into Solr the text from the pages and the scanned image that should be linked to the text. For a user search, you will need to show the scanned image for the entire page: Highlighting of the search term in an image is not possible without optical character recognition (OCR). Similarly, if you are indexing from PDFs, you will need to ensure that they contain text, and not just images. Regards, Gora
Re: How to use Solr in my project
On 26 December 2013 15:44, Fatima Issawi wrote: > Hi, > > I should clarify. We have another application extracting the text from the > document. The full text from each document will be stored in a database > either at the document level or page level (this hasn't been decided yet). We > will also be storing word location of each word on the page in the database. What do you mean by "word location"? The number on the page? What purpose would this serve? > What I'm having problems with is deciding on the schema. We want a user to be > able to search for a word in the database, have a list of documents that word > is located in, and location in the document that word is located it. When he > selects the search results, we want the scanned picture to have that word > highlighted on the page. [...] I think that you might be confusing things: * If you have the full-text, you can highlight where the word was found. Solr highlighting handles this for you, and there is no need to store word location * You can have different images (presumably, individual scanned pages) linked to different sections of text, and show the entire image. Highlighting in the image is not possible, unless by "word location" you mean the (x, y) coordinates of the word on the page. Even then: - It will be prohibitively expensive to store the location of every word in every image for a large number of documents - Some image processing will be required to handle the highlighting after the scanned image is retrieved Regards, Gora
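A rough SolrJ sketch of this page-per-document approach (the field names, id scheme, and core URL below are illustrative, not taken from the thread): one Solr document per scanned page carries the page text plus a pointer to the image, Solr's highlighter marks the matched terms in the text, and the pixel coordinates for drawing on the scan still come from the external word-location table, keyed by document and page.

// Illustrative sketch: one Solr document per scanned page (SolrJ 4.x).
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrInputDocument;

public class PageIndexingSketch {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/manuscripts");

        // Index a page: its extracted text plus a pointer to the scanned image.
        SolrInputDocument page = new SolrInputDocument();
        page.addField("id", "doc42_page7");           // hypothetical id scheme
        page.addField("doc_id", "doc42");
        page.addField("page_no", 7);
        page.addField("image_url", "/scans/doc42/page7.png");
        page.addField("page_text", "...full extracted text of the page...");
        server.add(page);
        server.commit();

        // Search the page text and let Solr highlight the matched terms.
        SolrQuery q = new SolrQuery("page_text:someword");
        q.setHighlight(true);
        q.addHighlightField("page_text");
        q.setFields("id", "doc_id", "page_no", "image_url");
        QueryResponse rsp = server.query(q);

        // getHighlighting() shows which pages matched and where in the text;
        // the pixel coordinates for drawing on the image would still be looked
        // up in the external database by doc_id and page_no.
        System.out.println(rsp.getHighlighting());
    }
}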
Solr Query Slowness
Hi all,

I have multiple python scripts querying Solr with the sunburnt module.

Solr was hosted on an Amazon EC2 m1.large (2 vCPU with 4 ECU, 7.5 GB memory & 840 GB storage) and contained several cores for different usage.

When I manually executed a query through Solr Admin (a query containing 10~15 terms, with some of them having boosts over one field, limited to one result, without any sorting or faceting etc.) it took around 700 ms, and the core contained 7 million documents.

When the scripts are executed things get slower: my query takes 7~10s.

Then what I did was turn to SolrCloud, expecting a huge performance increase. I installed it on a cluster of 5 Amazon EC2 c3.2xlarge instances (8 vCPU with 28 ECU, 15 GB memory & 160 GB SSD storage), then I created one collection to contain the core I was querying. I sharded it into 25 shards (each node holding 5 shards without replication); each shard took 54 MB of storage.

Tested my query on the new SolrCloud: it takes 70 ms! A huge increase, which is very good!

Tested my scripts again (I have 30 scripts running at the same time), and as a surprise, things run fast for 5 seconds, then it turns really slow again (query time ).

I updated the solrconfig.xml to remove the query caches (I don't need them since the queries are very different, one-time queries) and changed the index memory to 1 GB, but only got a small improvement (3~4s for each query?!)

Any ideas?

PS: My index size will not stay at 7m documents; it will grow to +100m, and that may make things worse.
Re: Solr Query Slowness
Hello! Could you tell us more about your scripts? What they do? If the queries are the same? How many results you fetch with your scripts and so on. -- Regards, Rafał Kuć Performance Monitoring * Log Analytics * Search Analytics Solr & Elasticsearch Support * http://sematext.com/ > Hi all, > I have multiple python scripts querying solr with the sunburnt module. > Solr was hosted on an Amazon ec2 m1.large (2 vCPU with 4 ECU, 7.5 GB memory > & 840 GB storage) and contained several cores for different usage. > When I manually executed a query through Solr Admin (a query containing > 10~15 terms, with some of them having boosts over one field and limited to > one result without any sorting or faceting etc ) it takes around 700 > ms, and the Core contained 7 million documents. > When the scripts are executed things get slower, my query takes 7~10s. > Then what I did is to turn to SolrCloud expecting huge performance increase. > I installed it on a cluster of 5 Amazon ec2 c3.2xlarge instances (8 vCPU > with 28 ECU, 15 GB memory & 160 SSD storage), then I created one collection > to contain the core I was querying, I sharded it to 25 shards (each node > containing 5 shards without replication), each shards took 54 MB of storage. > Tested my query on the new SolrCloud, it takes 70 ms ! huge increase wich > is very good ! > Tested my scripts again (I have 30 scripts running at the same time), and > as a surprise, things run fast for 5 seconds then it turns realy slow again > (query time ). > I updated the solrconfig.xml to remove the query caches (I don't need them > since queries are very different and only 1 time queries) and changes the > index memory to 1 GB, but only got a small increase (3~4s for each query ?!) > Any ideas ? > PS: My index size will not stay with 7m documents, it will grow to +100m > and that may get things worse
Re: Solr Query Slowness
Thanks Rafal for your reply, My scripts are running on other independent machines so they does not affect Solr, I did mention that the queries are not the same (that is why I removed the query cache from solrconfig.xml), and I only get 1 result from Solr (which is the top scored one so no sorting since it is by default ordred by score) 2013/12/26 Rafał Kuć > Hello! > > Could you tell us more about your scripts? What they do? If the > queries are the same? How many results you fetch with your scripts and > so on. > > -- > Regards, > Rafał Kuć > Performance Monitoring * Log Analytics * Search Analytics > Solr & Elasticsearch Support * http://sematext.com/ > > > > Hi all, > > > I have multiple python scripts querying solr with the sunburnt module. > > > Solr was hosted on an Amazon ec2 m1.large (2 vCPU with 4 ECU, 7.5 GB > memory > > & 840 GB storage) and contained several cores for different usage. > > > When I manually executed a query through Solr Admin (a query containing > > 10~15 terms, with some of them having boosts over one field and limited > to > > one result without any sorting or faceting etc ) it takes around 700 > > ms, and the Core contained 7 million documents. > > > When the scripts are executed things get slower, my query takes 7~10s. > > > Then what I did is to turn to SolrCloud expecting huge performance > increase. > > > I installed it on a cluster of 5 Amazon ec2 c3.2xlarge instances (8 vCPU > > with 28 ECU, 15 GB memory & 160 SSD storage), then I created one > collection > > to contain the core I was querying, I sharded it to 25 shards (each node > > containing 5 shards without replication), each shards took 54 MB of > storage. > > > Tested my query on the new SolrCloud, it takes 70 ms ! huge increase wich > > is very good ! > > > Tested my scripts again (I have 30 scripts running at the same time), and > > as a surprise, things run fast for 5 seconds then it turns realy slow > again > > (query time ). > > > I updated the solrconfig.xml to remove the query caches (I don't need > them > > since queries are very different and only 1 time queries) and changes the > > index memory to 1 GB, but only got a small increase (3~4s for each query > ?!) > > > Any ideas ? > > > PS: My index size will not stay with 7m documents, it will grow to +100m > > and that may get things worse > >
Re: Solr Query Slowness
Hello! Different queries can have different execution time, that's why I asked about the details. When running the scripts, is Solr CPU fully utilized? To tell more I would like to see what queries are run against Solr from scripts. Do you have any information on network throughput between the server you are running scripts on and the Solr cluster? You wrote that the scripts are fine for 5 seconds and than they get slow. If your Solr cluster is not fully utilized I would take a look at the queries and what they return (ie. using faceting with facet.limit=-1) and seeing if the network is able to process those. -- Regards, Rafał Kuć Performance Monitoring * Log Analytics * Search Analytics Solr & Elasticsearch Support * http://sematext.com/ > Thanks Rafal for your reply, > My scripts are running on other independent machines so they does not > affect Solr, I did mention that the queries are not the same (that is why I > removed the query cache from solrconfig.xml), and I only get 1 result from > Solr (which is the top scored one so no sorting since it is by default > ordred by score) > 2013/12/26 Rafał Kuć >> Hello! >> >> Could you tell us more about your scripts? What they do? If the >> queries are the same? How many results you fetch with your scripts and >> so on. >> >> -- >> Regards, >> Rafał Kuć >> Performance Monitoring * Log Analytics * Search Analytics >> Solr & Elasticsearch Support * http://sematext.com/ >> >> >> > Hi all, >> >> > I have multiple python scripts querying solr with the sunburnt module. >> >> > Solr was hosted on an Amazon ec2 m1.large (2 vCPU with 4 ECU, 7.5 GB >> memory >> > & 840 GB storage) and contained several cores for different usage. >> >> > When I manually executed a query through Solr Admin (a query containing >> > 10~15 terms, with some of them having boosts over one field and limited >> to >> > one result without any sorting or faceting etc ) it takes around 700 >> > ms, and the Core contained 7 million documents. >> >> > When the scripts are executed things get slower, my query takes 7~10s. >> >> > Then what I did is to turn to SolrCloud expecting huge performance >> increase. >> >> > I installed it on a cluster of 5 Amazon ec2 c3.2xlarge instances (8 vCPU >> > with 28 ECU, 15 GB memory & 160 SSD storage), then I created one >> collection >> > to contain the core I was querying, I sharded it to 25 shards (each node >> > containing 5 shards without replication), each shards took 54 MB of >> storage. >> >> > Tested my query on the new SolrCloud, it takes 70 ms ! huge increase wich >> > is very good ! >> >> > Tested my scripts again (I have 30 scripts running at the same time), and >> > as a surprise, things run fast for 5 seconds then it turns realy slow >> again >> > (query time ). >> >> > I updated the solrconfig.xml to remove the query caches (I don't need >> them >> > since queries are very different and only 1 time queries) and changes the >> > index memory to 1 GB, but only got a small increase (3~4s for each query >> ?!) >> >> > Any ideas ? >> >> > PS: My index size will not stay with 7m documents, it will grow to +100m >> > and that may get things worse >> >>
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query
I'm trying to setup Solr with a Wordpress database running on MySQL. But on trying a full import: `http://localhost:8983/solr/tv-wordpress/dataimport?command=full-import` The error is: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query **data-config.xml** I also tried including the database name in the SQL statement: SELECT * FROM wptalkman.wp_posts WHERE post_status='publish'; and change the connection url to `jdbc:mysql@localhost:3306` But I'm still unable to execute the query. **console output** 194278 [Thread-22] INFO org.apache.solr.update.UpdateHandler û start rollback{ } 194279 [Thread-22] INFO org.apache.solr.update.DefaultSolrCoreState û Creating new IndexWriter... 194279 [Thread-22] INFO org.apache.solr.update.DefaultSolrCoreState û Waiting until IndexWriter is unused... core=tv-wordpress 194280 [Thread-22] INFO org.apache.solr.update.DefaultSolrCoreState û Rollback old IndexWriter... core=tv-wordpress 194282 [Thread-22] INFO org.apache.solr.core.SolrCore û SolrDeletionPolicy.onI nit: commits:num=1 commit{dir=NRTCachingDirectory(org.apache.lucene.store.SimpleFSDirectory @C:\Dropbox\Databases\solr-4.3.1\example\example-DIH\solr\tv-wordpress\data\inde x lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@10ff234; maxCach eMB=48.0 maxMergeSizeMB=4.0),segFN=segments_3l,generation=129,filenames=[_3o.nvd , _3o_Lucene41_0.tim, _3o.fnm, _3o.nvm, _3o_Lucene41_0.tip, _3o.fdt, _3o_Lucene4 1_0.pos, segments_3l, _3o.fdx, _3o_Lucene41_0.doc, _3o.si] 194283 [Thread-22] INFO org.apache.solr.core.SolrCore û newest commit = 129[_3 o.nvd, _3o_Lucene41_0.tim, _3o.fnm, _3o.nvm, _3o_Lucene41_0.tip, _3o.fdt, _3o_Lu cene41_0.pos, segments_3l, _3o.fdx, _3o_Lucene41_0.doc, _3o.si] 194283 [Thread-22] INFO org.apache.solr.update.DefaultSolrCoreState û New Inde xWriter is ready to be used. 
194283 [Thread-22] INFO org.apache.solr.update.UpdateHandler û end_rollback 194669 [qtp32398134-13] INFO org.apache.solr.handler.dataimport.DataImporter û Loading DIH Configuration: wordpress-data-config.xml 194672 [qtp32398134-13] INFO org.apache.solr.handler.dataimport.DataImporter û Data Configuration loaded successfully 194676 [Thread-23] INFO org.apache.solr.handler.dataimport.DataImporter û Star ting Full Import 194676 [qtp32398134-13] INFO org.apache.solr.core.SolrCore û [tv-wordpress] we bapp=/solr path=/dataimport params={command=full-import} status=0 QTime=8 194680 [Thread-23] INFO org.apache.solr.handler.dataimport.SimplePropertiesWrit er û Read dataimport.properties 194681 [Thread-23] INFO org.apache.solr.core.SolrCore û [tv-wordpress] REMOVIN G ALL DOCUMENTS FROM INDEX 194686 [Thread-23] INFO org.apache.solr.handler.dataimport.JdbcDataSource û Cr eating a connection for entity article with URL: jdbc:mysql@localhost:3306/wptal kman 194686 [Thread-23] INFO org.apache.solr.handler.dataimport.JdbcDataSource û Ti me taken for getConnection(): 0 194687 [Thread-23] ERROR org.apache.solr.handler.dataimport.DocBuilder û Except ion while processing: article document : SolrInputDocument[]:org.apache.solr.han dler.dataimport.DataImportHandlerException: Unable to execute query: select * fr om wp_posts WHERE post_status='publish' Processing Document # 1 at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAnd Throw(DataImportHandlerException.java:71) at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.< init>(JdbcDataSource.java:253) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSou rce.java:210) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSou rce.java:38) at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEn tityProcessor.java:59) at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEnti tyProcessor.java:73) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(Ent ityProcessorWrapper.java:243) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilde r.java:465) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilde r.java:404) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.j ava:319) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java :227) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(
Re: Solr Query Slowness
This an example of a query: http://myip:8080/solr/TestCatMatch_shard12_replica1/select?q=Royal+Cashmere+RC+106+CS+Silk+Cashmere+V+Neck+Moss+Green+Men ^10+s+Sweater+Cashmere^3+Men^3+Sweaters^3+Clothing^3&rows=1&wt=json&indent=true in return : { "responseHeader":{ "status":0, "QTime":191}, "response":{"numFound":4539784,"start":0,"maxScore":2.0123534,"docs":[ { "Sections":"fashion", "IdsCategories":"11101911", "IdProduct":"ef6b8d7cf8340d0c8935727a07baebab", "Id":"11101911-ef6b8d7cf8340d0c8935727a07baebab", "Name":"Uniqlo Men Cashmere V Neck Sweater Men Clothing Sweaters Cashmere", "_version_":1455419757424541696}] }} This query was executed when no script is running so the QTime is only 191 ms, but it may take up to 3s when they are) Of course it can be smaller or bigger and of course that affects the execution time (the execution times I spoke of are the internal ones returned by solr, not calculated by me). And yes the CPU is fully used. 2013/12/26 Rafał Kuć > Hello! > > Different queries can have different execution time, that's why I > asked about the details. When running the scripts, is Solr CPU fully > utilized? To tell more I would like to see what queries are run > against Solr from scripts. > > Do you have any information on network throughput between the server > you are running scripts on and the Solr cluster? You wrote that the > scripts are fine for 5 seconds and than they get slow. If your Solr > cluster is not fully utilized I would take a look at the queries and > what they return (ie. using faceting with facet.limit=-1) and seeing > if the network is able to process those. > > -- > Regards, > Rafał Kuć > Performance Monitoring * Log Analytics * Search Analytics > Solr & Elasticsearch Support * http://sematext.com/ > > > > Thanks Rafal for your reply, > > > My scripts are running on other independent machines so they does not > > affect Solr, I did mention that the queries are not the same (that is > why I > > removed the query cache from solrconfig.xml), and I only get 1 result > from > > Solr (which is the top scored one so no sorting since it is by default > > ordred by score) > > > > > 2013/12/26 Rafał Kuć > > >> Hello! > >> > >> Could you tell us more about your scripts? What they do? If the > >> queries are the same? How many results you fetch with your scripts and > >> so on. > >> > >> -- > >> Regards, > >> Rafał Kuć > >> Performance Monitoring * Log Analytics * Search Analytics > >> Solr & Elasticsearch Support * http://sematext.com/ > >> > >> > >> > Hi all, > >> > >> > I have multiple python scripts querying solr with the sunburnt module. > >> > >> > Solr was hosted on an Amazon ec2 m1.large (2 vCPU with 4 ECU, 7.5 GB > >> memory > >> > & 840 GB storage) and contained several cores for different usage. > >> > >> > When I manually executed a query through Solr Admin (a query > containing > >> > 10~15 terms, with some of them having boosts over one field and > limited > >> to > >> > one result without any sorting or faceting etc ) it takes around > 700 > >> > ms, and the Core contained 7 million documents. > >> > >> > When the scripts are executed things get slower, my query takes 7~10s. > >> > >> > Then what I did is to turn to SolrCloud expecting huge performance > >> increase. 
> >> > >> > I installed it on a cluster of 5 Amazon ec2 c3.2xlarge instances (8 > vCPU > >> > with 28 ECU, 15 GB memory & 160 SSD storage), then I created one > >> collection > >> > to contain the core I was querying, I sharded it to 25 shards (each > node > >> > containing 5 shards without replication), each shards took 54 MB of > >> storage. > >> > >> > Tested my query on the new SolrCloud, it takes 70 ms ! huge increase > wich > >> > is very good ! > >> > >> > Tested my scripts again (I have 30 scripts running at the same time), > and > >> > as a surprise, things run fast for 5 seconds then it turns realy slow > >> again > >> > (query time ). > >> > >> > I updated the solrconfig.xml to remove the query caches (I don't need > >> them > >> > since queries are very different and only 1 time queries) and changes > the > >> > index memory to 1 GB, but only got a small increase (3~4s for each > query > >> ?!) > >> > >> > Any ideas ? > >> > >> > PS: My index size will not stay with 7m documents, it will grow to > +100m > >> > and that may get things worse > >> > >> > >
Re: Configurable collectors for custom ranking
In my case, the final function call looks something like this: sum(product($k1,score()),product($k2,field(x))) This means that all the scores would have to scaled and passed down, not just the top N because even a low score could be offset by a high value in 'field(x)'. Thanks, Peter On Mon, Dec 23, 2013 at 6:37 PM, Joel Bernstein wrote: > Peter, > > You actually only need the current score being collected to be in the > request context. So you don't need a map, you just need an object wrapper > around a mutable float. > > If you have a page size of X, only the top X scores need to be held onto, > because all the other scores wouldn't have made it into that page anyway so > they might as well be 0. Because the QueryResultCache caches's a larger > window then the page size you should keep enough scores so the cached > docList is correct. But if you're only dealing with 150K of results you > could just keep all the scores in a FloatArrayList and not worry about the > keeping the top X scores in a priority queue. > > During the collect hang onto the docIds and scores and build your scaling > info. > > During the finish iterate your docIds and scale the scores as you go. > > Set your scaled score into the object wrapper that is in the request > context before you collect each document. > > When you call collect on the delegate collectors they will call the custom > value source for each document to perform the sort. Your custom value > source will return whatever the float value is in the request context at > that time. > > If you're also going to run this postfilter when you're doing a standard > rank by score you'll also need to send down a dummy scorer to the delegate > collectors. Spend some time with the CollapsingQParserPlugin in trunk to > see how the dummy scorer works. > > I'll be adding value source collapse criteria to the > CollapsingQParserPlugin this week and it will have a similar interaction > between a PostFilter and value source. So you may want to watch SOLR-5536 > to see an example of this. > > Joel > > > > > > > > > > > > > Joel Bernstein > Search Engineer at Heliosearch > > > On Mon, Dec 23, 2013 at 4:03 PM, Peter Keegan >wrote: > > > Hi Joel, > > > > Could you clarify what would be in the key,value Map added to the > > SearchRequest context? It seems that all the docId/score tuples need to > be > > there, including the ones not in the 'top N ScoreDocs' PriorityQueue > > (score=0). If so would the Map be something like: > > "scaled_scores",Map ? > > > > Also, what is the reason for passing score=0 for documents that aren't in > > the PriorityQueue? Will these docs get filtered out before a normal sort > by > > score? > > > > Thanks, > > Peter > > > > > > On Thu, Dec 12, 2013 at 11:13 AM, Joel Bernstein > > wrote: > > > > > The sorting is going to happen in the lower level collectors. You need > a > > > value source that returns the score of the document being collected. > > > > > > Here is how you can make this happen: > > > > > > 1) Create an object in your PostFilter that simply holds the current > > score. > > > Place this object in the SearchRequest context map. Update object.score > > as > > > you pass the docs and scores to the lower collectors. > > > > > > 2) Create a values source that checks the SearchRequest context for the > > > object that's holding the current score. Use this object to return the > > > current score when called. 
For example if you give the value source a > > > handle called "score" a compound function call will look like this: > > > sum(score(), field(x)) > > > > > > Joel > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Thu, Dec 12, 2013 at 9:58 AM, Peter Keegan > > >wrote: > > > > > > > Regarding my original goal, which is to perform a math function using > > the > > > > scaled score and a field value, and sort on the result, how does this > > fit > > > > in? Must I implement another custom PostFilter with a higher cost > than > > > the > > > > scale PostFilter? > > > > > > > > Thanks, > > > > Peter > > > > > > > > > > > > On Wed, Dec 11, 2013 at 4:01 PM, Peter Keegan < > peterlkee...@gmail.com > > > > >wrote: > > > > > > > > > Thanks very much for the guidance. I'd be happy to donate a working > > > > > solution. > > > > > > > > > > Peter > > > > > > > > > > > > > > > On Wed, Dec 11, 2013 at 3:53 PM, Joel Bernstein < > joels...@gmail.com > > > > >wrote: > > > > > > > > > >> SOLR-5020 has the commit info, it's mainly changes to > > > SolrIndexSearcher > > > > I > > > > >> believe. They might apply to 4.3. > > > > >> I think as long you have the finish method that's all you'll need. > > If > > > > you > > > > >> can get this working it would be excellent if you could donate > back > > > the > > > > >> Scale PostFilter. > > > > >> > > > > >> > > > > >> On Wed, Dec 11, 2013 at 3:36 PM, Peter Keegan < > > peterlkee...@gmail.com > > > > >> >wrote: > > > > >> > > > > >> > This is what I was looking for, but the DelegatingCollector > > '
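A bare-bones sketch of the score-holder idea outlined in this thread, assuming Solr/Lucene 4.x APIs; the class names are invented, and the PostFilter and ValueSourceParser wiring is omitted. The PostFilter stores the current (scaled) score in the wrapper before delegating each collect() call, and the value source simply reads it back for the document being collected, so a function like sum(product($k1,score()),product($k2,field(x))) can be evaluated over the scaled scores.

import java.io.IOException;
import java.util.Map;

import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.queries.function.FunctionValues;
import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.queries.function.docvalues.FloatDocValues;

// Mutable wrapper the PostFilter puts into the request context and updates
// with the current (scaled) score before calling delegate.collect(doc).
class CurrentScore {
    volatile float value;
}

// ValueSource that just returns whatever score the PostFilter last stored.
class CurrentScoreValueSource extends ValueSource {
    private final CurrentScore holder;

    CurrentScoreValueSource(CurrentScore holder) {
        this.holder = holder;
    }

    @Override
    public FunctionValues getValues(Map context, AtomicReaderContext readerContext) throws IOException {
        return new FloatDocValues(this) {
            @Override
            public float floatVal(int doc) {
                // score set by the PostFilter for the doc currently being collected
                return holder.value;
            }
        };
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CurrentScoreValueSource && ((CurrentScoreValueSource) o).holder == holder;
    }

    @Override
    public int hashCode() {
        return System.identityHashCode(holder);
    }

    @Override
    public String description() {
        return "currentScore()";
    }
}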
Re: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query
Which version of Solr are you using? Is it possible that the query you ran returns 0 results? On Thu, Dec 26, 2013 at 5:44 PM, PeterKerk wrote: > I'm trying to setup Solr with a Wordpress database running on MySQL. > > But on trying a full import: > `http://localhost:8983/solr/tv-wordpress/dataimport?command=full-import` > > > The error is: > org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to > execute query > > > **data-config.xml** > > > url="jdbc:mysql@localhost:3306/wptalkman" user="root" password="" /> > > > > /> > column="post_author" /> > > > > > > I also tried including the database name in the SQL statement: > > SELECT * FROM wptalkman.wp_posts WHERE post_status='publish'; > > and change the connection url to `jdbc:mysql@localhost:3306` > > But I'm still unable to execute the query. > > > **console output** > > 194278 [Thread-22] INFO org.apache.solr.update.UpdateHandler û start > rollback{ > } > 194279 [Thread-22] INFO org.apache.solr.update.DefaultSolrCoreState û > Creating > new IndexWriter... > 194279 [Thread-22] INFO org.apache.solr.update.DefaultSolrCoreState û > Waiting > until IndexWriter is unused... core=tv-wordpress > 194280 [Thread-22] INFO org.apache.solr.update.DefaultSolrCoreState û > Rollback > old IndexWriter... core=tv-wordpress > 194282 [Thread-22] INFO org.apache.solr.core.SolrCore û > SolrDeletionPolicy.onI > nit: commits:num=1 > > commit{dir=NRTCachingDirectory(org.apache.lucene.store.SimpleFSDirectory > > @C:\Dropbox\Databases\solr-4.3.1\example\example-DIH\solr\tv-wordpress\data\inde > x lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@10ff234; > maxCach > eMB=48.0 > maxMergeSizeMB=4.0),segFN=segments_3l,generation=129,filenames=[_3o.nvd > , _3o_Lucene41_0.tim, _3o.fnm, _3o.nvm, _3o_Lucene41_0.tip, _3o.fdt, > _3o_Lucene4 > 1_0.pos, segments_3l, _3o.fdx, _3o_Lucene41_0.doc, _3o.si] > 194283 [Thread-22] INFO org.apache.solr.core.SolrCore û newest commit > = 129[_3 > o.nvd, _3o_Lucene41_0.tim, _3o.fnm, _3o.nvm, _3o_Lucene41_0.tip, > _3o.fdt, _3o_Lu > cene41_0.pos, segments_3l, _3o.fdx, _3o_Lucene41_0.doc, _3o.si] > 194283 [Thread-22] INFO org.apache.solr.update.DefaultSolrCoreState û > New Inde > xWriter is ready to be used. 
> 194283 [Thread-22] INFO org.apache.solr.update.UpdateHandler û > end_rollback > 194669 [qtp32398134-13] INFO > org.apache.solr.handler.dataimport.DataImporter û > Loading DIH Configuration: wordpress-data-config.xml > 194672 [qtp32398134-13] INFO > org.apache.solr.handler.dataimport.DataImporter û > Data Configuration loaded successfully > 194676 [Thread-23] INFO org.apache.solr.handler.dataimport.DataImporter > û Star > ting Full Import > 194676 [qtp32398134-13] INFO org.apache.solr.core.SolrCore û > [tv-wordpress] we > bapp=/solr path=/dataimport params={command=full-import} status=0 > QTime=8 > 194680 [Thread-23] INFO > org.apache.solr.handler.dataimport.SimplePropertiesWrit > er û Read dataimport.properties > 194681 [Thread-23] INFO org.apache.solr.core.SolrCore û [tv-wordpress] > REMOVIN > G ALL DOCUMENTS FROM INDEX > 194686 [Thread-23] INFO > org.apache.solr.handler.dataimport.JdbcDataSource û Cr > eating a connection for entity article with URL: > jdbc:mysql@localhost:3306/wptal > kman > 194686 [Thread-23] INFO > org.apache.solr.handler.dataimport.JdbcDataSource û Ti > me taken for getConnection(): 0 > 194687 [Thread-23] ERROR org.apache.solr.handler.dataimport.DocBuilder > û Except > ion while processing: article document : > SolrInputDocument[]:org.apache.solr.han > dler.dataimport.DataImportHandlerException: Unable to execute query: > select * fr > om wp_posts WHERE post_status='publish' Processing Document # 1 > at > org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAnd > Throw(DataImportHandlerException.java:71) > at > org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.< > init>(JdbcDataSource.java:253) > at > org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSou > rce.java:210) > at > org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSou > rce.java:38) > at > org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEn > tityProcessor.java:59) > at > org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEnti > tyProcessor.java:73) > at > org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(Ent > ityProcessorWrapper.java:243) > at > org.apache.solr.handler.dataimport.DocBuilder.buildD
Re: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query
Solr 4.3.1

When I run the statement in MySQL Workbench or the console, the statement executes successfully and returns 2 results.

FYI: I placed the mysql-connector-java-5.1.27-bin.jar in the \lib folder.

Also: it should not throw this error even when 0 results are returned, right?

--
View this message in context: http://lucene.472066.n3.nabble.com/org-apache-solr-handler-dataimport-DataImportHandlerException-Unable-to-execute-query-tp4108227p4108233.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query
I was reading the code and it looks like it could throw an NPE if java.sql.Statement#execute() returns false which can happen if there are no results (although most drivers return an empty resultset instead): if (stmt.execute(query)) { resultSet = stmt.getResultSet(); } LOG.trace("Time taken for sql :" + (System.currentTimeMillis() - start)); colNames = readFieldNames(resultSet.getMetaData()); Can you try using the debug mode and paste its response? On Thu, Dec 26, 2013 at 7:29 PM, PeterKerk wrote: > Solr 4.3.1 > > When I run the statement in MySQL Workbench or console the statement > executes successfully and returns 2 results. > > FYI: I placed the mysql-connector-java-5.1.27-bin.jar in the \lib folder. > > Also: it should not throw this error even when 0 results are returned right? > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/org-apache-solr-handler-dataimport-DataImportHandlerException-Unable-to-execute-query-tp4108227p4108233.html > Sent from the Solr - User mailing list archive at Nabble.com. -- Regards, Shalin Shekhar Mangar.
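A defensive variant of that block (a sketch only, not a tested patch) would avoid the NPE by only reading the metadata when a result set was actually produced:

// Hypothetical guard around the snippet quoted above: stmt.execute() returns
// false when the statement produced no ResultSet, so don't dereference it blindly.
if (stmt.execute(query)) {
    resultSet = stmt.getResultSet();
} else {
    resultSet = null;
}
LOG.trace("Time taken for sql :" + (System.currentTimeMillis() - start));
colNames = (resultSet != null) ? readFieldNames(resultSet.getMetaData()) : null;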
Chaining plugins
I would like to develop a search handler that does some logic and then just sends the query to the default search handler so the results will be generated there. It's like a transparent plugin; the data only passes through it.

How can this be achieved? Thanks ahead :)

--
View this message in context: http://lucene.472066.n3.nabble.com/Chaining-plugins-tp4108239.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query
Shalin Shekhar Mangar wrote
> Can you try using the debug mode and paste its response?

Ok, thanks. How do I enable and use the debug mode?

--
View this message in context: http://lucene.472066.n3.nabble.com/org-apache-solr-handler-dataimport-DataImportHandlerException-Unable-to-execute-query-tp4108227p4108248.html
Sent from the Solr - User mailing list archive at Nabble.com.
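For reference, the DIH debug mode is normally reached by adding debug parameters to the dataimport request, roughly as below (exact parameter spelling may vary slightly between Solr versions; in debug mode documents are typically not committed unless commit=true is also passed):

http://localhost:8983/solr/tv-wordpress/dataimport?command=full-import&debug=true&verbose=true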
Re: Questions about integrateing SolrCloud with HDFS
YouPeng, While I'm unable to help you with the issue that you're seeing I did want to comment here and say that I have previously brought up the same goal that you're trying to accomplish on this mailing list but received no feedback or input. I think it makes sense that Solr should not try to make its index directories distinct and redundant per shard/core while running on HDFS as data redundancy and locality is handled at a different layer in the software stack. +1 to this topic because I'd love to see Solr handle replication/redundancy more smartly on HDFS Thanks, Greg On Dec 24, 2013, at 1:57 AM, YouPeng Yang wrote: > Hi users > > Solr supports for writing and reading its index and transaction log files > to the HDFS distributed filesystem. > **I am curious about that there are any other futher improvement about > the integration with HDFS.* > **For the solr native replication will make multiple copies of the > master node's index. Because of the native replication of HDFS,there is no > need to do that.It just to need that multiple cores in solrcloud share the > same index directory in HDFS?* > > > The above supposition is what I want to achive when we are integrating > SolrCloud with HDFS (Solr 4.6). > To make sure of our application high available,we still have to take > the solr replication with some tricks. > > Firstly ,noting that solr's index directory is made up of > *collectionName/coreNodeName/data/index * > > *collectionName/coreNodeName/data/tlog* > So to achive this,we want to create multi cores that use the same hdfs > index directory . > > I have tested this within solr 4.4 by expilcitly indicating the same > coreNodeName. > > For example: > Step1, a core was created with the name=core1 and shard=core_shard1 and > collection=clollection1 and coreNodeName=*core1* > Step2. create another core with the name=core2 and shard=core_shard1 and > collection=clollection1 and coreNodeName= > *core1* > * T*he two core share the same shard ,collection and coreNodeName.As a > result,the two core will get the same index data which is stored in the > hdfs directory : > hdfs://myhdfs/*clollection1*/*core1*/data/index > hdfs://myhdfs/*clollection1*/*core1*/data/tlog > > Unfortunately*, *as the solr 4.6 was released,we upgraded . the above > goal failed. We could not create a core with both expilcit shard and > coreNodeName. > Exceptions are as [1]. 
> * Can some give some help?* > > > Regards > [1]-- > 64893635 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController > ?.publishing core=hdfstest3 state=down > 64893635 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController > ?.numShards not found on descriptor - reading it from system property > 64893698 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController > ?.look for our core node name > > > > 64951227 [http-bio-8080-exec-17] INFO org.apache.solr.core.SolrCore > ?.[reportCore_201208] webapp=/solr path=/replication > params={slave=false&command=details&wt=javabin&qt=/replication&version=2} > status=0 QTime=107 > > > 65213770 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController > ?.waiting to find shard id in clusterstate for hdfstest3 > 65533894 [http-bio-8080-exec-1] ERROR org.apache.solr.core.SolrCore > ?.org.apache.solr.common.SolrException: Error CREATEing SolrCore > 'hdfstest3': Could not get shard id for core: hdfstest3 >at > org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:535) >at > org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:152) >at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) >at > org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:662) >at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248) >at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197) >at > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) >at > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) >at > org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222) >at > org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123) >at > org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171) >at > org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99) >at > org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:947) >at > org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) >at > org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408) >at > org.apache.coyote.http11.
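For readers trying to reproduce the setup described in the quoted message, the two CoreAdmin CREATE calls sharing one shard and coreNodeName would look roughly like this (host, core, collection, and config names are placeholders, not copied from that environment):

http://host:8080/solr/admin/cores?action=CREATE&name=core1&collection=collection1&shard=core_shard1&coreNodeName=core1&collection.configName=myconf
http://host:8080/solr/admin/cores?action=CREATE&name=core2&collection=collection1&shard=core_shard1&coreNodeName=core1&collection.configName=myconf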
Re: Chaining plugins
I have subclassed the query component to do so. Using params, you can get at almost everything conceivable, though this is not well documented.

paul

On 26 déc. 2013, at 15:59, elmerfudd wrote:

> I would like to develope a search handler that is doing some logic and then
> just sends the query to the default search handler so the results will be
> generated there.
> It's like it is a transparent plugin and the data will only go through it.
>
> How can this be achieved .
> thanks ahead :)
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Chaining-plugins-tp4108239.html
> Sent from the Solr - User mailing list archive at Nabble.com.
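To illustrate what subclassing a component can look like, here is a minimal sketch of a custom SearchComponent that rewrites the request in prepare() and then lets the stock QueryComponent generate the results; the class name and the rewrite logic are illustrative only.

import java.io.IOException;

import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

// Runs before QueryComponent when listed in the handler's first-components.
public class QueryPreprocessComponent extends SearchComponent {

    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        ModifiableSolrParams params = new ModifiableSolrParams(rb.req.getParams());
        String q = params.get(CommonParams.Q);
        if (q != null) {
            params.set(CommonParams.Q, rewrite(q));  // custom logic goes here
        }
        rb.req.setParams(params);
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
        // Nothing to do: QueryComponent and the other default components
        // produce the actual results.
    }

    private String rewrite(String q) {
        return q;  // placeholder for the "some logic" mentioned above
    }

    @Override
    public String getDescription() {
        return "Rewrites the query before the default components run";
    }

    @Override
    public String getSource() {
        return null;
    }
}

Such a component would be declared with a searchComponent entry in solrconfig.xml and listed in the request handler's first-components so it executes ahead of the default chain.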
Does Solr fork child processes and result in zombies?
I have three CentOS machines running Solr 4.6.0 cloud without any replication. That is, numshards is 3 and there is only one Solr instance running on each of the boxes. Also, on the boxes I arm running ZooKeeper. This is a test environment and I would not normally run ZooKeeper on the same boxes. As I am inserting data into Solr the boxes get in a weird state. I will log in and enter my username and password and then nothing, it just sits there. I am connected through Putty. Never gets to a command prompt. I stop the data import and after a while I can log in. I do the following command on one of the boxes and I see this: ps -lf -C java F S UIDPID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD 0 S root 4772 1 99 80 0 - 1926607 futex_ 12:13 pts/0 213852-21:10:31 java -Dzookeeper.log.dir=. -Dzookeeper.root.logger=INFO,CONSOLE -cp /var/zookeeper/bin/../build/classes:/var/zookeeper/bin/../build/lib/*.jar:/var/zookeeper/bin/../lib/slf4j-log4j12-1.6.1.jar:/var/zookeeper/bin/../lib/slf4j-api-1.6.1.jar:/var/zookeeper/bin/../lib/netty-3.2.2.Final.jar:/var/zookeeper/bin/../lib/log4j-1.2.15.jar:/var/zookeeper/bin/../lib/jline-0.9.94.jar:/var/zookeeper/bin/../zookeeper-3.4.5.jar:/var/zookeeper/bin/../src/java/lib/*.jar:/var/zookeeper/bin/../conf: -Xms1G -Xmx4G -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=false org.apache.zookeeper.server.quorum.QuorumPeerMain /var/zookeeper/bin/../conf/zoo.cfg 0 S root 5009 1 99 80 0 - 46184325 futex_ 12:26 pts/0 219341-04:38:50 /usr/bin/java -Dbootstrap_confdir=./solr/mycore/conf -Xms6G -Xmx12G -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=3000 -Dcollection.configName=amtcacheconf -DzkHost=prdslslbgtmdb01:2181,prdslslbgtmdb03:2181,prdslslbgtmdb04:2181 -DnumShards=3 -jar start.jar 1 D root 7879 5009 99 80 0 - 46184325 sched_ 15:40 pts/0 208-11:14:20 /usr/bin/java -Dbootstrap_confdir=./solr/mycore/conf -Xms6G -Xmx12G -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=3000 -Dcollection.configName=amtcacheconf -DzkHost=prdslslbgtmdb01:2181,prdslslbgtmdb03:2181,prdslslbgtmdb04:2181 -DnumShards=3 -jar start.jar 1 D root 7949 5009 99 80 0 - 46184325 sched_ 15:44 pts/0 208-11:14:20 /usr/bin/java -Dbootstrap_confdir=./solr/mycore/conf -Xms6G -Xmx12G -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=3000 -Dcollection.configName=amtcacheconf -DzkHost=prdslslbgtmdb01:2181,prdslslbgtmdb03:2181,prdslslbgtmdb04:2181 -DnumShards=3 -jar start.jar How did I end up with two child processes of Solr running? Notice they are two PIDS, 7879 and 7949, that are children of 5009. The exact same command as well, with all of the parameters I used to launch Solr. I also notice the "F" state is "1" for those two processes, so I assume that means "forked but didn't exec". Also the WCHAN is sched_ on both of them. The "S" state is "D" which means uninterruptible sleep ( usually IO ). Where are these processes coming from? Do I have something configured incorrectly?
Re: Maybe a bug for Solr 4.6 when creating a new core
On 12/25/2013 11:29 PM, YouPeng Yang wrote: > After I fixed this prolem,I can create a core with the request: > http://10.7.23.122:8080/solr/admin/cores?action=CREATE&name=Test&; > *shard=Test* > &collection.configName=myconf&schema=schema.xml&config=solrconfigLocal_default.xml&collection=defaultcol& > *coreNodeName=Test* In simple terms, what are you trying to get done? This is sounding like an XY problem. http://people.apache.org/~hossman/#xyproblem If you take a few steps back and describe what you want to happen, what has come before, what you've tried, and what has actually happened, it will be easier to help you. Most of the people who work on Solr are part of the western world, and yesterday was Christmas. Some people are getting back to work today, but many of them will be unavailable until after the new year. Thanks, Shawn
Re: Maybe a bug for Solr 4.6 when creating a new core
If you are seeing an NPE there, sounds like you are on to something. Please file a JIRA issue. - Mark > On Dec 26, 2013, at 1:29 AM, YouPeng Yang wrote: > > Hi > Merry Christmas. > > Before this mail,I am in trouble with a weird problem for a few days > when to create a new core with both explicite shard and coreNodeName. And I > have posted a few mails in the mailist,no one ever gives any > suggestions,maybe they did not encounter the same problem. > I have to go through the srcs to check out the reason. Thanks god, I find > it. The reason to the problem,maybe be a bug, so I would like to report it > hoping to get your endorsement and confirmation. > > > In class org.apache.solr.cloud.Overseer the Line 360: > - > if (sliceName !=null && collectionExists && > !"true".equals(state.getCollection(collection).getStr("autoCreated"))) { >Slice slice = state.getSlice(collection, sliceName); >if (slice.getReplica(coreNodeName) == null) { > log.info("core_deleted . Just return"); > return state; >} > } > - > the slice needs to be checked null .because I create a new core with both > explicite shard and coreNodeName, the state.getSlice(collection, > sliceName) may return a null.So it needs to be checked ,or there will be > an NullpointException > - > if (sliceName !=null && collectionExists && > !"true".equals(state.getCollection(collection).getStr("autoCreated"))) { >Slice slice = state.getSlice(collection, sliceName); >if (*slice != null &&* slice.getReplica(coreNodeName) == null) { > log.info("core_deleted . Just return"); > return state; >} > } > - > > *Querstion 1*: Is this OK with the whole solr project,I have no aware > about the influences about the change,as right now ,it goes right. Please > make confirm about this. > > After I fixed this prolem,I can create a core with the request: > http://10.7.23.122:8080/solr/admin/cores?action=CREATE&name=Test&; > *shard=Test* > &collection.configName=myconf&schema=schema.xml&config=solrconfigLocal_default.xml&collection=defaultcol& > *coreNodeName=Test* > > However when I create a replica within the same shard Test: > http://10.7.23.122:8080/solr/admin/cores?action=CREATE&*name=Test1*&; > *shard=Test* > &collection.configName=myconf&schema=schema.xml&config=solrconfigLocal_default.xml&collection=defaultcol& > *coreNodeName=Test1* > > It response an error: > > >400 > 29 > > > Error CREATEing SolrCore 'Test1': Test1 is > removed > 400 > > > > I aslo find the reason the in the class org.apache.solr.cloud.ZkController > line 1369~ 1384[1] > As the src here,it needs to check the autoCreated within an existing > collection > when the coreNodeName and shard were assigned manully. the autoCreated > property of a collection is not equal with true, it throws an exeption. > > *Question2*: Why does it need to check the 'autoCreated', and how could > I go through this check, or Is this another bug? 
> > > > > [1]- >try { > if(cd.getCloudDescriptor().getCollectionName() !=null && > cd.getCloudDescriptor().getCoreNodeName() != null ) { >//we were already registered > > if(zkStateReader.getClusterState().hasCollection(cd.getCloudDescriptor().getCollectionName())){ >DocCollection coll = > zkStateReader.getClusterState().getCollection(cd.getCloudDescriptor().getCollectionName()); > if(!"true".equals(coll.getStr("autoCreated"))){ > Slice slice = > coll.getSlice(cd.getCloudDescriptor().getShardId()); > if(slice != null){ > if(slice.getReplica(cd.getCloudDescriptor().getCoreNodeName()) > == null) { > log.info("core_removed This core is removed from ZK"); > throw new SolrException(ErrorCode.NOT_FOUND,coreNodeName +" > is removed"); > } > } > } >} > } > --
Re: Questions about integrateing SolrCloud with HDFS
Can you file a JIRA issue? - Mark > On Dec 24, 2013, at 2:57 AM, YouPeng Yang wrote: > > Hi users > > Solr supports for writing and reading its index and transaction log files > to the HDFS distributed filesystem. > **I am curious about that there are any other futher improvement about > the integration with HDFS.* > **For the solr native replication will make multiple copies of the > master node's index. Because of the native replication of HDFS,there is no > need to do that.It just to need that multiple cores in solrcloud share the > same index directory in HDFS?* > > > The above supposition is what I want to achive when we are integrating > SolrCloud with HDFS (Solr 4.6). > To make sure of our application high available,we still have to take > the solr replication with some tricks. > > Firstly ,noting that solr's index directory is made up of > *collectionName/coreNodeName/data/index * > > *collectionName/coreNodeName/data/tlog* > So to achive this,we want to create multi cores that use the same hdfs > index directory . > > I have tested this within solr 4.4 by expilcitly indicating the same > coreNodeName. > > For example: > Step1, a core was created with the name=core1 and shard=core_shard1 and > collection=clollection1 and coreNodeName=*core1* > Step2. create another core with the name=core2 and shard=core_shard1 and > collection=clollection1 and coreNodeName= > *core1* > * T*he two core share the same shard ,collection and coreNodeName.As a > result,the two core will get the same index data which is stored in the > hdfs directory : > hdfs://myhdfs/*clollection1*/*core1*/data/index > hdfs://myhdfs/*clollection1*/*core1*/data/tlog > > Unfortunately*, *as the solr 4.6 was released,we upgraded . the above > goal failed. We could not create a core with both expilcit shard and > coreNodeName. > Exceptions are as [1]. 
> * Can some give some help?* > > > Regards > [1]-- > 64893635 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController > ?.publishing core=hdfstest3 state=down > 64893635 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController > ?.numShards not found on descriptor - reading it from system property > 64893698 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController > ?.look for our core node name > > > > 64951227 [http-bio-8080-exec-17] INFO org.apache.solr.core.SolrCore > ?.[reportCore_201208] webapp=/solr path=/replication > params={slave=false&command=details&wt=javabin&qt=/replication&version=2} > status=0 QTime=107 > > > 65213770 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController > ?.waiting to find shard id in clusterstate for hdfstest3 > 65533894 [http-bio-8080-exec-1] ERROR org.apache.solr.core.SolrCore > ?.org.apache.solr.common.SolrException: Error CREATEing SolrCore > 'hdfstest3': Could not get shard id for core: hdfstest3 >at > org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:535) >at > org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:152) >at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) >at > org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:662) >at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248) >at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197) >at > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) >at > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) >at > org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222) >at > org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123) >at > org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171) >at > org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99) >at > org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:947) >at > org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) >at > org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408) >at > org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1009) >at > org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589) >at > org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310) >at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) >at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) >at java.lang.Thread.run(Thread.java:722) > Caused by: org.apache.solr.common.SolrException: Could not
Re: Questions about integrating SolrCloud with HDFS
Cloudera has plans here. I'll be working on further hdfs / Solrcloud options in the near future. - Mark > On Dec 26, 2013, at 11:33 AM, Greg Walters wrote: > > YouPeng, > > While I'm unable to help you with the issue that you're seeing I did want to > comment here and say that I have previously brought up the same goal that > you're trying to accomplish on this mailing list but received no feedback or > input. I think it makes sense that Solr should not try to make its index > directories distinct and redundant per shard/core while running on HDFS as > data redundancy and locality is handled at a different layer in the software > stack. > > +1 to this topic because I'd love to see Solr handle replication/redundancy > more smartly on HDFS > > Thanks, > Greg > > >> On Dec 24, 2013, at 1:57 AM, YouPeng Yang wrote: >> >> Hi users >> >> Solr supports for writing and reading its index and transaction log files >> to the HDFS distributed filesystem. >> **I am curious about that there are any other futher improvement about >> the integration with HDFS.* >> **For the solr native replication will make multiple copies of the >> master node's index. Because of the native replication of HDFS,there is no >> need to do that.It just to need that multiple cores in solrcloud share the >> same index directory in HDFS?* >> >> >> The above supposition is what I want to achive when we are integrating >> SolrCloud with HDFS (Solr 4.6). >> To make sure of our application high available,we still have to take >> the solr replication with some tricks. >> >> Firstly ,noting that solr's index directory is made up of >> *collectionName/coreNodeName/data/index * >> >> *collectionName/coreNodeName/data/tlog* >> So to achive this,we want to create multi cores that use the same hdfs >> index directory . >> >> I have tested this within solr 4.4 by expilcitly indicating the same >> coreNodeName. >> >> For example: >> Step1, a core was created with the name=core1 and shard=core_shard1 and >> collection=clollection1 and coreNodeName=*core1* >> Step2. create another core with the name=core2 and shard=core_shard1 and >> collection=clollection1 and coreNodeName= >> *core1* >> * T*he two core share the same shard ,collection and coreNodeName.As a >> result,the two core will get the same index data which is stored in the >> hdfs directory : >> hdfs://myhdfs/*clollection1*/*core1*/data/index >> hdfs://myhdfs/*clollection1*/*core1*/data/tlog >> >> Unfortunately*, *as the solr 4.6 was released,we upgraded . the above >> goal failed. We could not create a core with both expilcit shard and >> coreNodeName. >> Exceptions are as [1]. 
>> * Can some give some help?* >> >> >> Regards >> [1]-- >> 64893635 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController >> ?.publishing core=hdfstest3 state=down >> 64893635 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController >> ?.numShards not found on descriptor - reading it from system property >> 64893698 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController >> ?.look for our core node name >> >> >> >> 64951227 [http-bio-8080-exec-17] INFO org.apache.solr.core.SolrCore >> ?.[reportCore_201208] webapp=/solr path=/replication >> params={slave=false&command=details&wt=javabin&qt=/replication&version=2} >> status=0 QTime=107 >> >> >> 65213770 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController >> ?.waiting to find shard id in clusterstate for hdfstest3 >> 65533894 [http-bio-8080-exec-1] ERROR org.apache.solr.core.SolrCore >> ?.org.apache.solr.common.SolrException: Error CREATEing SolrCore >> 'hdfstest3': Could not get shard id for core: hdfstest3 >> at >> org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:535) >> at >> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:152) >> at >> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) >> at >> org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:662) >> at >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248) >> at >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197) >> at >> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) >> at >> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) >> at >> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222) >> at >> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123) >> at >> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171) >> at >> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99) >> at >> org.apache.cat
Re: Bad fieldNorm when using morphologic synonyms
Attached patch into the JIRA issue. Reviews are welcome. On Thu, Dec 19, 2013 at 7:24 PM, Isaac Hebsh wrote: > Roman, do you have any results? > > created SOLR-5561 > > Robert, if I'm wrong, you are welcome to close that issue. > > > On Mon, Dec 9, 2013 at 10:50 PM, Isaac Hebsh wrote: > >> You can see the norm value, in the "explain" text, when setting >> debugQuery=true. >> If the same item gets different norm before/after, that's it. >> >> Note that this configuration is in schema.xml (not solrconfig.xml...) >> >> On Monday, December 9, 2013, Roman Chyla wrote: >> >>> Isaac, is there an easy way to recognize this problem? We also index >>> synonym tokens in the same position (like you do, and I'm sure that our >>> positions are set correctly). I could test whether the default similarity >>> factory in solrconfig.xml had any effect (before/after reindexing). >>> >>> --roman >>> >>> >>> On Mon, Dec 9, 2013 at 2:42 PM, Isaac Hebsh >>> wrote: >>> >>> > Hi Robert and Manuel. >>> > >>> > The DefaultSimilarity indeed sets discountOverlap to true by default. >>> > BUT, the *factory*, aka DefaultSimilarityFactory, when called by >>> > IndexSchema (the getSimilarity method), explicitly sets this value to >>> the >>> > value of its corresponding class member. >>> > This class member is initialized to be FALSE when the instance is >>> created >>> > (like every boolean variable in the world). It should be set when >>> "init" >>> > method is called. If the parameter is not set in schema.xml, the >>> default is >>> > true. >>> > >>> > Everything seems to be alright, but the issue is that "init" method is >>> NOT >>> > called, if the similarity is not *explicitly* declared in schema.xml. >>> In >>> > that case, init method is not called, the discountOverlaps member (of >>> the >>> > factory class) remains FALSE, and getSimilarity explicitly calls >>> > setDiscountOverlaps with value of FALSE. >>> > >>> > This is very easy to reproduce and debug. >>> > >>> > >>> > On Mon, Dec 9, 2013 at 9:19 PM, Robert Muir wrote: >>> > >>> > > no, its turned on by default in the default similarity. >>> > > >>> > > as i said, all that is necessary is to fix your analyzer to emit the >>> > > proper position increments. >>> > > >>> > > On Mon, Dec 9, 2013 at 12:27 PM, Manuel Le Normand >>> > > wrote: >>> > > > In order to set discountOverlaps to true you must have added the >>> > > > to the >>> schema.xml, >>> > > which >>> > > > is commented out by default! >>> > > > >>> > > > As by default this param is false, the above situation is expected >>> with >>> > > > correct positioning, as said. >>> > > > >>> > > > In order to fix the field norms you'd have to reindex with the >>> > similarity >>> > > > class which initializes the param to true. >>> > > > >>> > > > Cheers, >>> > > > Manu >>> > > >>> > >>> >> >
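For anyone hitting this before the SOLR-5561 patch lands, a minimal workaround sketch based on the thread above: declare the similarity factory explicitly in schema.xml so that its init() method actually runs and discountOverlaps is honored (a reindex is still needed before existing field norms change).

  <similarity class="solr.DefaultSimilarityFactory">
    <bool name="discountOverlaps">true</bool>
  </similarity>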
Re: Questions about integrating SolrCloud with HDFS
Mark, I'd be happy to but some clarification first; should this issue be about creating cores with overlapping names and the stack trace that YouPeng initially described, Solr's behavior when storing data on HDFS or YouPeng's other thread (Maybe a bug for solr 4.6 when create a new core) that looks like it might be a near duplicate of this one? Thanks, Greg On Dec 26, 2013, at 12:40 PM, Mark Miller wrote: > Can you file a JIRA issue? > > - Mark > >> On Dec 24, 2013, at 2:57 AM, YouPeng Yang wrote: >> >> Hi users >> >> Solr supports for writing and reading its index and transaction log files >> to the HDFS distributed filesystem. >> **I am curious about that there are any other futher improvement about >> the integration with HDFS.* >> **For the solr native replication will make multiple copies of the >> master node's index. Because of the native replication of HDFS,there is no >> need to do that.It just to need that multiple cores in solrcloud share the >> same index directory in HDFS?* >> >> >> The above supposition is what I want to achive when we are integrating >> SolrCloud with HDFS (Solr 4.6). >> To make sure of our application high available,we still have to take >> the solr replication with some tricks. >> >> Firstly ,noting that solr's index directory is made up of >> *collectionName/coreNodeName/data/index * >> >> *collectionName/coreNodeName/data/tlog* >> So to achive this,we want to create multi cores that use the same hdfs >> index directory . >> >> I have tested this within solr 4.4 by expilcitly indicating the same >> coreNodeName. >> >> For example: >> Step1, a core was created with the name=core1 and shard=core_shard1 and >> collection=clollection1 and coreNodeName=*core1* >> Step2. create another core with the name=core2 and shard=core_shard1 and >> collection=clollection1 and coreNodeName= >> *core1* >> * T*he two core share the same shard ,collection and coreNodeName.As a >> result,the two core will get the same index data which is stored in the >> hdfs directory : >> hdfs://myhdfs/*clollection1*/*core1*/data/index >> hdfs://myhdfs/*clollection1*/*core1*/data/tlog >> >> Unfortunately*, *as the solr 4.6 was released,we upgraded . the above >> goal failed. We could not create a core with both expilcit shard and >> coreNodeName. >> Exceptions are as [1]. 
>> * Can some give some help?* >> >> >> Regards >> [1]-- >> 64893635 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController >> ?.publishing core=hdfstest3 state=down >> 64893635 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController >> ?.numShards not found on descriptor - reading it from system property >> 64893698 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController >> ?.look for our core node name >> >> >> >> 64951227 [http-bio-8080-exec-17] INFO org.apache.solr.core.SolrCore >> ?.[reportCore_201208] webapp=/solr path=/replication >> params={slave=false&command=details&wt=javabin&qt=/replication&version=2} >> status=0 QTime=107 >> >> >> 65213770 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController >> ?.waiting to find shard id in clusterstate for hdfstest3 >> 65533894 [http-bio-8080-exec-1] ERROR org.apache.solr.core.SolrCore >> ?.org.apache.solr.common.SolrException: Error CREATEing SolrCore >> 'hdfstest3': Could not get shard id for core: hdfstest3 >> at >> org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:535) >> at >> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:152) >> at >> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) >> at >> org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:662) >> at >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248) >> at >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197) >> at >> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) >> at >> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) >> at >> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222) >> at >> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123) >> at >> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171) >> at >> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99) >> at >> org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:947) >> at >> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) >> at >> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408) >> at >> org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Proce
Re: Does Solr fork child processes and result in zombies?
On 12/26/2013 9:56 AM, Sir Gilligan wrote: > I have three CentOS machines running Solr 4.6.0 cloud without any > replication. That is, numshards is 3 and there is only one Solr instance > running on each of the boxes. > > Also, on the boxes I arm running ZooKeeper. This is a test environment > and I would not normally run ZooKeeper on the same boxes. > > As I am inserting data into Solr the boxes get in a weird state. I will > log in and enter my username and password and then nothing, it just sits > there. I am connected through Putty. Never gets to a command prompt. I > stop the data import and after a while I can log in. > > I do the following command on one of the boxes and I see this: > > ps -lf -C java > How did I end up with two child processes of Solr running? Notice they > are two PIDS, 7879 and 7949, that are children of 5009. The exact same > command as well, with all of the parameters I used to launch Solr. > > I also notice the "F" state is "1" for those two processes, so I assume > that means "forked but didn't exec". > > Also the WCHAN is sched_ on both of them. > > The "S" state is "D" which means uninterruptible sleep ( usually IO ). > > Where are these processes coming from? Do I have something configured > incorrectly? Solr itself should not fork processes, or at least I have never seen it do so. It does appear that you are using 'start.jar' which suggests that you're using the Jetty that comes bundled with Solr, although I cannot tell that for sure. If you are using some other container (including another version/copy of Jetty), then I have no idea what it might do. I ran the same ps command on one of my CentOS 6 SolrCloud (4.2.1) machines and I get exactly two entries - one for zookeeper and one for Solr (running the included Jetty). If on the other hand I run a ps command that shows threads, I see a LOT of entries for both zookeeper and java, because these are highly threaded applications. I have a much larger Solr install that's not using SolrCloud, and I have never seen it fork processes either. My dev install (running 4.6.0 in non-cloud mode) also doesn't fork processes. Side notes: As long as the machine has enough resources available, running zookeeper on the same boxes as Solr shouldn't pose a problem. If the machine becomes heavily I/O bound and zookeeper data is not on separate spindles, it might be a problem. The bootstrap options are not meant to run on every startup. They should not be used except when first converting a non-cloud install to a cloud install. If you want to upload a new configuration to zookeeper, you can use the zkCli script in cloud-scripts and then reload your collection. Also, I think it's generally not a good idea to use the numShards startup parameter. You can indicate the number of shards for a collection when you create the collection. With a 12GB heap, you're definitely going to want to tune your garbage collection. I don't see an tuning parameters on your commandline. I'd like to avoid a religious garbage collection flame-war, so I will give you the settings that work for me and allow you to decide for yourself what to do: http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning Here's some more generic information about performance problems with Solr: http://wiki.apache.org/solr/SolrPerformanceProblems Thanks, Shawn
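As a concrete illustration of the zkCli approach Shawn mentions (hostnames, paths, and names here are placeholders, not taken from the original post):

  cloud-scripts/zkcli.sh -zkhost zk1:2181,zk2:2181,zk3:2181 \
    -cmd upconfig -confdir /path/to/myconfig/conf -confname myconf

  curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=mycollection"

The first command pushes the updated config set to ZooKeeper; the RELOAD call makes the collection pick it up without a restart, and without ever using the bootstrap options again.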
Re: adding a node to SolrCloud
On 12/23/2013 05:43 PM, Greg Preston wrote: I believe you can just define multiple cores: ... (this is the old-style solr.xml. I don't know how to do it in the newer style) Yes, that is exactly what I did, but somehow the link between shards and collections gets lost and everything gets very confused. I guess I should have read more carefully about the valid parameters on the <core> element. My problem was a missing attribute: @collection="collection-name" So the complete core definition that survives tomcat restarts is along the lines of the sketch below. David
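The actual XML did not survive the list archive, so here is a hedged reconstruction of what such an old-style solr.xml core definition (with the missing collection attribute added) might look like; core and collection names are placeholders:

  <cores adminPath="/admin/cores">
    <core name="mycollection_shard1" instanceDir="mycollection_shard1"
          shard="shard1" collection="mycollection"/>
    <core name="mycollection_shard2" instanceDir="mycollection_shard2"
          shard="shard2" collection="mycollection"/>
  </cores>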
Re: Questions about integrating SolrCloud with HDFS
1. The exception and change in experience on the move to 4.6 seems like it could be a bug we want to investigate. 2. Solr storing data on hdfs in other ways seems like a different issue / improvement. 3. You shouldn't try and force more than one core to use the same index on hdfs. This would be bad. 4. You really want to use the solr.hdfs.home setting described in the documentation IMO. - Mark > On Dec 26, 2013, at 1:56 PM, Greg Walters wrote: > > Mark, > > I'd be happy to but some clarification first; should this issue be about > creating cores with overlapping names and the stack trace that YouPeng > initially described, Solr's behavior when storing data on HDFS or YouPeng's > other thread (Maybe a bug for solr 4.6 when create a new core) that looks > like it might be a near duplicate of this one? > > Thanks, > Greg > >> On Dec 26, 2013, at 12:40 PM, Mark Miller wrote: >> >> Can you file a JIRA issue? >> >> - Mark >> >>> On Dec 24, 2013, at 2:57 AM, YouPeng Yang wrote: >>> >>> Hi users >>> >>> Solr supports for writing and reading its index and transaction log files >>> to the HDFS distributed filesystem. >>> **I am curious about that there are any other futher improvement about >>> the integration with HDFS.* >>> **For the solr native replication will make multiple copies of the >>> master node's index. Because of the native replication of HDFS,there is no >>> need to do that.It just to need that multiple cores in solrcloud share the >>> same index directory in HDFS?* >>> >>> >>> The above supposition is what I want to achive when we are integrating >>> SolrCloud with HDFS (Solr 4.6). >>> To make sure of our application high available,we still have to take >>> the solr replication with some tricks. >>> >>> Firstly ,noting that solr's index directory is made up of >>> *collectionName/coreNodeName/data/index * >>> >>> *collectionName/coreNodeName/data/tlog* >>> So to achive this,we want to create multi cores that use the same hdfs >>> index directory . >>> >>> I have tested this within solr 4.4 by expilcitly indicating the same >>> coreNodeName. >>> >>> For example: >>> Step1, a core was created with the name=core1 and shard=core_shard1 and >>> collection=clollection1 and coreNodeName=*core1* >>> Step2. create another core with the name=core2 and shard=core_shard1 and >>> collection=clollection1 and coreNodeName= >>> *core1* >>> * T*he two core share the same shard ,collection and coreNodeName.As a >>> result,the two core will get the same index data which is stored in the >>> hdfs directory : >>> hdfs://myhdfs/*clollection1*/*core1*/data/index >>> hdfs://myhdfs/*clollection1*/*core1*/data/tlog >>> >>> Unfortunately*, *as the solr 4.6 was released,we upgraded . the above >>> goal failed. We could not create a core with both expilcit shard and >>> coreNodeName. >>> Exceptions are as [1]. 
>>> * Can some give some help?* >>> >>> >>> Regards >>> [1]-- >>> 64893635 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController >>> ?.publishing core=hdfstest3 state=down >>> 64893635 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController >>> ?.numShards not found on descriptor - reading it from system property >>> 64893698 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController >>> ?.look for our core node name >>> >>> >>> >>> 64951227 [http-bio-8080-exec-17] INFO org.apache.solr.core.SolrCore >>> ?.[reportCore_201208] webapp=/solr path=/replication >>> params={slave=false&command=details&wt=javabin&qt=/replication&version=2} >>> status=0 QTime=107 >>> >>> >>> 65213770 [http-bio-8080-exec-1] INFO org.apache.solr.cloud.ZkController >>> ?.waiting to find shard id in clusterstate for hdfstest3 >>> 65533894 [http-bio-8080-exec-1] ERROR org.apache.solr.core.SolrCore >>> ?.org.apache.solr.common.SolrException: Error CREATEing SolrCore >>> 'hdfstest3': Could not get shard id for core: hdfstest3 >>> at >>> org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:535) >>> at >>> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:152) >>> at >>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) >>> at >>> org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:662) >>> at >>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248) >>> at >>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197) >>> at >>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) >>> at >>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) >>> at >>> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222) >>> at >>> org.apache.catalina.core.StandardContext
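Regarding point 4 in Mark's reply above, a hedged solrconfig.xml sketch of the solr.hdfs.home style of setup he refers to (the HDFS URI and Hadoop config path are placeholders):

  <directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
    <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
    <str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
  </directoryFactory>

With solr.hdfs.home set, each core's data directory is created under that single HDFS location, so there is no need to point multiple cores at the same index directory by hand.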
Re: adding a node to SolrCloud
On 12/24/2013 8:35 AM, David Santamauro wrote: >> You may have one or more of the SolrCloud 'bootstrap' options on the >> startup commandline. The bootstrap options are intended to be used >> once, in order to bootstrap from a non-SolrCloud setup to a SolrCloud >> setup. > > No, no unnecessary options. I manually bootstrapped a common config. I have no idea what might be wrong here. >> Between the Collections API and the CoreAdmin API, you should never need >> to edit solr.xml (if using the pre-4.4 format) or core.properties files >> (if using core discovery, available 4.4 and later) directly. > > Now this I don't understand. If I have created cores through the > CoreAdmin API, how is solr.xml affected? If I don't edit it, how does > SOLR know what cores it has to expose to a distributed collection? If you are using the old-style solr.xml (which will be supported through all future 4.x versions, but not 5.0), then core definitions are stored in solr.xml and the contents of the file are changed by many of the CoreAdmin API actions. The Collections API calls the CoreAdmin API on servers throughout the cloud. http://wiki.apache.org/solr/Solr.xml%20%28supported%20through%204.x%29 If you are using the core discovery format, which was made available in working form in version 4.4, then solr.xml does NOT contain core definitions. The main example in 4.4 and later uses the new format. Cores are discovered at Solr startup by crawling the filesystem from a root starting point looking for core.properties files. In this mode, solr.xml is fairly static. http://wiki.apache.org/solr/Solr.xml%204.4%20and%20beyond http://wiki.apache.org/solr/Core%20Discovery%20%284.4%20and%20beyond%29 Thanks, Shawn
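For illustration, a hedged example of the kind of core.properties file that core discovery picks up (all values are placeholders):

  # <solr home>/mycollection_shard1_replica1/core.properties
  name=mycollection_shard1_replica1
  collection=mycollection
  shard=shard1
  coreNodeName=core_node1

An empty core.properties is also valid, in which case the core name defaults to the name of the directory the file sits in.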
Re: Boosting results on value of field different from query
Hi, Puneet I think you can try of provided advice from there : http://wiki.apache.org/solr/SolrRelevancyFAQ Like this one : http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_increase_the_score_for_specific_documents : set index time boos "per document", so set big boos for documents with type:compact and type:sedan Or this one http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_change_the_score_of_a_document_based_on_the_.2Avalue.2A_of_a_field_.28say.2C_.22popularity.22.29 and http://wiki.apache.org/solr/ExtendedDisMax#bf_.28Boost_Function.2C_additive.29 : use query time function for boosting, you can implement your own function query, called for example "typeBoosting", which will convert "type" value per document from string into boost number and use it like "typeBoosting(type)". 26.12.2013, 06:28, "Puneet Pawaia" : > Hi Manju > Would this query not be searching for and thus restricting results to type > sedan and compact? > I would like the results to include other types but only show up lower down > the list. > Regards > Puneet > On 26 Dec 2013 07:15, "manju16832003" wrote: > >> Hi Puneet, >> if you type field is pre-determined text field ex type [compact, sedan, >> hatchback], I think you have to boost with query type field (q) to >> get more accurate boosting. >> >> Ex: http://localhost:8983/solr/my/select?q=type:sedan^100 type:compact^10 >> >> >> (:*)^1&wt=json&indent=true&fl=,score&debug=results&bf=recip(rord(publish_date),1,2,3)^1.5&sort=score >> desc >> >> For publish_date, replace with the date you use for getting latest >> resultes. >> >> In the above query, things to note is that >> - fl=,score -> The result set would display score value for each document >> - sort by score as first sort field that will give you the documents with >> the highest boost value (score) on top >> >> Play around with the boosting values ^100 ^10 (perhaps 5,10,20 ) and >> observe how the score value will change the documents. >> >> I'm not really sure how solr calculation works, however the above query >> must give you the accurate boosted documents. >> >> -- >> View this message in context: >> >> http://lucene.472066.n3.nabble.com/Boosting-results-on-value-of-field-different-from-query-tp4108180p4108190.html >> Sent from the Solr - User mailing list archive at Nabble.com.
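To address Puneet's concern about restricting results: with (e)dismax, boost clauses can be added via bq, so documents of other types still match the main query; they simply score lower. A hedged sketch (query terms, field names, and boost values are illustrative only):

  http://localhost:8983/solr/my/select?defType=edismax&q=family+car
    &bq=type:sedan^10+type:compact^5
    &bf=recip(rord(publish_date),1,2,3)
    &fl=*,score&sort=score+desc

Unlike putting type:sedan into q itself, the bq clauses do not filter the result set; they only add to the score of the documents that match them.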
Re: adding a node to SolrCloud
On 12/26/2013 02:29 PM, Shawn Heisey wrote: On 12/24/2013 8:35 AM, David Santamauro wrote: You may have one or more of the SolrCloud 'bootstrap' options on the startup commandline. The bootstrap options are intended to be used once, in order to bootstrap from a non-SolrCloud setup to a SolrCloud setup. No, no unnecessary options. I manually bootstrapped a common config. I have no idea what might be wrong here. Between the Collections API and the CoreAdmin API, you should never need to edit solr.xml (if using the pre-4.4 format) or core.properties files (if using core discovery, available 4.4 and later) directly. Now this I don't understand. If I have created cores through the CoreAdmin API, how is solr.xml affected? If I don't edit it, how does SOLR know what cores it has to expose to a distributed collection? If you are using the old-style solr.xml (which will be supported through all future 4.x versions, but not 5.0), then core definitions are stored in solr.xml and the contents of the file are changed by many of the CoreAdmin API actions. The Collections API calls the CoreAdmin API on servers throughout the cloud. I have never experienced tomcat or the SOLR webapp create, modify or otherwise touch in anyway the solr.xml file. I have always had to add the necessary core definition manually. http://wiki.apache.org/solr/Solr.xml%20%28supported%20through%204.x%29 If you are using the core discovery format, which was made available in working form in version 4.4, then solr.xml does NOT contain core definitions. The main example in 4.4 and later uses the new format. Cores are discovered at Solr startup by crawling the filesystem from a root starting point looking for core.properties files. In this mode, solr.xml is fairly static. http://wiki.apache.org/solr/Solr.xml%204.4%20and%20beyond http://wiki.apache.org/solr/Core%20Discovery%20%284.4%20and%20beyond%29 I'll begin exploring this new format, thanks for the help and links. David
Excluding terms in grouping results
Hello there, The question is: how to group results by some field (text terms), but exclude a particular term from being grouped on. For example, there are a few documents with the field 'tags': 1. tags: term1 term2 term3 2. tags: term2 term3 term4 3. tags: term1 term2 term4 4. tags: term2 term3 term4 I want to group by 'tags', so the result would usually be 4 groups, but, for example, I need to exclude 'term4' as a group while still being able to see the documents where 'term4' is present. Is there a way to do this? I can't use another field, because there are a lot of these terms, and any term can be excluded, or even two at once. Another example, maybe it helps: when I make a request to find ?q=tags:term4, I need to group by tags but exclude term4 from being a group, as I am already searching by that term. Thank you for your time. -- View this message in context: http://lucene.472066.n3.nabble.com/Excluding-terms-in-grouping-results-tp4108280.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Chaining plugins
If I get elmer fudd's question correct, he needs something like creating his own component which will extends SearchComponent and do some logic in prepare method - change input request params probably. Then register this component in solrconfig and set it's for default search handler just before query component like: newComponentName query facet mlt highlight debug Lucene's search in query component will be executed with modified parameters. 26.12.2013, 20:55, "Paul Libbrecht" : > I have subclassed the query component to do so. > Using params, you can get almost everything thinkable that is not too much > documented. > > paul > > On 26 déc. 2013, at 15:59, elmerfudd wrote: > >> I would like to develope a search handler that is doing some logic and then >> just sends the query to the default search handler so the results will be >> generated there. >> It's like it is a transparent plugin and the data will only go through it. >> >> How can this be achieved . >> thanks ahead :) >> >> -- >> View this message in context: >> http://lucene.472066.n3.nabble.com/Chaining-plugins-tp4108239.html >> Sent from the Solr - User mailing list archive at Nabble.com.
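A hedged sketch of the kind of component described above; the class name, package, and the parameter being rewritten are made up for illustration, and the logic in prepare() is just an example of changing the incoming request params before QueryComponent runs:

  import java.io.IOException;
  import org.apache.solr.common.params.CommonParams;
  import org.apache.solr.common.params.ModifiableSolrParams;
  import org.apache.solr.handler.component.ResponseBuilder;
  import org.apache.solr.handler.component.SearchComponent;

  // Hypothetical component: rewrites request params before the query component runs.
  public class ParamRewriteComponent extends SearchComponent {

    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
      // Copy the incoming params, apply custom logic, and put them back on the request.
      ModifiableSolrParams params = new ModifiableSolrParams(rb.req.getParams());
      params.set(CommonParams.ROWS, 20); // purely illustrative rewrite
      rb.req.setParams(params);
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
      // Nothing to do here; QueryComponent produces the results.
    }

    @Override
    public String getDescription() {
      return "Rewrites request parameters before the query component";
    }

    @Override
    public String getSource() {
      return "";
    }
  }

It can then be registered in solrconfig.xml and placed ahead of the standard chain with first-components, which has the same effect as listing it just before query in an explicit components list as described above:

  <searchComponent name="paramRewrite" class="com.example.ParamRewriteComponent"/>
  <requestHandler name="/select" class="solr.SearchHandler">
    <arr name="first-components">
      <str>paramRewrite</str>
    </arr>
  </requestHandler>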
Re: Solr Query Slowness
Hello! It seems that the number of queries per second generated by your scripts may be too much for your Solr cluster to handle with the latency you want. Try launching your scripts one by one and see what is the bottle neck with your instance. I assume that for some number of scripts running at the same time you will have good performance and it will start to degrade after you start adding even more. If you don't have high commit rate and you don't need NRT, disabling the caches shouldn't be needed and they can help with query performance. Also there are tools our there that can help you diagnose what the actual problem is, for example (http://sematext.com/spm/index.html). -- Regards, Rafał Kuć Performance Monitoring * Log Analytics * Search Analytics Solr & Elasticsearch Support * http://sematext.com/ > This an example of a query: > http://myip:8080/solr/TestCatMatch_shard12_replica1/select?q=Royal+Cashmere+RC+106+CS+Silk+Cashmere+V+Neck+Moss+Green+Men > ^10+s+Sweater+Cashmere^3+Men^3+Sweaters^3+Clothing^3&rows=1&wt=json&indent=true > in return : > { > "responseHeader":{ > "status":0, > "QTime":191}, > > "response":{"numFound":4539784,"start":0,"maxScore":2.0123534,"docs":[ > { > "Sections":"fashion", > "IdsCategories":"11101911", > "IdProduct":"ef6b8d7cf8340d0c8935727a07baebab", > "Id":"11101911-ef6b8d7cf8340d0c8935727a07baebab", > "Name":"Uniqlo Men Cashmere V Neck Sweater Men Clothing > Sweaters Cashmere", > "_version_":1455419757424541696}] > }} > This query was executed when no script is running so the QTime is only > 191 ms, but it may take up to 3s when they are) > Of course it can be smaller or bigger and of course that affects the > execution time (the execution times I spoke of are the internal ones > returned by solr, not calculated by me). > And yes the CPU is fully used. > 2013/12/26 Rafał Kuć >> Hello! >> >> Different queries can have different execution time, that's why I >> asked about the details. When running the scripts, is Solr CPU fully >> utilized? To tell more I would like to see what queries are run >> against Solr from scripts. >> >> Do you have any information on network throughput between the server >> you are running scripts on and the Solr cluster? You wrote that the >> scripts are fine for 5 seconds and than they get slow. If your Solr >> cluster is not fully utilized I would take a look at the queries and >> what they return (ie. using faceting with facet.limit=-1) and seeing >> if the network is able to process those. >> >> -- >> Regards, >> Rafał Kuć >> Performance Monitoring * Log Analytics * Search Analytics >> Solr & Elasticsearch Support * http://sematext.com/ >> >> >> > Thanks Rafal for your reply, >> >> > My scripts are running on other independent machines so they does not >> > affect Solr, I did mention that the queries are not the same (that is >> why I >> > removed the query cache from solrconfig.xml), and I only get 1 result >> from >> > Solr (which is the top scored one so no sorting since it is by default >> > ordred by score) >> >> >> >> > 2013/12/26 Rafał Kuć >> >> >> Hello! >> >> >> >> Could you tell us more about your scripts? What they do? If the >> >> queries are the same? How many results you fetch with your scripts and >> >> so on. >> >> >> >> -- >> >> Regards, >> >> Rafał Kuć >> >> Performance Monitoring * Log Analytics * Search Analytics >> >> Solr & Elasticsearch Support * http://sematext.com/ >> >> >> >> >> >> > Hi all, >> >> >> >> > I have multiple python scripts querying solr with the sunburnt module. 
>> >> >> >> > Solr was hosted on an Amazon ec2 m1.large (2 vCPU with 4 ECU, 7.5 GB >> >> memory >> >> > & 840 GB storage) and contained several cores for different usage. >> >> >> >> > When I manually executed a query through Solr Admin (a query >> containing >> >> > 10~15 terms, with some of them having boosts over one field and >> limited >> >> to >> >> > one result without any sorting or faceting etc ) it takes around >> 700 >> >> > ms, and the Core contained 7 million documents. >> >> >> >> > When the scripts are executed things get slower, my query takes 7~10s. >> >> >> >> > Then what I did is to turn to SolrCloud expecting huge performance >> >> increase. >> >> >> >> > I installed it on a cluster of 5 Amazon ec2 c3.2xlarge instances (8 >> vCPU >> >> > with 28 ECU, 15 GB memory & 160 SSD storage), then I created one >> >> collection >> >> > to contain the core I was querying, I sharded it to 25 shards (each >> node >> >> > containing 5 shards without replication), each shards took 54 MB of >> >> storage. >> >> >> >> > Tested my query on the new SolrCloud, it takes 70 ms ! huge increase >> wich >> >> > is very good ! >> >> >> >> > Tested my scripts again (I have 30 scripts running at the same time), >> and >> >> > as a surprise, things run fast for 5 seconds then it turns realy slow >> >> again >> >> > (query time ). >> >> >> >> > I updated the solrc
RE: Unable to check Solr 4.6 SPLITSHARD command progress
We ran into this exact scenario and resolved it by applying SOLR-5214 (https://issues.apache.org/jira/browse/SOLR-5214). From: binit [b.initth...@gmail.com] Sent: Friday, December 13, 2013 10:45 PM To: solr-user@lucene.apache.org Subject: Re: Unable to check Solr 4.6 SPLITSHARD command progress Yes, and my clusterstate.json is still: == "shards":{ "shard1":{ "range":"8000-7fff", "state":"active", "replicas":{"core_node1":{ "state":"active", "base_url":"http://./solr", "core":".._shard1_replica1", "node_name":":8080_solr", "leader":"true"}}}, "shard1_1":{ "range":"0-7fff", "state":"construction", "parent":"shard1", "replicas":{"core_node2":{ "state":"active", "base_url":"http://:8080/solr", "core":".._shard1_1_replica1", "node_name":":8080_solr", "leader":"true"}}}, "shard1_0":{ "range":"8000-", "state":"construction", "parent":"shard1", "replicas":{"core_node3":{ "state":"active", "base_url":"http://:8080/solr", "core":".._shard1_0_replica1", "node_name":":8080_solr", "leader":"true", "maxShardsPerNode":"1", "router":{"name":"compositeId"}, "replicationFactor":"1"}} == But it finally failed with an out-of-memory error, and it is definitely not progressing because the thread is stopped. Probably SPLITSHARD is not mature enough to use yet. Now I have no choice but to do it from SolrJ, indexing manually. -- View this message in context: http://lucene.472066.n3.nabble.com/Unable-to-check-Solr-4-6-SPLITSHARD-command-progress-tp4106520p4106699.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Possible memory leak after segment merge? (related to DocValues?)
Does anybody with knowledge of solr internals know why I'm seeing instances of Lucene42DocValuesProducer when I don't have any fields that are using DocValues? Or am I misunderstanding what this class is for? -Greg On Mon, Dec 23, 2013 at 12:07 PM, Greg Preston wrote: > Hello, > > I'm loading up our solr cloud with data (from a solrj client) and > running into a weird memory issue. I can reliably reproduce the > problem. > > - Using Solr Cloud 4.4.0 (also replicated with 4.6.0) > - 24 solr nodes (one shard each), spread across 3 physical hosts, each > host has 256G of memory > - index and tlogs on ssd > - Xmx=7G, G1GC > - Java 1.7.0_25 > - schema and solrconfig.xml attached > > I'm using composite routing to route documents with the same clientId > to the same shard. After several hours of indexing, I occasionally > see an IndexWriter go OOM. I think that's a symptom. When that > happens, indexing continues, and that node's tlog starts to grow. > When I notice this, I stop indexing, and bounce the problem node. > That's where it gets interesting. > > Upon bouncing, the tlog replays, and then segments merge. Once the > merging is complete, the heap is fairly full, and forced full GC only > helps a little. But if I then bounce the node again, the heap usage > goes way down, and stays low until the next segment merge. I believe > segment merges are also what causes the original OOM. > > More details: > > Index on disk for this node is ~13G, tlog is ~2.5G. > See attached mem1.png. This is a jconsole view of the heap during the > following: > > (Solr cloud node started at the left edge of this graph) > > A) One CPU core pegged at 100%. Thread dump shows: > "Lucene Merge Thread #0" daemon prio=10 tid=0x7f5a3c064800 > nid=0x7a74 runnable [0x7f5a41c5f000] >java.lang.Thread.State: RUNNABLE > at org.apache.lucene.util.fst.Builder.add(Builder.java:397) > at > org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter.finishTerm(BlockTreeTermsWriter.java:1000) > at > org.apache.lucene.codecs.TermsConsumer.merge(TermsConsumer.java:112) > at > org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:72) > at > org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:365) > at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:98) > at > org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772) > at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376) > at > org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405) > at > org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482) > > B) One CPU core pegged at 100%. Manually triggered GC. Lots of > memory freed. 
Thread dump shows: > "Lucene Merge Thread #0" daemon prio=10 tid=0x7f5a3c064800 > nid=0x7a74 runnable [0x7f5a41c5f000] >java.lang.Thread.State: RUNNABLE > at > org.apache.lucene.codecs.DocValuesConsumer$1$1.hasNext(DocValuesConsumer.java:127) > at > org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:144) > at > org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:92) > at > org.apache.lucene.codecs.DocValuesConsumer.mergeNumericField(DocValuesConsumer.java:112) > at > org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:221) > at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:119) > at > org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772) > at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376) > at > org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405) > at > org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482) > > C) One CPU core pegged at 100%. Manually triggered GC. No memory > freed. Thread dump shows: > "Lucene Merge Thread #0" daemon prio=10 tid=0x7f5a3c064800 > nid=0x7a74 runnable [0x7f5a41c5f000] >java.lang.Thread.State: RUNNABLE > at > org.apache.lucene.codecs.DocValuesConsumer$1$1.hasNext(DocValuesConsumer.java:127) > at > org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:108) > at > org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:92) > at > org.apache.lucene.codecs.DocValuesConsumer.mergeNumericField(DocValuesConsumer.java:112) > at > org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:221) > at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:119) > at > org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772) > at o
Re: Solr Query Slowness
On 12/26/2013 3:38 AM, Jilal Oussama wrote: > Solr was hosted on an Amazon ec2 m1.large (2 vCPU with 4 ECU, 7.5 GB memory > & 840 GB storage) and contained several cores for different usage. > > When I manually executed a query through Solr Admin (a query containing > 10~15 terms, with some of them having boosts over one field and limited to > one result without any sorting or faceting etc ) it takes around 700 > ms, and the Core contained 7 million documents. > > When the scripts are executed things get slower, my query takes 7~10s. > > Then what I did is to turn to SolrCloud expecting huge performance increase. > > I installed it on a cluster of 5 Amazon ec2 c3.2xlarge instances (8 vCPU > with 28 ECU, 15 GB memory & 160 SSD storage), then I created one collection > to contain the core I was querying, I sharded it to 25 shards (each node > containing 5 shards without replication), each shards took 54 MB of storage. > > Tested my query on the new SolrCloud, it takes 70 ms ! huge increase wich > is very good ! > > Tested my scripts again (I have 30 scripts running at the same time), and > as a surprise, things run fast for 5 seconds then it turns realy slow again > (query time ). > > I updated the solrconfig.xml to remove the query caches (I don't need them > since queries are very different and only 1 time queries) and changes the > index memory to 1 GB, but only got a small increase (3~4s for each query ?!) Your SolrCloud setup has 35 times as much CPU power (just basing this on the ECU numbers) as your single-server setup, ten times as much memory, and a lot more IOPS because you moved to SSD. A 10X increase in single query performance is not surprising. You have not indicated how much memory is assigned to the java heap on each server. I think that there are three possible problems happening here, with a strong possibility that the third one is happening at the same time as one of the other two: 1) Full garbage collections are too frequent because the heap is too small. 2) Garbage collections take too long because the heap is very large and GC is not tuned. 3) Extremely high disk I/O because the OS disk cache is too small for the index size. Some information on these that might be helpful: http://wiki.apache.org/solr/SolrPerformanceProblems The general solution for good Solr performance is to throw hardware, especially memory, at the problem. It's worth pointing out that any level of hardware investment has an upper limit on the total query volume it can support. Running 30 test scripts at the same time will be difficult for all but the most powerful and expensive hardware to deal with, especially if every query is different. A five-server cloud where each server has 8 CPU cores and 15GB of memory is pretty small, all things considered. Thanks, Shawn
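Purely for illustration (these are generic CMS-style flags, not the specific settings from Shawn's wiki page, and the heap size is a placeholder), GC tuning of the kind discussed above usually means adding options like these to the Solr startup command:

  java -Xms12g -Xmx12g \
       -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
       -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly \
       -jar start.jar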
Re: Maybe a bug for solr 4.6 when create a new core
Hi Mark. Thanks for your reply. I will file a JIRA issue about the NPE. By the way,would you look through the Question 2. After I create a new core with explicite shard and coreNodeName successfully,I can not create a replica for above new core also with explicite coreNodeName and the same shard and collection Request url as following: http://10.7.23.122:8080/solr/admin/cores?action=CREATE&name=Test1&shard=Test&collection.configName=myconf&schema=schema.xml&config=solrconfigLocal_default.xml&collection=defaultcol&coreNodeName=Test1 It responses an error: 400 29 Error CREATEing SolrCore 'Test1': Test1 is removed 400 I find out that in the src class in org.apache.solr.cloud. ZkController line 1369~ 1384: As the code says,when I indicate a coreNodeName and collection explicitly,it goes to check a 'autoCreated' property of the Collection which I have already created. My question :Why does it need to check the 'autoCreated' property,any jira about this 'autoCreated' property? How can I make through the check? [1]- try { if(cd.getCloudDescriptor().getCollectionName() !=null && cd.getCloudDescriptor().getCoreNodeName() != null ) { //we were already registered if(zkStateReader.getClusterState().hasCollection(cd.getCloudDescriptor().getCollectionName())){ DocCollection coll = zkStateReader.getClusterState().getCollection(cd.getCloudDescriptor().getCollectionName()); if(!"true".equals(coll.getStr("autoCreated"))){ Slice slice = coll.getSlice(cd.getCloudDescriptor().getShardId()); if(slice != null){ if(slice.getReplica(cd.getCloudDescriptor().getCoreNodeName()) == null) { log.info("core_removed This core is removed from ZK"); throw new SolrException(ErrorCode.NOT_FOUND,coreNodeName +" is removed"); } } } } } -- Regards 2013/12/27 Mark Miller > If you are seeing an NPE there, sounds like you are on to something. > Please file a JIRA issue. > > - Mark > > > On Dec 26, 2013, at 1:29 AM, YouPeng Yang > wrote: > > > > Hi > > Merry Christmas. > > > > Before this mail,I am in trouble with a weird problem for a few days > > when to create a new core with both explicite shard and coreNodeName. > And I > > have posted a few mails in the mailist,no one ever gives any > > suggestions,maybe they did not encounter the same problem. > > I have to go through the srcs to check out the reason. Thanks god, I > find > > it. The reason to the problem,maybe be a bug, so I would like to report > it > > hoping to get your endorsement and confirmation. > > > > > > In class org.apache.solr.cloud.Overseer the Line 360: > > - > > if (sliceName !=null && collectionExists && > > !"true".equals(state.getCollection(collection).getStr("autoCreated"))) { > >Slice slice = state.getSlice(collection, sliceName); > >if (slice.getReplica(coreNodeName) == null) { > > log.info("core_deleted . Just return"); > > return state; > >} > > } > > - > > the slice needs to be checked null .because I create a new core with both > > explicite shard and coreNodeName, the state.getSlice(collection, > > sliceName) may return a null.So it needs to be checked ,or there will be > > an NullpointException > > - > > if (sliceName !=null && collectionExists && > > !"true".equals(state.getCollection(collection).getStr("autoCreated"))) { > >Slice slice = state.getSlice(collection, sliceName); > >if (*slice != null &&* slice.getReplica(coreNodeName) == > null) { > > log.info("core_deleted . 
Just return"); > > return state; > >} > > } > > - > > > > *Querstion 1*: Is this OK with the whole solr project,I have no aware > > about the influences about the change,as right now ,it goes right. Please > > make confirm about this. > > > > After I fixed this prolem,I can create a core with the request: > > http://10.7.23.122:8080/solr/admin/cores?action=CREATE&name=Test&; > > *shard=Test* > > > &collection.configName=myconf&schema=schema.xml&config=solrconfigLocal_default.xml&collection=defaultcol& > > *coreNodeName=Test* > > > > However when I create a replica within the same shard Test: > > http://10.7.23.122:8080/solr/admin/cores?action=CREATE&*name=Test1*&; > > *shard=Test* > > > &collection
Re: Solr - Match whole word only in text fields
Hi everybody! Ahmet, do I get it correct - if I use this text_char_norm field type, for input "myName=aaa bbb" I'll index terms "myName", "aaa", "bbb"? So I'll match with query like "myName" or query like "bbb", but not match with "myName aaa". I can use this type for query value, so split "myName aaa" into ( "myName" && "aaa") - and it will work. But this approach will give false positive match with "myName bbb". What do you think, how I can handle this? One of the approaches is to use in this field type KeywordTokenizer+ShingleFilter instead of WhitespaceTokenizerFactory, so tokens like "myName", "myName aaa", "myName aaa bbb", "aaa", "aaa bbb", "bbb" will be indexed, but it significantly increased index size in case of long values. 26.12.2013, 03:20, "Ahmet Arslan" : > Hi Haya, > > With MappingCharFilter you can have full control over character set that you > want to split. > > in mappings.txt you will have > > ":" => " " > "=" => " " > > Use the following type and see if it suits for your needs. Update > mappings.txt according to your needs. > > positionIncrementGap="100" > > > mapping="mappings.txt"/> > > > > > > On Sunday, December 22, 2013 9:19 PM, haya.axelrod > wrote: > I have a text field that can contain very long values (like text files). I > want to create field type for it (text, not string), in order to have > something like "Match whole word only" in notepad++, but the delimiter > should not be only white spaces. If i have: > > myName=aaa bbb > > I would like to get it for the following search strings "aaa", "bbb", "aaa > bbb", "myName=aaa bbb", "myName", but not for "aa" or "ame=a" or "a bb". > Another example is: > > aaa bbb > Can i do this somehow? > > What should be my field type definition? > > The text can contain any character. Before search i'm escaping the search > string using > http://lucene.apache.org/solr/4_2_1/solr-solrj/org/apache/solr/client/solrj/util/ClientUtils.html > > Thanks > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Solr-Match-whole-word-only-in-text-fields-tp4107795.html > Sent from the Solr - User mailing list archive at Nabble.com.
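The field type definition in Ahmet's quoted reply lost its XML tags in the archive. Based on the classes he names, it was presumably along these lines (a hedged reconstruction, not the exact original):

  <fieldType name="text_char_norm" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <charFilter class="solr.MappingCharFilterFactory" mapping="mappings.txt"/>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    </analyzer>
  </fieldType>

with mappings.txt mapping ":" and "=" to a space, as described in the quoted message.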
REYPLAY_ERR: IOException reading log
Hi users, I have built a SolrCloud on Tomcat. The cloud contains 22 shards with no replicas, and the SolrCloud is integrated with HDFS. After importing data from Oracle into the SolrCloud, I restarted Tomcat and it does not come alive again; it always throws the exceptions below. I am really at a loss about this exception, because my schema does not contain a BigDecimal-type field. Could you give any tips? 746635 [recoveryExecutor-44-thread-1] WARN org.apache.solr.update.UpdateLog – REYPLAY_ERR: IOException reading log org.apache.solr.common.SolrException: Invalid Number: java.math.BigDecimal:238088174 at org.apache.solr.schema.TrieField.readableToIndexed(TrieField.java:396) at org.apache.solr.update.AddUpdateCommand.getIndexedId(AddUpdateCommand.java:98) at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:582) at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:435) at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100) at org.apache.solr.update.UpdateLog$LogReplayer.doReplay(UpdateLog.java:1313) at org.apache.solr.update.UpdateLog$LogReplayer.run(UpdateLog.java:1202) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:662) 746681 [recoveryExecutor-44-thread-1] WARN org.apache.solr.update.UpdateLog – REYPLAY_ERR: IOException reading log org.apache.solr.common.SolrException: Invalid Number: java.math.BigDecimal:238088175
Re: Maybe a bug for solr 4.6 when create a new core
Hi Mark I have filed a jira about the NPE: https://issues.apache.org/jira/browse/SOLR-5580 2013/12/27 YouPeng Yang > Hi Mark. > >Thanks for your reply. > > I will file a JIRA issue about the NPE. > >By the way,would you look through the Question 2. After I create a new > core with explicite shard and coreNodeName successfully,I can not create a > replica for above new core also with explicite coreNodeName and the same > shard and collection > Request url as following: > > http://10.7.23.122:8080/solr/admin/cores?action=CREATE&name=Test1&shard=Test&collection.configName=myconf&schema=schema.xml&config=solrconfigLocal_default.xml&collection=defaultcol&coreNodeName=Test1 > > It responses an error: > > > >400 > 29 > > > Error CREATEing SolrCore 'Test1': Test1 is > removed > 400 > > > >I find out that in the src class in org.apache.solr.cloud. > ZkController line 1369~ 1384: >As the code says,when I indicate a coreNodeName and collection > explicitly,it goes to check a 'autoCreated' property of the Collection > which I have already created. > > My question :Why does it need to check the 'autoCreated' property,any > jira about this 'autoCreated' property? How can I make through the check? > > > [1]- > try { > if(cd.getCloudDescriptor().getCollectionName() !=null && > cd.getCloudDescriptor().getCoreNodeName() != null ) { > //we were already registered > > > > if(zkStateReader.getClusterState().hasCollection(cd.getCloudDescriptor().getCollectionName())){ > DocCollection coll = > > > zkStateReader.getClusterState().getCollection(cd.getCloudDescriptor().getCollectionName()); > if(!"true".equals(coll.getStr("autoCreated"))){ >Slice slice = > coll.getSlice(cd.getCloudDescriptor().getShardId()); >if(slice != null){ > if(slice.getReplica(cd.getCloudDescriptor().getCoreNodeName()) > == null) { >log.info("core_removed This core is removed from ZK"); >throw new SolrException(ErrorCode.NOT_FOUND,coreNodeName +" > is removed"); > } >} > } > } > } > > > -- > > > Regards > > > 2013/12/27 Mark Miller > >> If you are seeing an NPE there, sounds like you are on to something. >> Please file a JIRA issue. >> >> - Mark >> >> > On Dec 26, 2013, at 1:29 AM, YouPeng Yang >> wrote: >> > >> > Hi >> > Merry Christmas. >> > >> > Before this mail,I am in trouble with a weird problem for a few days >> > when to create a new core with both explicite shard and coreNodeName. >> And I >> > have posted a few mails in the mailist,no one ever gives any >> > suggestions,maybe they did not encounter the same problem. >> > I have to go through the srcs to check out the reason. Thanks god, I >> find >> > it. The reason to the problem,maybe be a bug, so I would like to report >> it >> > hoping to get your endorsement and confirmation. >> > >> > >> > In class org.apache.solr.cloud.Overseer the Line 360: >> > - >> > if (sliceName !=null && collectionExists && >> > !"true".equals(state.getCollection(collection).getStr("autoCreated"))) { >> >Slice slice = state.getSlice(collection, sliceName); >> >if (slice.getReplica(coreNodeName) == null) { >> > log.info("core_deleted . 
Just return"); >> > return state; >> >} >> > } >> > - >> > the slice needs to be checked null .because I create a new core with >> both >> > explicite shard and coreNodeName, the state.getSlice(collection, >> > sliceName) may return a null.So it needs to be checked ,or there will >> be >> > an NullpointException >> > - >> > if (sliceName !=null && collectionExists && >> > !"true".equals(state.getCollection(collection).getStr("autoCreated"))) { >> >Slice slice = state.getSlice(collection, sliceName); >> >if (*slice != null &&* slice.getReplica(coreNodeName) == >> null) { >> > log.info("core_deleted . Just return"); >> > return state; >> >} >> > } >> > - >> > >> > *Querstion 1*: Is this OK with the whole solr project,I have no aware >> > about the influences about the change,as right now ,it goes right. >> Please >> > make confirm about this. >> > >> > After I fixed this prolem,I can create a core with the request: >> > http://10.7.23.122:8080/solr/admin/cores?action=CREATE&name=Test&; >> >