Is it possible to extract all the tokens from Solr?
Hello everyone, how can I extract all the tokens from Solr, not from one document but from all the documents indexed in Solr? Thanks.
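[One approach that may fit, assuming the TermsComponent is wired to a /terms handler as in the example solrconfig.xml: it enumerates every indexed term in a field across the whole index, not per document. Collection and field names below are hypothetical:

    curl "http://localhost:8983/solr/collection1/terms?terms.fl=text&terms.limit=-1&wt=json"

terms.limit=-1 removes the default cap; on very large indexes, paging with terms.lower is kinder to the server.]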
Grouping and recip function not working with Sharding
Hi, I am using sharding (3 shards) with ZooKeeper. When I query a collection with the parameters group=true&group.field=NAME&group.ngroups=true, the "ngroups" value in the response is incorrect, although the counts in the doclist arrays are correct.

Example: the response below contains 5 groups (which is correct), but ngroups is 11.

    {
      "responseHeader":{
        "status":0,
        "QTime":49,
        "params":{
          "group.ngroups":"true",
          "indent":"true",
          "start":"0",
          "q":"*:*",
          "group.field":"NAME",
          "group":"true",
          "wt":"json",
          "rows":"5"}},
      "grouped":{
        "NAME":{
          "matches":18,
          "ngroups":11,
          "groups":[
            {"groupValue":"A-SERIES",
             "doclist":{"numFound":5,"start":0,"maxScore":1,
               "docs":[{"NAME":"A-SERIES","_version_":1505559209034383400}]}},
            {"groupValue":"B-SERIES",
             "doclist":{"numFound":5,"start":0,
               "docs":[{"NAME":"B-SERIES","_version_":1505559209034383400}]}},
            {"groupValue":"C-SERIES",
             "doclist":{"numFound":1,"start":0,
               "docs":[{"NAME":"C-SERIES","_version_":1505559209034383400}]}},
            {"groupValue":"D-SERIES",
             "doclist":{"numFound":5,"start":0,
               "docs":[{"NAME":"D-SERIES","_version_":1505559209034383400}]}},
            {"groupValue":"E-SERIES",
             "doclist":{"numFound":3,"start":0,"maxScore":1,
               "docs":[{"NAME":"E-SERIES","_version_":1505559209034383400}]}}
          ]}}}

I am facing the same problem with the recip function when using it to fetch the latest record on a date field under sharding: records come back in the wrong order. Note: the same configuration works fine on a single machine without sharding. Please help me find a solution. Thanks.
Re: Grouping and recip function not working with Sharding
Erick Erickson gmail.com> writes:

> From the reference guide:
>
> group.ngroups and group.facet require that all documents in each group
> must be co-located on the same shard in order for accurate counts to
> be returned. Document routing via composite keys can be a useful
> solution in many situations.
>
> It's not clear what you think the problem here is. You say:
> bq: Ex: Below response contains 5 groups (Which is correct) but
> ngroups is 11.
> But you have rows set to 5, so?
>
> As far as your sorting issue, again an example showing what you think
> is wrong would be very helpful.
>
> Best,
> Erick
>
> On Wed, Jul 8, 2015 at 6:38 AM, Pankaj Sonawane gmail.com> wrote:
> > Hi,
> >
> > I am using sharding (3 shards) with ZooKeeper.
> >
> > When I query a collection using the
> > group=true&group.field=NAME&group.ngroups=true parameters, "ngroups" in
> > the response is incorrect. However I am getting correct counts in the
> > doclist arrays.
> >
> > Ex: Below response contains 5 groups (which is correct) but ngroups is 11.
> > [...]
Re: Grouping and recip function not working with Sharding
Hi Erick,

The example below is for the grouping issue, not the sorting one. I have indexed 1839 records, all with a 'NAME' field; there may be duplicate records for each 'NAME' value. Say there are 5 records with NAME='A-SERIES', 3 records with NAME='E-SERIES', and so on. In total there are 264 unique NAME values.

So when I query the collection using grouping, it should return 264 unique groups with an "ngroups" value of 264. But the query returns "ngroups" as 558, although the length of the "groups" array in the response is 264.

    {
      "responseHeader":{
        "status":0,
        "QTime":19,
        "params":{
          "group.ngroups":"true",
          "indent":"true",
          "q":"*:*",
          "group.field":"NAME",
          "group":"true",
          "wt":"json"}},
      "grouped":{
        "NAME":{
          "matches":1839,
          "ngroups":558,        <-- this value should be 264
          "groups":[
            {"groupValue":"A-SERIES", "doclist":{ ... }},
            {"groupValue":"B-SERIES", "doclist":{ ... }},
            {"groupValue":"C-SERIES", "doclist":{ ... }},
            ... (264 such groups in total)
          ]}}}

> From the reference guide:
>
> group.ngroups and group.facet require that all documents in each group
> must be co-located on the same shard in order for accurate counts to
> be returned. Document routing via composite keys can be a useful
> solution in many situations.
>
> It's not clear what you think the problem here is. You say:
> bq: Ex: Below response contains 5 groups (Which is correct) but
> ngroups is 11.
> But you have rows set to 5, so?
>
> As far as your sorting issue, again an example showing what you think
> is wrong would be very helpful.
>
> Best,
> Erick
>
> On Wed, Jul 8, 2015 at 6:38 AM, Pankaj Sonawane wrote:
> > [...]
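[For reference, a sketch of the composite-key routing that the reference-guide passage quoted above points at: if the collection uses the default compositeId router, putting the grouping key before '!' in each document id hashes every document sharing a NAME to the same shard, which is what makes ngroups accurate. The ids below are hypothetical, and existing documents would need to be reindexed with the new ids:

    curl "http://localhost:8983/solr/collection/update?commit=true" \
         -H "Content-Type: application/json" \
         -d '[{"id":"A-SERIES!101","NAME":"A-SERIES"},
              {"id":"A-SERIES!102","NAME":"A-SERIES"},
              {"id":"B-SERIES!201","NAME":"B-SERIES"}]'

The same co-location requirement explains the recip/sort symptom: distributed merges only see what each shard returns.]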
Adding UniqueKey to an existing Solr 6.4 Index
Hello, I have a single-node Solr 6.4 server with an index of 100 million documents. The default "id" is the primary key of this index. Now I would like to set up an update process that inserts new documents and updates existing documents based on the value of another field (say ProductId), which is different from the default "id". To do that I want to use Solr's built-in de-duplication, adding a new field SignatureField computed from ProductId and used as the unique key. Considering the millions of documents I have, is it possible to set up a de-duplication mechanism on an existing Solr index with the following steps:

a. Add the new field SignatureField and configure it as the UniqueKey in the Solr schema.
b. Run an atomic update over all documents to populate the value of this new SignatureField.

Is there an easier/better way to add a SignatureField to an existing large index? Thx, Pankaj
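[For reference, the standard dedup chain from the Solr deduplication docs, with the field names taken from the question; note that changing the <uniqueKey> of an existing index normally calls for a full reindex, so this is worth rehearsing on a copy of the index first:

    <updateRequestProcessorChain name="dedupe">
      <processor class="solr.processor.SignatureUpdateProcessorFactory">
        <bool name="enabled">true</bool>
        <str name="signatureField">SignatureField</str>
        <bool name="overwriteDupes">true</bool>
        <str name="fields">ProductId</str>
        <str name="signatureClass">solr.processor.Lookup3Signature</str>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory" />
      <processor class="solr.RunUpdateProcessorFactory" />
    </updateRequestProcessorChain>

The chain is attached to the /update handler via the update.chain parameter so the signature is computed on every add.]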
Solr limiting number of rows indexed to 21500 every time
    dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:75)
        at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
    Caused by: java.sql.SQLRecoverableException: No more data to read from socket
        at oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1200)
        at oracle.jdbc.driver.T4CMAREngine.unmarshalCLR(T4CMAREngine.java:1865)
        at oracle.jdbc.driver.T4CMAREngine.unmarshalCLR(T4CMAREngine.java:1757)
        at oracle.jdbc.driver.T4CMAREngine.unmarshalCLR(T4CMAREngine.java:1750)
        at oracle.jdbc.driver.T4CClobAccessor.handlePrefetch(T4CClobAccessor.java:543)
        at oracle.jdbc.driver.T4CClobAccessor.unmarshalOneRow(T4CClobAccessor.java:197)
        at oracle.jdbc.driver.T4CTTIrxd.unmarshal(T4CTTIrxd.java:916)
        at oracle.jdbc.driver.T4CTTIrxd.unmarshal(T4CTTIrxd.java:835)
        at oracle.jdbc.driver.T4C8Oall.readRXD(T4C8Oall.java:664)
        at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:328)
        at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:186)
        at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:521)
        at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:194)
        at oracle.jdbc.driver.T4CStatement.fetch(T4CStatement.java:1074)
        at oracle.jdbc.driver.OracleResultSetImpl.close_or_fetch_from_next(OracleResultSetImpl.java:369)
        at oracle.jdbc.driver.OracleResultSetImpl.next(OracleResultSetImpl.java:273)
        at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:370)
        ... 12 more

db-data-config.xml:

    <![CDATA[
    function makePair(row) {
        var theKey = row.get("theKey") + "_s";
        var theValue = row.get("theValue");
        row.put(theKey, theValue);
        row.remove("theKey");
        row.remove("theValue");
        return row;
    }
    ]]>

Please help me resolve this issue. Thanks, Pankaj
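[The "No more data to read from socket" in the trace shows the Oracle driver dying while prefetching CLOB data mid-fetch. One documented DIH knob that is sometimes worth experimenting with is the batchSize attribute on JdbcDataSource, which controls the JDBC fetch size; the connection details below are placeholders, and this is a suggestion to try rather than a confirmed fix:

    <dataSource type="JdbcDataSource"
                driver="oracle.jdbc.OracleDriver"
                url="jdbc:oracle:thin:@//dbhost:1521/ORCL"
                user="user" password="pass"
                batchSize="500"/>
]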
Index string returned by 'splitBy' by further splitting instead of multivalue
Hi, I am using the Solr DataImportHandler to index data from a database table (Oracle). One of the columns contains a string of '='s and ','s (see Column3 in the example below), like:

    Column1 = "F"
    Column2 = "ASDF"
    Column3 = "A=1,B=2,C=3,D=4 .. Z=26"

I want Solr to index each 'alphabet' against its value. Expected JSON for one row:

    "docs": [
      {
        "COL1": "F",
        "COL2": "ASDF",
        "A_s": "1",
        "B_s": "2",
        "C_s": "3",
        ...
      }
    ]

(I am appending '_s' to the 'name' attribute to create dynamic fields.)

But using RegexTransformer and 'splitBy', I can only split the string by ','. I want to further split it by '='.

Actual:

    "docs": [
      {
        "COL1": "F",
        "COL2": "ASDF",
        "COL3": [
          "A=1",
          "B=2",
          "C=3",
          ...
          "Z=26"
        ]
      }
    ]

db-data-config.xml:

Please help me find a solution. Thanks, Pankaj
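[For what it's worth, a DIH ScriptTransformer can do the second split, in the same style as the makePair function in the other thread. A hypothetical sketch; the entity and column names are assumed from the question, not taken from the poster's actual config:

    <dataConfig>
      <script><![CDATA[
        // Split COL3 ("A=1,B=2,...") into dynamic *_s fields.
        function splitPairs(row) {
          var col3 = row.get("COL3");
          if (col3 != null) {
            var pairs = col3.split(",");
            for (var i = 0; i < pairs.length; i++) {
              var kv = pairs[i].split("=");
              row.put(kv[0] + "_s", kv[1]);
            }
            row.remove("COL3");
          }
          return row;
        }
      ]]></script>
      <document>
        <entity name="t" query="SELECT COL1, COL2, COL3 FROM MYTABLE"
                transformer="script:splitPairs"/>
      </document>
    </dataConfig>
]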
Indexing documents with SOLR
Hi All, I am a newbie to Solr and am trying to integrate Tika + Solr. Can anyone please guide me on how to achieve this? My requirement is: I have a directory containing a lot of PDFs and DOCs, and I need to make searches within the documents. I am using the Solr web application. I just need some sample XML for both solrconfig.xml and schema.xml. Awaiting your response eagerly. Regards, Pankaj Bhatt.
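[The standard wiring for this is the ExtractingRequestHandler (Solr Cell), which runs Tika inside Solr. A minimal sketch, assuming the extraction contrib JARs are on the classpath, the schema has a catch-all text field, and an ignored_* dynamic field exists for unmapped metadata:

    <requestHandler name="/update/extract"
                    class="solr.extraction.ExtractingRequestHandler">
      <lst name="defaults">
        <str name="lowernames">true</str>
        <str name="uprefix">ignored_</str>
        <str name="fmap.content">text</str>
      </lst>
    </requestHandler>

Then post each file in the directory to it:

    curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" \
         -F "myfile=@/path/to/file.pdf"

uprefix maps Tika metadata fields the schema does not know about onto the ignored dynamic field instead of failing the update.]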
PDFBOX 1.3.1 Parsing Error
Hi All, While using PDFBox 1.3.1 in Apache Tika 1.7, I am getting the following error when parsing a PDF document:

    Error: Expected an integer type, actual=''
        at org.apache.pdfbox.pdfparser.BaseParser.readInt

This error occurs because of the SHA-256 encryption used by Adobe Acrobat 9. Is there any solution to this problem? I am stuck because of this. JIRA issue PDFBOX-697 has been created against this: https://issues.apache.org/jira/browse/PDFBOX-697 Please help!! / Pankaj Bhatt.
Re: facet.pivot for date fields
Hi Adeel, You can make use of the facet.query parameter to make faceting work across a range of dates. Here I am using a duration field; just replace it with your date field and use range values in the Solr date format. Your query parameters will look like this (you can pass multiple "facet.query" parameters):

    http://<host>/solr/select?q=...&facet=true&facet.query=itemduration:[0 TO 49]&facet.query=itemduration:[50 TO 99]&facet.query=itemduration:[100 TO 149]

Hope it helps. / Pankaj Bhatt.

On Wed, Dec 15, 2010 at 2:01 AM, Adeel Qureshi wrote:
> It doesn't seem like pivot faceting works on dates .. I was just curious if
> that's how it's supposed to be or I am doing something wrong .. if I include
> a datefield in the pivot list .. I simply don't get any facet results back for
> that datefield
>
> Thanks
> Adeel
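[As a concrete variant with dates: with a hypothetical date field named pubdate and Solr's ISO-8601 date syntax, the same pattern becomes (note the range keyword must be an uppercase TO):

    &facet=true
    &facet.query=pubdate:[2010-01-01T00:00:00Z TO 2010-06-30T23:59:59Z]
    &facet.query=pubdate:[2010-07-01T00:00:00Z TO 2010-12-31T23:59:59Z]

Each facet.query comes back as its own count under facet_counts/facet_queries.]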
Re: Problem using curl in PHP to get Solr results
Hi,

On Wed, Dec 15, 2010 at 2:52 PM, Dennis Gearon wrote:
> I finally figured out how to use curl to GET results, i.e. just turn all spaces
> into '%20' in my type of queries. I'm using Solr spatial, and then searching in
> both the default text field and a couple of columns. Works fine in the browser.
>
> But if I query for it using curl in PHP, there's an error somewhere in the JSON.
> I don't know if it's in the PHP food chain or something else.
>
> Just putting my solution to GETting from curl in PHP and my problem up here,
> for others to find.
>
> Of course, if anyone knows the answer, all the better.
>
> Dennis Gearon
>
> Signature Warning
>
> It is always a good idea to learn from your own mistakes. It is usually a better
> idea to learn from others' mistakes, so you do not have to make them yourself.
> from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
>
> EARTH has a Right To Life,
> otherwise we all die.
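[For others hitting this: curl can percent-encode GET parameters itself, which avoids hand-replacing spaces (and in PHP, rawurlencode() on each parameter value does the same job). A sketch against a hypothetical local Solr:

    curl -G "http://localhost:8983/solr/select" \
         --data-urlencode "q=text:hello world" \
         --data-urlencode "wt=json"

-G forces the encoded data onto the query string instead of into a POST body.]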
Re: uuid, COMB uuid, distributed farms
Hi Dennis, I have used UUIDs in my project to identify a basic installation of the client. Can I be of any help? / Pankaj Bhatt.

On Mon, Jan 3, 2011 at 3:28 AM, Dennis Gearon wrote:
> Planning ahead here.
>
> Anyone have experience with UUIDs, COMB UUIDs (sequential) in a large,
> internationally distributed Solr/database project?
>
> Dennis Gearon
> [...]
Re: uuid, COMB uuid, distributed farms
Hi Dennis, I have used UUIDs in the context of an application where an installation id (a UUID) is generated by the code. It caters to around 10K users. I have not used them in the context of Solr. / Pankaj Bhatt.

On Mon, Jan 3, 2011 at 11:05 PM, Dennis Gearon wrote:
> Thank you Pankaj.
>
> How large was your installation of Solr? I'm hoping to get mine to be
> multinational and am making plans for that as I go. So having unique ids, UUIDs,
> that cover a huge addressable space is a requirement.
>
> If yours was comparable, how were your replication issues, merging issues,
> anything else related to getting large datasets searchable and unique?
>
> Dennis Gearon
> [...]
>
> - Original Message
> From: pankaj bhatt
> To: solr-user@lucene.apache.org; gear...@sbcglobal.ne
> Sent: Mon, January 3, 2011 8:55:21 AM
> Subject: Re: uuid, COMB uuid, distributed farms
> [...]
Re: [sqljdbc4.jar] Errors
Hi Adam, Can you try downgrading your Java version to Java 5? I am using Java 6u13 with sqljdbc4.jar, however, and do not get any error. If possible, can you please also try some other version of Java 6? / Pankaj Bhatt.

On Wed, Jan 5, 2011 at 5:01 AM, Adam Estrada wrote:
> Can anyone help me with the following error? I upgraded my database to SQL
> Server 2008 SP2 and now I get the following error. It was working with SQL
> Server 2005.
>
> Caused by: java.lang.UnsupportedOperationException: Java Runtime Environment
> (JRE) version 1.6 is not supported by this driver. Use the sqljdbc4.jar class
> library, which provides support for JDBC 4.0.
>
> Any tips on this would be great!
>
> Thanks,
> Adam
Indexing FTP Documents through SOLR??
Hi All, Is there any way in Solr, or any plug-in, through which the folders and documents at an FTP location can be indexed? / Pankaj Bhatt.
Re: Indexing FTP Documents through SOLR??
Hi Gora, Thanks for the answer. I want to index all the PDF and HTML documents lying within a tree hierarchy on an FTP server. In addition, can I add an attribute "location" whose value is the FTP file location? If you can give me a sample configuration, that would be great. / Pankaj Bhatt.

On Fri, Jan 21, 2011 at 12:57 PM, Gora Mohanty wrote:
> On Fri, Jan 21, 2011 at 12:21 PM, pankaj bhatt wrote:
> > Hi All,
> > Is there is any way in SOLR or any plug-in through which the folders and
> > documents in FTP location can be indexed.
> [...]
>
> What format are these documents in? Which parts of the documents
> do you want to index?
>
> In general, this can be done through Solr, but the details will depend
> on the above.
>
> Regards,
> Gora
Re: Indexing FTP Documents through SOLR??
Hi Gora, Thanks, however I think it would be a cumbersome process to do all this manually. Isn't there any plugin or extractor that does this automatically? Has anyone in the group done this before? / Pankaj Bhatt.

On Fri, Jan 21, 2011 at 1:41 PM, Gora Mohanty wrote:
> On Fri, Jan 21, 2011 at 1:31 PM, pankaj bhatt wrote:
> > Hi Gora,
> > Thanks for the answer. I want to index all the PDF, HTML documents
> > lying within a tree hierarchy at FTP Server.
> > In addition, can i add an attribute "location" whose value is the FTP
> > FILE LOCATION.
> >
> > If you can give me, the sample configuration, it will be great.
> [...]
>
> From Solr 1.4 onwards, you can use the ExtractingRequestHandler
> built into Solr, and simply POST such files to a Solr server.
>
> Please see http://wiki.apache.org/solr/ExtractingRequestHandler
>
> Regards,
> Gora
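[Building on Gora's pointer, a sketch of the per-file POST once the FTP tree has been mirrored locally (for example with wget -m); the location field, ids, and paths here are assumptions, and the schema would need a matching location field:

    curl "http://localhost:8983/solr/update/extract?literal.id=ftp-doc-1&literal.location=ftp://server/docs/a.pdf&commit=true" \
         -F "myfile=@/local/mirror/docs/a.pdf"

Any literal.<field> parameter is indexed verbatim alongside the extracted content, which covers the FTP file location requirement.]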
DIH From various File system locations
Hi All, I need to index the documents present in my file system at various locations (e.g. C:\docs, D:\docs). Is there any way through which I can specify this in my DIH configuration? Here is my configuration:- / Pankaj Bhatt.
Re: DIH From various File system locations
Thanks Adam. It seems like Nutch would solve most of my concerns. It would be great if you could share some resources for Nutch with us. / Pankaj Bhatt.

On Tue, Jan 25, 2011 at 7:21 PM, Estrada Groups <estrada.adam.gro...@gmail.com> wrote:
> I would just use Nutch and specify the -solr param on the command line.
> That will add the extracted content to your instance of solr.
>
> Adam
>
> Sent from my iPhone
>
> On Jan 25, 2011, at 5:29 AM, pankaj bhatt wrote:
> > Hi All,
> > I need to index the documents present in my file system at various
> > locations (e.g. C:\docs, d:\docs).
> > Is there any way through which i can specify this in my DIH
> > configuration. Here is my configuration:-
> >
> > <entity name="sd" processor="FileListEntityProcessor"
> >         fileName="docx$|doc$|pdf$|xls$|xlsx|html$|rtf$|txt$|zip$"
> >         baseDir="G:\\Desktop\\"
> >         recursive="false"
> >         rootEntity="true"
> >         transformer="DateFormatTransformer"
> >         onerror="continue">
> >   <entity processor="org.apache.solr.handler.dataimport.TikaEntityProcessor"
> >           url="${sd.fileAbsolutePath}" format="text" dataSource="bin">
> > [...]
> >
> > / Pankaj Bhatt.
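[For anyone who wants to stay with DIH rather than Nutch: a data-config can declare one FileListEntityProcessor entity per root directory under the same document. A hedged sketch reusing the attributes from the quoted config; the entity names, field mapping, and BinFileDataSource wiring are assumptions:

    <dataConfig>
      <dataSource type="BinFileDataSource" name="bin"/>
      <document>
        <entity name="sdC" processor="FileListEntityProcessor"
                baseDir="C:\docs" fileName="docx$|doc$|pdf$"
                recursive="true" rootEntity="false" onerror="continue">
          <entity name="tikaC" processor="TikaEntityProcessor"
                  url="${sdC.fileAbsolutePath}" format="text" dataSource="bin"/>
        </entity>
        <entity name="sdD" processor="FileListEntityProcessor"
                baseDir="D:\docs" fileName="docx$|doc$|pdf$"
                recursive="true" rootEntity="false" onerror="continue">
          <entity name="tikaD" processor="TikaEntityProcessor"
                  url="${sdD.fileAbsolutePath}" format="text" dataSource="bin"/>
        </entity>
      </document>
    </dataConfig>

rootEntity="false" on the outer entities makes each file, not each directory listing, a Solr document.]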
Search Result from Multiple Cores
Hi All, Can anyone please help me in getting results from multiple cores (all cores maintain their own separate indexes, no sharding)? Suppose I had three cores: Core A (SQL Server DB), Core B (file system), Core C (MySQL). If I search for the word "Java", the combined results from all the cores should come up, ordered by their ranking. I even found a patch for this in JIRA, but the issue is still unresolved. Can any of you please help me? / Pankaj
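[If the cores share a compatible schema (same uniqueKey and the queried/returned fields), Solr's ordinary distributed search can merge them without any patch; hostnames below are placeholders:

    http://localhost:8983/solr/coreA/select?q=java
        &shards=localhost:8983/solr/coreA,localhost:8983/solr/coreB,localhost:8983/solr/coreC

Scores are merged across the shard responses, so the combined ranking falls out of the merge. With genuinely different schemas per core this does not apply.]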
Custom SOLR ADMIN Application
Hi All, Is there any custom open-source Solr admin application like the one Lucid Imagination provides in its distribution? I am trying to create one, but I think it would be reinventing the wheel. Please redirect me if there is any open-source application that can be used. Waiting for your answer. / Pankaj Bhatt.
Re: POST VS GET and NON English Characters
Hi Arun, This looks like an encoding issue to me. Can you change your browser settings to UTF-8 and hit the search URL via the GET method? We faced a similar problem with the Chinese and Korean languages, and this solved it. / Pankaj Bhatt.

2011/7/15 Sujatha Arun
> Hello,
>
> We have implemented Solr search in several languages. Initially we used the
> "GET" method for querying, but later moved to the "POST" method to accommodate
> lengthy queries.
>
> When we moved from GET to POST, the German characters could no
> longer be searched, and I had to use the function utf8_decode in my
> application for the search to work for German characters.
>
> Currently I am doing this while querying using the POST method (we are
> using the standard request handler):
>
> $this->_queryterm = iconv("UTF-8", "ISO-8859-1//TRANSLIT//IGNORE",
> $this->_queryterm);
>
> This makes the query work for German characters and other languages but does
> not work for certain characters in Lithuanian and Spanish. Examples:
>
> Not working:
> - Iš
> - Estremadūros
> - sNaująjį
> - MEDŽIAGOTYRA
> - MEDŽIAGOS
> - taškuose
>
> Working:
> - garbę
> - ieškoti
> - ispanų
>
> Any ideas/input?
>
> Regards
> Sujatha
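[A related knob worth checking before transliterating: Solr expects POST bodies in UTF-8, so the Content-Type charset has to say so, after which the iconv/ISO-8859-1 step should be unnecessary. A sketch with curl, assuming a default select handler:

    curl "http://localhost:8983/solr/select" \
         -H "Content-Type: application/x-www-form-urlencoded; charset=UTF-8" \
         --data-urlencode "q=text:Estremadūros" \
         --data-urlencode "wt=json"

ISO-8859-1 cannot represent Lithuanian letters like ž or ū at all, which would explain why the TRANSLIT workaround helps German umlauts but breaks those terms.]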