Re: SOLR 4 not utilizing multi CPU cores
Hi Salman,

I am running into a problem with Solr 4.6. I upgraded from Solr 1.4 to 4.6 because I want to display the search term count, which I get from Solr's term frequency function. When I search for a single word it works fine and I get the correct count, but when I search for multiple words inside double quotes it returns a count of 0. This is what I am using:

termfreq(datafield, "Research") works fine
termfreq(datafield, "Research Development") returns 0, even though several documents contain that phrase.

I have tried different field types (text_gen, text_en_splitting, String) but still don't get the expected result. Can you please help with this?

-- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-4-not-utilizing-multi-CPU-cores-tp4105058p4133256.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: get term frequency, just only keywords search
Hi Jack,

I have the same problem as danielitos85. I want to search for a phrase like "research development", but the termfreq function does not work for it. In your earlier messages you said to use phraseFreq, but that is only visible through the debug query output. My problem is that I want to sort on the "research development" count, so that documents with a higher count are listed first. How can I sort on that? Please help as soon as you can.

Thanks.

-- View this message in context: http://lucene.472066.n3.nabble.com/get-term-frequency-just-only-keywords-search-tp4084510p4133260.html
Sent from the Solr - User mailing list archive at Nabble.com.
How to sort solr results by foreign id field
I have documents with the following fields: id, name, parent, color. The parent field holds the ID of another document. I want to select all documents whose color is red and sort the results by the name of the parent document. Can this be done in Solr?

- I am an IT student

-- View this message in context: http://lucene.472066.n3.nabble.com/How-to-sort-solr-results-by-foreign-id-field-tp4133263.html
Sent from the Solr - User mailing list archive at Nabble.com.
'0' Status: Communication Error
I've got a problem that I can't solve, partly because I can't explain it with the right terms. I'm new to this, so sorry for the clumsy question. Here is an overview of my goal. I'm using Magento CE 1.7.0.2 and Solr 4.6.0 with the Magentix/Solr extension. When the Solr server runs close to Magento it works fine and I get responses in at most 2 seconds. But now I have placed Solr on a separate server, because I don't want to put everything on one machine. My settings are:

Enable Search: Yes
Enable Index: Yes
Host: IP address of the server where Solr is installed
Port: 8983
Path: /solr
Search limit: 100

With this setup the Solr logs show no entries at all, although they should show some details and the time taken for re-indexing the data, etc. And the solr.log file reports: ERR (3): '0' Status: Communication Error. Did I do anything wrong here?

-- View this message in context: http://lucene.472066.n3.nabble.com/0-Status-Communication-Error-tp4133265.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: DIH issues with 4.7.1
Hi,

I have just compared the 4.6.0 and 4.7.1 sources and noticed that the timing in the getConnection function is now taken with System.nanoTime() in 4.7.1, where it used System.currentTimeMillis() before. I am curious about the reason for the change and what the benefit is. Is it necessary? I have read SOLR-5734 (https://issues.apache.org/jira/browse/SOLR-5734) and googled the difference between currentTimeMillis and nanoTime, but I still cannot figure it out.

2014-04-26 2:24 GMT+08:00 Shawn Heisey :

> On 4/25/2014 11:56 AM, Hutchins, Jonathan wrote:
>
>> I recently upgraded from 4.6.1 to 4.7.1 and have found that the DIH
>> process that we are using takes 4x as long to complete. The only odd
>> thing I notice is when I enable debug logging for the dataimporthandler
>> process, it appears that in the new version each sql query is resulting in
>> a new connection opened through jdbcdatasource (log:
>> http://pastebin.com/JKh4gpmu). Were there any changes that would affect
>> the speed of running a full import?
>>
>
> This is most likely the problem you are experiencing:
>
> https://issues.apache.org/jira/browse/SOLR-5954
>
> The fix will be in the new 4.8 version. The release process for 4.8 is
> underway right now. A second release candidate was required yesterday. If
> no further problems are encountered, the release should be made around the
> middle of next week. If problems are encountered, the release will be
> delayed.
>
> Here's something very important that has been mentioned before: Solr 4.8
> will require Java 7. Previously, Java 6 was required. Java 7u55 (the
> current release from Oracle as I write this) is recommended as a minimum.
>
> If a 4.7.3 version is built, this is a fix that we should backport.
>
> Thanks,
> Shawn
>
>
Re: DIH issues with 4.7.1
Hello! Look at the javadocs for both. The granularity of System.currentTimeMillis() depend on the operating system, so it may happen that calls to that method that are 1 millisecond away from each other still return the same value. This is not the case with System.nanoTime() - http://docs.oracle.com/javase/7/docs/api/java/lang/System.html -- Regards, Rafał Kuć Performance Monitoring * Log Analytics * Search Analytics Solr & Elasticsearch Support * http://sematext.com/ > Hi >I have just compare the difference between the version 4.6.0 and 4.7.1. > Notice that the time in the getConnection function is declared with the > System.nanoTime in 4.7.1 ,while System.currentTimeMillis(). > Curious about the resson for the change.the benefit of it .Is it > neccessory? >I have read the SOLR-5734 , > https://issues.apache.org/jira/browse/SOLR-5734 >Do some google about the difference of currentTimeMillis and nano,but > still can not figure out it. > 2014-04-26 2:24 GMT+08:00 Shawn Heisey : >> On 4/25/2014 11:56 AM, Hutchins, Jonathan wrote: >> >>> I recently upgraded from 4.6.1 to 4.7.1 and have found that the DIH >>> process that we are using takes 4x as long to complete. The only odd >>> thing I notice is when I enable debug logging for the dataimporthandler >>> process, it appears that in the new version each sql query is resulting in >>> a new connection opened through jdbcdatasource (log: >>> http://pastebin.com/JKh4gpmu). Were there any changes that would affect >>> the speed of running a full import? >>> >> >> This is most likely the problem you are experiencing: >> >> https://issues.apache.org/jira/browse/SOLR-5954 >> >> The fix will be in the new 4.8 version. The release process for 4.8 is >> underway right now. A second release candidate was required yesterday. If >> no further problems are encountered, the release should be made around the >> middle of next week. If problems are encountered, the release will be >> delayed. >> >> Here's something very important that has been mentioned before: Solr 4.8 >> will require Java 7. Previously, Java 6 was required. Java 7u55 (the >> current release from Oracle as I write this) is recommended as a minimum. >> >> If a 4.7.3 version is built, this is a fix that we should backport. >> >> Thanks, >> Shawn >> >>
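To make the granularity point concrete, here is a small, self-contained Java sketch (illustrative only; the values printed depend on the operating system, JVM, and hardware):

    public class ClockGranularity {
        public static void main(String[] args) {
            // Back-to-back currentTimeMillis() calls can return the same value,
            // because on some platforms the underlying OS clock only ticks
            // every few milliseconds.
            long m1 = System.currentTimeMillis();
            long m2 = System.currentTimeMillis();
            // nanoTime() is designed for measuring intervals and will usually
            // have advanced between the two calls.
            long n1 = System.nanoTime();
            long n2 = System.nanoTime();
            System.out.println("millis delta = " + (m2 - m1));
            System.out.println("nanos delta  = " + (n2 - n1));
        }
    }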
Re: DIH issues with 4.7.1
Hi Mark Miller,

Sorry to pull you into this discussion. I notice that you reported this issue in https://issues.apache.org/jira/browse/SOLR-5734, based on https://issues.apache.org/jira/browse/SOLR-5721, but that problem only happened with ZooKeeper. If I just run DIH with JDBCDataSource, I do not think it will hit that problem. Please give me some hints.

As a bonus, here is the last mail I sent about the problem: I have just compared the 4.6.0 and 4.7.1 sources and noticed that the timing in the getConnection function is now taken with System.nanoTime() in 4.7.1, where it used System.currentTimeMillis() before. I am curious about the reason for the change and what the benefit is. Is it necessary? I have read SOLR-5734 (https://issues.apache.org/jira/browse/SOLR-5734) and googled the difference between currentTimeMillis and nanoTime, but I still cannot figure it out.

Thank you very much.

2014-04-26 20:31 GMT+08:00 YouPeng Yang :

> Hi
>    I have just compare the difference between the version 4.6.0 and 4.7.1.
> Notice that the time in the getConnection function is declared with the
> System.nanoTime in 4.7.1 ,while System.currentTimeMillis().
> Curious about the resson for the change.the benefit of it .Is it
> neccessory?
>    I have read the SOLR-5734 ,
> https://issues.apache.org/jira/browse/SOLR-5734
>    Do some google about the difference of currentTimeMillis and nano,but
> still can not figure out it.
>
> 2014-04-26 2:24 GMT+08:00 Shawn Heisey :
>
> On 4/25/2014 11:56 AM, Hutchins, Jonathan wrote:
>>
>>> I recently upgraded from 4.6.1 to 4.7.1 and have found that the DIH
>>> process that we are using takes 4x as long to complete. The only odd
>>> thing I notice is when I enable debug logging for the dataimporthandler
>>> process, it appears that in the new version each sql query is resulting
>>> in a new connection opened through jdbcdatasource (log:
>>> http://pastebin.com/JKh4gpmu). Were there any changes that would affect
>>> the speed of running a full import?
>>>
>>
>> This is most likely the problem you are experiencing:
>>
>> https://issues.apache.org/jira/browse/SOLR-5954
>>
>> The fix will be in the new 4.8 version. The release process for 4.8 is
>> underway right now. A second release candidate was required yesterday. If
>> no further problems are encountered, the release should be made around the
>> middle of next week. If problems are encountered, the release will be
>> delayed.
>>
>> Here's something very important that has been mentioned before: Solr 4.8
>> will require Java 7. Previously, Java 6 was required. Java 7u55 (the
>> current release from Oracle as I write this) is recommended as a minimum.
>>
>> If a 4.7.3 version is built, this is a fix that we should backport.
>>
>> Thanks,
>> Shawn
>>
>>
Re: DIH issues with 4.7.1
Hi Rafał Kuć,

I got it. The point is that many operating systems measure time in units of tens of milliseconds, and System.currentTimeMillis() is based on the operating system clock. In my case I just run DIH from a crontab; is there any possibility of running into that kind of trouble? I really cannot picture a situation that would lead to the problem.

Thanks very much.

2014-04-26 20:49 GMT+08:00 YouPeng Yang :

> Hi Mark Miller
> Sorry to get you in these discussion .
> I notice that Mark Miller report this issure in
> https://issues.apache.org/jira/browse/SOLR-5734 according to
> https://issues.apache.org/jira/browse/SOLR-5721,but it just happened with
> the zookeeper.
> If I just do DIH with JDBCDataSource ,I do not think it will get the
> problem.
> Please give some hints
>
> >> Bonus,just post the last mail I send about the problem:
>
> I have just compare the difference between the version 4.6.0 and 4.7.1.
> Notice that the time in the getConnection function is declared with the
> System.nanoTime in 4.7.1 ,while System.currentTimeMillis().
> Curious about the resson for the change.the benefit of it .Is it
> neccessory?
> I have read the SOLR-5734 ,
> https://issues.apache.org/jira/browse/SOLR-5734
> Do some google about the difference of currentTimeMillis and nano,but
> still can not figure out it.
>
> Thank you very much.
>
> 2014-04-26 20:31 GMT+08:00 YouPeng Yang :
>
> Hi
>> I have just compare the difference between the version 4.6.0 and
>> 4.7.1. Notice that the time in the getConnection function is declared
>> with the System.nanoTime in 4.7.1 ,while System.currentTimeMillis().
>> Curious about the resson for the change.the benefit of it .Is it
>> neccessory?
>> I have read the SOLR-5734 ,
>> https://issues.apache.org/jira/browse/SOLR-5734
>> Do some google about the difference of currentTimeMillis and nano,but
>> still can not figure out it.
>>
>> 2014-04-26 2:24 GMT+08:00 Shawn Heisey :
>>
>> On 4/25/2014 11:56 AM, Hutchins, Jonathan wrote:
>>>
>>>> I recently upgraded from 4.6.1 to 4.7.1 and have found that the DIH
>>>> process that we are using takes 4x as long to complete. The only odd
>>>> thing I notice is when I enable debug logging for the dataimporthandler
>>>> process, it appears that in the new version each sql query is resulting
>>>> in a new connection opened through jdbcdatasource (log:
>>>> http://pastebin.com/JKh4gpmu). Were there any changes that would affect
>>>> the speed of running a full import?
>>>>
>>>
>>> This is most likely the problem you are experiencing:
>>>
>>> https://issues.apache.org/jira/browse/SOLR-5954
>>>
>>> The fix will be in the new 4.8 version. The release process for 4.8 is
>>> underway right now. A second release candidate was required yesterday. If
>>> no further problems are encountered, the release should be made around the
>>> middle of next week. If problems are encountered, the release will be
>>> delayed.
>>>
>>> Here's something very important that has been mentioned before: Solr
>>> 4.8 will require Java 7. Previously, Java 6 was required. Java 7u55 (the
>>> current release from Oracle as I write this) is recommended as a minimum.
>>>
>>> If a 4.7.3 version is built, this is a fix that we should backport.
>>>
>>> Thanks,
>>> Shawn
>>>
>>>
Re: get term frequency, just only keywords search
You need to use a shingle filter at index time so that pairs of adjacent words get indexed as single terms; then you can do a term frequency lookup for the shingled pair of terms ("Research Development" as a single term). Be sure to manually apply any other filters, such as lower-casing or stemming. See:

http://lucene.apache.org/core/4_7_0/analyzers-common/org/apache/lucene/analysis/shingle/ShingleFilterFactory.html
http://lucene.apache.org/core/4_7_0/analyzers-common/org/apache/lucene/analysis/shingle/ShingleFilter.html

But note that you don't need to do any of this if you simply want to boost documents containing a phrase - just use the pf, pf2, and pf3 parameters of edismax, or explicitly boost the phrase, such as "research development"^20.

-- Jack Krupansky

-----Original Message-----
From: ksmith
Sent: Saturday, April 26, 2014 5:38 AM
To: solr-user@lucene.apache.org
Subject: Re: get term frequency, just only keywords search

Hi, jack i have a same problem as danielitos85 i want to search like "research development" but termfreq function not work as per your messages and you said that use phraseFreq but we can get it from debug query. my problem is i want to sort on "research development" count, higher count document will display first in list. so how can i sort on that. can you please help me asap. Thanks.

-- View this message in context: http://lucene.472066.n3.nabble.com/get-term-frequency-just-only-keywords-search-tp4084510p4133260.html
Sent from the Solr - User mailing list archive at Nabble.com.
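To make the index-time behaviour described above concrete, here is a rough Java sketch of what a two-word shingle chain produces, using the Lucene 4.x analysis classes directly (a sketch only; it assumes lucene-core and lucene-analyzers-common on the classpath, and the sample text is invented):

    import java.io.StringReader;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.core.LowerCaseFilter;
    import org.apache.lucene.analysis.core.WhitespaceTokenizer;
    import org.apache.lucene.analysis.shingle.ShingleFilter;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.util.Version;

    public class ShingleDemo {
        public static void main(String[] args) throws Exception {
            // Tokenize on whitespace, lower-case, then emit two-word shingles.
            TokenStream ts = new WhitespaceTokenizer(Version.LUCENE_46,
                new StringReader("Research Development drives Research Development"));
            ts = new LowerCaseFilter(Version.LUCENE_46, ts);
            ShingleFilter shingles = new ShingleFilter(ts, 2, 2);
            shingles.setOutputUnigrams(false); // keep only the two-word terms for the demo
            CharTermAttribute term = shingles.addAttribute(CharTermAttribute.class);
            shingles.reset();
            while (shingles.incrementToken()) {
                // Prints "research development" twice, plus the other adjacent pairs.
                System.out.println(term.toString());
            }
            shingles.end();
            shingles.close();
        }
    }

With a field analyzed this way in the schema (via ShingleFilterFactory), termfreq(thatField, 'research development') should see the lower-cased pair as a single term, and a function sort such as sort=termfreq(thatField,'research development') desc should then give the ordering asked about in the quoted message.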
Optimal setup for multiple tools
Hello,

My team has been working with Solr for the last 2 years. We have two main indices:

1. documents
   - index and store the main text
   - one record for each document
2. places (all of the geospatial places found in the documents above)
   - index but don't store the main text
   - one record for each place; a single document can contain thousands of places, but the ratio has come out to about 6:1 places to documents

We have several tools that query the above indices. One is a standard search tool that returns documents filtered by keyword, temporal, and geospatial filters. Another is a geospatial tool that queries the places collection. We now have a requirement to provide document highlighting when querying in the geospatial tool.

Does anyone have suggestions or prior experience on how to set up two collections that are essentially different "views" of the same data? Also, any tips on how to ensure that these two collections stay "in sync" (meaning any documents indexed into the documents collection are also properly indexed into places)?

Thanks a lot,

Jimmy Lin
Re: DIH issues with 4.7.1
System.currentTimeMillis can jump around due to NTP, etc. If you are trying to count elapsed time, you don’t want to use a method that can jump around with the results. -- Mark Miller about.me/markrmiller On April 26, 2014 at 8:58:20 AM, YouPeng Yang (yypvsxf19870...@gmail.com) wrote: Hi Rafał Kuć I got it,the point is many operating systems measure time in units of tens of milliseconds,and the System.currentTimeMillis() is just base on operating system. In my case,I just do DIH with a crontable, Is there any possiblity to get in that trouble?I am really can not picture what the situation may lead to the problem. Thanks very much. 2014-04-26 20:49 GMT+08:00 YouPeng Yang : > Hi Mark Miller > Sorry to get you in these discussion . > I notice that Mark Miller report this issure in > https://issues.apache.org/jira/browse/SOLR-5734 according to > https://issues.apache.org/jira/browse/SOLR-5721,but it just happened with > the zookeeper. > If I just do DIH with JDBCDataSource ,I do not think it will get the > problem. > Please give some hints > > >> Bonus,just post the last mail I send about the problem: > > I have just compare the difference between the version 4.6.0 and 4.7.1. > Notice that the time in the getConnection function is declared with the > System.nanoTime in 4.7.1 ,while System.currentTimeMillis(). > Curious about the resson for the change.the benefit of it .Is it > neccessory? > I have read the SOLR-5734 , > https://issues.apache.org/jira/browse/SOLR-5734 > Do some google about the difference of currentTimeMillis and nano,but > still can not figure out it. > > Thank you very much. > > > 2014-04-26 20:31 GMT+08:00 YouPeng Yang : > > Hi >> I have just compare the difference between the version 4.6.0 and >> 4.7.1. Notice that the time in the getConnection function is declared >> with the System.nanoTime in 4.7.1 ,while System.currentTimeMillis(). >> Curious about the resson for the change.the benefit of it .Is it >> neccessory? >> I have read the SOLR-5734 , >> https://issues.apache.org/jira/browse/SOLR-5734 >> Do some google about the difference of currentTimeMillis and nano,but >> still can not figure out it. >> >> >> >> >> 2014-04-26 2:24 GMT+08:00 Shawn Heisey : >> >> On 4/25/2014 11:56 AM, Hutchins, Jonathan wrote: >>> I recently upgraded from 4.6.1 to 4.7.1 and have found that the DIH process that we are using takes 4x as long to complete. The only odd thing I notice is when I enable debug logging for the dataimporthandler process, it appears that in the new version each sql query is resulting in a new connection opened through jdbcdatasource (log: http://pastebin.com/JKh4gpmu). Were there any changes that would affect the speed of running a full import? >>> >>> This is most likely the problem you are experiencing: >>> >>> https://issues.apache.org/jira/browse/SOLR-5954 >>> >>> The fix will be in the new 4.8 version. The release process for 4.8 is >>> underway right now. A second release candidate was required yesterday. If >>> no further problems are encountered, the release should be made around the >>> middle of next week. If problems are encountered, the release will be >>> delayed. >>> >>> Here's something very important that has been mentioned before: Solr >>> 4.8 will require Java 7. Previously, Java 6 was required. Java 7u55 (the >>> current release from Oracle as I write this) is recommended as a minimum. >>> >>> If a 4.7.3 version is built, this is a fix that we should backport. >>> >>> Thanks, >>> Shawn >>> >>> >> >
Re: DIH issues with 4.7.1
NTP should slew the clock rather than jump it. I haven't checked recently, but that is how it worked in the 90's when I was organizing the NTP hierarchy at HP. It only does step changes if the clocks is really wrong. That is most likely at reboot, when other demons aren't running yet. wunder On Apr 26, 2014, at 7:30 AM, Mark Miller wrote: > System.currentTimeMillis can jump around due to NTP, etc. If you are trying > to count elapsed time, you don’t want to use a method that can jump around > with the results. > -- > Mark Miller > about.me/markrmiller > > On April 26, 2014 at 8:58:20 AM, YouPeng Yang (yypvsxf19870...@gmail.com) > wrote: > > Hi Rafał Kuć > I got it,the point is many operating systems measure time in units of > tens of milliseconds,and the System.currentTimeMillis() is just base on > operating system. > In my case,I just do DIH with a crontable, Is there any possiblity to get > in that trouble?I am really can not picture what the situation may lead to > the problem. > > > Thanks very much. > > > 2014-04-26 20:49 GMT+08:00 YouPeng Yang : > >> Hi Mark Miller >> Sorry to get you in these discussion . >> I notice that Mark Miller report this issure in >> https://issues.apache.org/jira/browse/SOLR-5734 according to >> https://issues.apache.org/jira/browse/SOLR-5721,but it just happened with >> the zookeeper. >> If I just do DIH with JDBCDataSource ,I do not think it will get the >> problem. >> Please give some hints >> Bonus,just post the last mail I send about the problem: >> >> I have just compare the difference between the version 4.6.0 and 4.7.1. >> Notice that the time in the getConnection function is declared with the >> System.nanoTime in 4.7.1 ,while System.currentTimeMillis(). >> Curious about the resson for the change.the benefit of it .Is it >> neccessory? >> I have read the SOLR-5734 , >> https://issues.apache.org/jira/browse/SOLR-5734 >> Do some google about the difference of currentTimeMillis and nano,but >> still can not figure out it. >> >> Thank you very much. >> >> >> 2014-04-26 20:31 GMT+08:00 YouPeng Yang : >> >> Hi >>> I have just compare the difference between the version 4.6.0 and >>> 4.7.1. Notice that the time in the getConnection function is declared >>> with the System.nanoTime in 4.7.1 ,while System.currentTimeMillis(). >>> Curious about the resson for the change.the benefit of it .Is it >>> neccessory? >>> I have read the SOLR-5734 , >>> https://issues.apache.org/jira/browse/SOLR-5734 >>> Do some google about the difference of currentTimeMillis and nano,but >>> still can not figure out it. >>> >>> >>> >>> >>> 2014-04-26 2:24 GMT+08:00 Shawn Heisey : >>> >>> On 4/25/2014 11:56 AM, Hutchins, Jonathan wrote: > I recently upgraded from 4.6.1 to 4.7.1 and have found that the DIH > process that we are using takes 4x as long to complete. The only odd > thing I notice is when I enable debug logging for the dataimporthandler > process, it appears that in the new version each sql query is resulting > in > a new connection opened through jdbcdatasource (log: > http://pastebin.com/JKh4gpmu). Were there any changes that would > affect > the speed of running a full import? > This is most likely the problem you are experiencing: https://issues.apache.org/jira/browse/SOLR-5954 The fix will be in the new 4.8 version. The release process for 4.8 is underway right now. A second release candidate was required yesterday. If no further problems are encountered, the release should be made around the middle of next week. 
If problems are encountered, the release will be delayed. Here's something very important that has been mentioned before: Solr 4.8 will require Java 7. Previously, Java 6 was required. Java 7u55 (the current release from Oracle as I write this) is recommended as a minimum. If a 4.7.3 version is built, this is a fix that we should backport. Thanks, Shawn >>> >> -- Walter Underwood wun...@wunderwood.org
RE: TB scale
> Anyone with experience, suggestions or lessons learned in the 10 -100 TB > scale they'd like to share? > Researching optimum design for a Solr Cloud with, say, about 20TB index. We're building a web archive with a projected index size of 20TB (distributed in 20 shards). Some test results and a short write-up at http://sbdevel.wordpress.com/2013/12/06/danish-webscale/ - feel free to ask for more details. tl;dr: We're saying to hell with RAM for caching and putting it all on SSDs on a single big machine. Results so far (some distributed tests with 200GB & 400GB indexes, some single tests with a production-index of 1TB) are very promising, both for plain keyword-search, grouping and faceting (DocValues rocks). - Toke Eskildsen
Re: TB scale
I think Hathi Trust has a few terabytes of index. They do full-text search on 10 million books. http://www.hathitrust.org/blogs/Large-scale-Search wunder On Apr 26, 2014, at 8:36 AM, Toke Eskildsen wrote: >> Anyone with experience, suggestions or lessons learned in the 10 -100 TB >> scale they'd like to share? >> Researching optimum design for a Solr Cloud with, say, about 20TB index. > > We're building a web archive with a projected index size of 20TB (distributed > in 20 shards). Some test results and a short write-up at > http://sbdevel.wordpress.com/2013/12/06/danish-webscale/ - feel free to ask > for more details. > > tl;dr: We're saying to hell with RAM for caching and putting it all on SSDs > on a single big machine. Results so far (some distributed tests with 200GB & > 400GB indexes, some single tests with a production-index of 1TB) are very > promising, both for plain keyword-search, grouping and faceting (DocValues > rocks). > > - Toke Eskildsen
Re: DIH issues with 4.7.1
My answer remains the same. I guess if you want more precise terminology, nanoTime will generally be monotonic and currentTimeMillis will not be, due to things like NTP, etc. You want monotonicity for measuring elapsed times. -- Mark Miller about.me/markrmiller On April 26, 2014 at 11:25:16 AM, Walter Underwood (wun...@wunderwood.org) wrote: NTP should slew the clock rather than jump it. I haven't checked recently, but that is how it worked in the 90's when I was organizing the NTP hierarchy at HP. It only does step changes if the clocks is really wrong. That is most likely at reboot, when other demons aren't running yet. wunder On Apr 26, 2014, at 7:30 AM, Mark Miller wrote: > System.currentTimeMillis can jump around due to NTP, etc. If you are trying > to count elapsed time, you don’t want to use a method that can jump around > with the results. > -- > Mark Miller > about.me/markrmiller > > On April 26, 2014 at 8:58:20 AM, YouPeng Yang (yypvsxf19870...@gmail.com) > wrote: > > Hi Rafał Kuć > I got it,the point is many operating systems measure time in units of > tens of milliseconds,and the System.currentTimeMillis() is just base on > operating system. > In my case,I just do DIH with a crontable, Is there any possiblity to get > in that trouble?I am really can not picture what the situation may lead to > the problem. > > > Thanks very much. > > > 2014-04-26 20:49 GMT+08:00 YouPeng Yang : > >> Hi Mark Miller >> Sorry to get you in these discussion . >> I notice that Mark Miller report this issure in >> https://issues.apache.org/jira/browse/SOLR-5734 according to >> https://issues.apache.org/jira/browse/SOLR-5721,but it just happened with >> the zookeeper. >> If I just do DIH with JDBCDataSource ,I do not think it will get the >> problem. >> Please give some hints >> Bonus,just post the last mail I send about the problem: >> >> I have just compare the difference between the version 4.6.0 and 4.7.1. >> Notice that the time in the getConnection function is declared with the >> System.nanoTime in 4.7.1 ,while System.currentTimeMillis(). >> Curious about the resson for the change.the benefit of it .Is it >> neccessory? >> I have read the SOLR-5734 , >> https://issues.apache.org/jira/browse/SOLR-5734 >> Do some google about the difference of currentTimeMillis and nano,but >> still can not figure out it. >> >> Thank you very much. >> >> >> 2014-04-26 20:31 GMT+08:00 YouPeng Yang : >> >> Hi >>> I have just compare the difference between the version 4.6.0 and >>> 4.7.1. Notice that the time in the getConnection function is declared >>> with the System.nanoTime in 4.7.1 ,while System.currentTimeMillis(). >>> Curious about the resson for the change.the benefit of it .Is it >>> neccessory? >>> I have read the SOLR-5734 , >>> https://issues.apache.org/jira/browse/SOLR-5734 >>> Do some google about the difference of currentTimeMillis and nano,but >>> still can not figure out it. >>> >>> >>> >>> >>> 2014-04-26 2:24 GMT+08:00 Shawn Heisey : >>> >>> On 4/25/2014 11:56 AM, Hutchins, Jonathan wrote: > I recently upgraded from 4.6.1 to 4.7.1 and have found that the DIH > process that we are using takes 4x as long to complete. The only odd > thing I notice is when I enable debug logging for the dataimporthandler > process, it appears that in the new version each sql query is resulting > in > a new connection opened through jdbcdatasource (log: > http://pastebin.com/JKh4gpmu). Were there any changes that would > affect > the speed of running a full import? 
> This is most likely the problem you are experiencing: https://issues.apache.org/jira/browse/SOLR-5954 The fix will be in the new 4.8 version. The release process for 4.8 is underway right now. A second release candidate was required yesterday. If no further problems are encountered, the release should be made around the middle of next week. If problems are encountered, the release will be delayed. Here's something very important that has been mentioned before: Solr 4.8 will require Java 7. Previously, Java 6 was required. Java 7u55 (the current release from Oracle as I write this) is recommended as a minimum. If a 4.7.3 version is built, this is a fix that we should backport. Thanks, Shawn >>> >> -- Walter Underwood wun...@wunderwood.org
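For reference, the elapsed-time pattern being discussed looks roughly like this (a minimal sketch; Thread.sleep() just stands in for whatever is being timed, such as obtaining a JDBC connection):

    import java.util.concurrent.TimeUnit;

    public class ElapsedTime {
        public static void main(String[] args) throws Exception {
            // nanoTime() values are only meaningful as a difference between two
            // calls, but they are not affected by wall-clock adjustments (NTP
            // steps, manual "date" changes, leap seconds), which is why it is
            // preferred for measuring durations.
            long start = System.nanoTime();
            Thread.sleep(250); // stand-in for the work being measured
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            System.out.println("took about " + elapsedMs + " ms");
        }
    }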
Re: SOLR 4 not utilizing multi CPU cores
I suspect your problem is that termfreq is looking at _terms_, not phrases. It has no sense of position, that's a higher-level construct. So "Research Development" is searched as a single _term_, and there are no two-word terms. What use-case are you trying to solve? This seems like an XY problem perhaps.. Best, Erick On Sat, Apr 26, 2014 at 12:49 AM, ksmith wrote: > hi Salman, > > i getting one problem in solr 4.6 > i have upgrade solr 1.4 to solr 4.6 because of i want to display search term > count, > and term count getting by solr term frequency > but when i search only single word than its work fine i get perfect count > but when i search multiple word within double quote it returning 0 count > below is my code: > termfreq(datafield, "Research") its working fine > termfreq(datafield, "Research Development") its return 0 but multiple > document have the same word. > > i have try with different field type : text_gen, text_en_splitting, String > but i didnt get exact result > can you please help for this. > > > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/SOLR-4-not-utilizing-multi-CPU-cores-tp4105058p4133256.html > Sent from the Solr - User mailing list archive at Nabble.com.
Re: Optimal setup for multiple tools
Have you considered putting them in the _same_ index? There's not much penalty at all in having sparsely populated fields in a document, so the fact that the two parts of your index have orthogonal fields wouldn't cost you much, and it would solve the synchronization problem. You can include a type field to distinguish between the two, and just add a filter query to keep them separate. Since that'll be cached, your search performance should be fine.

Otherwise you should include the fields you need to sort on in the index where you need to sort. That denormalizes the data, but...

About keeping the two in sync: that's really outside Solr; your indexing process has to manage that, I'd guess.

Best,
Erick

On Sat, Apr 26, 2014 at 7:24 AM, Jimmy Lin wrote:
> Hello,
>
> My team has been working with SOLR for the last 2 years. We have two main
> indices:
>
> 1. documents
> -index and store main text
> -one record for each document
> 2. places (all of the geospatial places found in the documents above)
> -index but don't store main text
> -one record for each place. could have thousands in a single
> document but the ratio has seemed to come out to 6:1 places to documents
>
> We have several tools that query the above indices. One is just a standard
> search tool that returns documents filtered on keyword, temporal, and
> geospatial filters. Another is a geospatial tool that queries the places
> collection. We now have a requirement to provide document highlighting
> when querying in the geospatial tool.
>
> Does anyone have any suggestions/prior experience on how they would set up
> two collections that are essentially different "views" of the data? Also
> any tips on how to ensure that these two collections are "in sync" (meaning
> any documents indexed into the documents collection are also properly
> indexed in places)?
>
> Thanks alot,
>
> Jimmy Lin
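A rough SolrJ 4.x sketch of the single-index approach (the core name, the "type" field, and the "main_text" field are assumptions for the example; a type field would have to be added to the schema and populated at index time):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class TypeFilterExample {
        public static void main(String[] args) throws Exception {
            HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

            SolrQuery q = new SolrQuery("main_text:flood");
            // The filter query keeps the two "views" apart; because filter
            // queries are cached, mixing both record types in one index
            // stays cheap.
            q.addFilterQuery("type:document");
            // Highlighting works here because the document records store main_text.
            q.setHighlight(true).addHighlightField("main_text");

            QueryResponse rsp = solr.query(q);
            System.out.println("hits: " + rsp.getResults().getNumFound());
            solr.shutdown();
        }
    }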
zkCli zkhost parameter
It looks like this only takes a single host as its value, whereas the zkHost environment variable for Solr takes a comma-separated list. Shouldn't the client also take a comma-separated list? k/r, Scott
Re: zkCli zkhost parameter
Have you tried a comma-separated list or are you going by documentation? It should work. -- Mark Miller about.me/markrmiller On April 26, 2014 at 1:03:25 PM, Scott Stults (sstu...@opensourceconnections.com) wrote: It looks like this only takes a single host as its value, whereas the zkHost environment variable for Solr takes a comma-separated list. Shouldn't the client also take a comma-separated list? k/r, Scott
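For example, something along these lines should work against an ensemble (paths assume the stock 4.x example layout; adjust to your install):

    example/scripts/cloud-scripts/zkcli.sh -zkhost zk1:2181,zk2:2181,zk3:2181 \
        -cmd upconfig -confdir example/solr/collection1/conf -confname myconf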
Re: DIH issues with 4.7.1
NTP works very hard to keep the clock positive monotonic. But nanoTime is intended for elapsed time measurement anyway, so it is the right choice. You can get some pretty fun clock behavior by running on virtual machines, like in AWS. And some system real time clocks don't tick during a leap second. And Windows system clocks are probably still hopeless. If you want to run the clock backwards, we don't need NTP, we can set it with "date". wunder On Apr 26, 2014, at 9:10 AM, Mark Miller wrote: > My answer remains the same. I guess if you want more precise terminology, > nanoTime will generally be monotonic and currentTimeMillis will not be, due > to things like NTP, etc. You want monotonicity for measuring elapsed times. > -- > Mark Miller > about.me/markrmiller > > On April 26, 2014 at 11:25:16 AM, Walter Underwood (wun...@wunderwood.org) > wrote: > > NTP should slew the clock rather than jump it. I haven't checked recently, > but that is how it worked in the 90's when I was organizing the NTP hierarchy > at HP. > > It only does step changes if the clocks is really wrong. That is most likely > at reboot, when other demons aren't running yet. > > wunder > > On Apr 26, 2014, at 7:30 AM, Mark Miller wrote: > >> System.currentTimeMillis can jump around due to NTP, etc. If you are trying >> to count elapsed time, you don’t want to use a method that can jump around >> with the results. >> -- >> Mark Miller >> about.me/markrmiller >> >> On April 26, 2014 at 8:58:20 AM, YouPeng Yang (yypvsxf19870...@gmail.com) >> wrote: >> >> Hi Rafał Kuć >> I got it,the point is many operating systems measure time in units of >> tens of milliseconds,and the System.currentTimeMillis() is just base on >> operating system. >> In my case,I just do DIH with a crontable, Is there any possiblity to get >> in that trouble?I am really can not picture what the situation may lead to >> the problem. >> >> >> Thanks very much. >> >> >> 2014-04-26 20:49 GMT+08:00 YouPeng Yang : >> >>> Hi Mark Miller >>> Sorry to get you in these discussion . >>> I notice that Mark Miller report this issure in >>> https://issues.apache.org/jira/browse/SOLR-5734 according to >>> https://issues.apache.org/jira/browse/SOLR-5721,but it just happened with >>> the zookeeper. >>> If I just do DIH with JDBCDataSource ,I do not think it will get the >>> problem. >>> Please give some hints >>> > Bonus,just post the last mail I send about the problem: >>> >>> I have just compare the difference between the version 4.6.0 and 4.7.1. >>> Notice that the time in the getConnection function is declared with the >>> System.nanoTime in 4.7.1 ,while System.currentTimeMillis(). >>> Curious about the resson for the change.the benefit of it .Is it >>> neccessory? >>> I have read the SOLR-5734 , >>> https://issues.apache.org/jira/browse/SOLR-5734 >>> Do some google about the difference of currentTimeMillis and nano,but >>> still can not figure out it. >>> >>> Thank you very much. >>> >>> >>> 2014-04-26 20:31 GMT+08:00 YouPeng Yang : >>> >>> Hi I have just compare the difference between the version 4.6.0 and 4.7.1. Notice that the time in the getConnection function is declared with the System.nanoTime in 4.7.1 ,while System.currentTimeMillis(). Curious about the resson for the change.the benefit of it .Is it neccessory? I have read the SOLR-5734 , https://issues.apache.org/jira/browse/SOLR-5734 Do some google about the difference of currentTimeMillis and nano,but still can not figure out it. 
2014-04-26 2:24 GMT+08:00 Shawn Heisey : On 4/25/2014 11:56 AM, Hutchins, Jonathan wrote: > >> I recently upgraded from 4.6.1 to 4.7.1 and have found that the DIH >> process that we are using takes 4x as long to complete. The only odd >> thing I notice is when I enable debug logging for the dataimporthandler >> process, it appears that in the new version each sql query is resulting >> in >> a new connection opened through jdbcdatasource (log: >> http://pastebin.com/JKh4gpmu). Were there any changes that would >> affect >> the speed of running a full import? >> > > This is most likely the problem you are experiencing: > > https://issues.apache.org/jira/browse/SOLR-5954 > > The fix will be in the new 4.8 version. The release process for 4.8 is > underway right now. A second release candidate was required yesterday. If > > no further problems are encountered, the release should be made around > the > middle of next week. If problems are encountered, the release will be > delayed. > > Here's something very important that has been mentioned before: Solr > 4.8 will require Java 7. Previously, Java 6 was required. Java 7u55 (the >
Re: DIH issues with 4.7.1
bq. due to things like NTP, etc. The full sentence is very important. NTP is not the only way for this to happen - you also have leap seconds, daylight savings time, internet clock sync, a whole host of things that affect currentTimeMillis and not nanoTime. It is without question the way to go to even hope for monotonicity. -- Mark Miller about.me/markrmiller On April 26, 2014 at 1:11:14 PM, Walter Underwood (wun...@wunderwood.org) wrote: NTP works very hard to keep the clock positive monotonic. But nanoTime is intended for elapsed time measurement anyway, so it is the right choice. You can get some pretty fun clock behavior by running on virtual machines, like in AWS. And some system real time clocks don't tick during a leap second. And Windows system clocks are probably still hopeless. If you want to run the clock backwards, we don't need NTP, we can set it with "date". wunder On Apr 26, 2014, at 9:10 AM, Mark Miller wrote: > My answer remains the same. I guess if you want more precise terminology, > nanoTime will generally be monotonic and currentTimeMillis will not be, due > to things like NTP, etc. You want monotonicity for measuring elapsed times. > -- > Mark Miller > about.me/markrmiller > > On April 26, 2014 at 11:25:16 AM, Walter Underwood (wun...@wunderwood.org) > wrote: > > NTP should slew the clock rather than jump it. I haven't checked recently, > but that is how it worked in the 90's when I was organizing the NTP hierarchy > at HP. > > It only does step changes if the clocks is really wrong. That is most likely > at reboot, when other demons aren't running yet. > > wunder > > On Apr 26, 2014, at 7:30 AM, Mark Miller wrote: > >> System.currentTimeMillis can jump around due to NTP, etc. If you are trying >> to count elapsed time, you don’t want to use a method that can jump around >> with the results. >> -- >> Mark Miller >> about.me/markrmiller >> >> On April 26, 2014 at 8:58:20 AM, YouPeng Yang (yypvsxf19870...@gmail.com) >> wrote: >> >> Hi Rafał Kuć >> I got it,the point is many operating systems measure time in units of >> tens of milliseconds,and the System.currentTimeMillis() is just base on >> operating system. >> In my case,I just do DIH with a crontable, Is there any possiblity to get >> in that trouble?I am really can not picture what the situation may lead to >> the problem. >> >> >> Thanks very much. >> >> >> 2014-04-26 20:49 GMT+08:00 YouPeng Yang : >> >>> Hi Mark Miller >>> Sorry to get you in these discussion . >>> I notice that Mark Miller report this issure in >>> https://issues.apache.org/jira/browse/SOLR-5734 according to >>> https://issues.apache.org/jira/browse/SOLR-5721,but it just happened with >>> the zookeeper. >>> If I just do DIH with JDBCDataSource ,I do not think it will get the >>> problem. >>> Please give some hints >>> > Bonus,just post the last mail I send about the problem: >>> >>> I have just compare the difference between the version 4.6.0 and 4.7.1. >>> Notice that the time in the getConnection function is declared with the >>> System.nanoTime in 4.7.1 ,while System.currentTimeMillis(). >>> Curious about the resson for the change.the benefit of it .Is it >>> neccessory? >>> I have read the SOLR-5734 , >>> https://issues.apache.org/jira/browse/SOLR-5734 >>> Do some google about the difference of currentTimeMillis and nano,but >>> still can not figure out it. >>> >>> Thank you very much. >>> >>> >>> 2014-04-26 20:31 GMT+08:00 YouPeng Yang : >>> >>> Hi I have just compare the difference between the version 4.6.0 and 4.7.1. 
Notice that the time in the getConnection function is declared with the System.nanoTime in 4.7.1 ,while System.currentTimeMillis(). Curious about the resson for the change.the benefit of it .Is it neccessory? I have read the SOLR-5734 , https://issues.apache.org/jira/browse/SOLR-5734 Do some google about the difference of currentTimeMillis and nano,but still can not figure out it. 2014-04-26 2:24 GMT+08:00 Shawn Heisey : On 4/25/2014 11:56 AM, Hutchins, Jonathan wrote: > >> I recently upgraded from 4.6.1 to 4.7.1 and have found that the DIH >> process that we are using takes 4x as long to complete. The only odd >> thing I notice is when I enable debug logging for the dataimporthandler >> process, it appears that in the new version each sql query is resulting >> in >> a new connection opened through jdbcdatasource (log: >> http://pastebin.com/JKh4gpmu). Were there any changes that would >> affect >> the speed of running a full import? >> > > This is most likely the problem you are experiencing: > > https://issues.apache.org/jira/browse/SOLR-5954 > > The fix will be in t
Re: Search for a mask that matches the requested string
Hi, I'm the author of luwak. I have a half-finished version sitting in a branch somewhere that pulls all the intervals-fork-specific code out of the library and would run with 4.6. It would need to be integrated into Solr as well, but I have an upcoming project which may well do just that. Feel free to ping me directly! Alan Woodward www.flax.co.uk On 26 Apr 2014, at 03:29, Otis Gospodnetic wrote: > Luwak is not based on the fork of Lucene or rather, the fork you are seeing > is there only because the Luwak authors needed highlighting. If you don't > need highlighting you can probably modify Luwak a bit to use regular > Lucene. The Lucene fork you are seeing there will also, eventually, be > committed to Lucene trunk and then hopefully backported to 4.x. > > Otis > -- > Performance Monitoring * Log Analytics * Search Analytics > Solr & Elasticsearch Support * http://sematext.com/ > > > On Fri, Apr 25, 2014 at 6:46 PM, Muhammad Gelbana wrote: > >> Luwak is based on a fork of solr\lucene which I cannot use. I have to do >> this using solr 4.6, whether by writing extra code or not. Thanks. >> >> *-* >> *Muhammad Gelbana* >> http://www.linkedin.com/in/mgelbana >> >> >> On Sat, Apr 26, 2014 at 12:13 AM, Ahmet Arslan wrote: >> >>> Hi, >>> >>> You don't need to write code for this. Use luwak (I gave the link in my >>> first e-mail) instead. >>> >>> If your can't get luwak running because its too complicated etc, see a >>> similar discussion >>> >>> http://find.searchhub.org/document/9411388c7d2de701#36e50082e918b10c >>> >>> where diy-percolator example pointer is given. It is an example to use >>> memory index. >>> >>> Ahmet >>> >>> >>> >>> On Saturday, April 26, 2014 1:05 AM, Muhammad Gelbana < >> m.gelb...@gmail.com> >>> wrote: >>> @Jack, I am ready to write custom code to implement such feature but I >>> don't know what feature in solr should I extend ? Where should I start ? >> I >>> believe it should be a very simple task. >>> >>> @Ahmet, how can I use the class you mentioned ? Is there a tutorial for >> it >>> ? I'm not sure how the code in the class's description should work, I've >>> never extended solr before. >>> >>> Thank you all. >>> >>> *-* >>> *Muhammad Gelbana* >>> http://www.linkedin.com/in/mgelbana >>> >>> >>> >>> On Fri, Apr 25, 2014 at 10:38 PM, Ahmet Arslan >> wrote: >>> Hi, Your use case is different than ad hoc retrieval. Where you have set of documents and varying queries. In your case it is the reverse, you have a query (string masks) stored A?, and incoming documents are percolated against it. out of the box Solr does not have support for this today. Please see : >>> >> http://lucene.apache.org/core/4_7_2/memory/org/apache/lucene/index/memory/MemoryIndex.html By the way wildcard ? matches a single character. Ahmet On Friday, April 25, 2014 11:02 PM, Muhammad Gelbana < >>> m.gelb...@gmail.com> wrote: I have no idea how can this help me. I have been using solr for a few >>> weeks and I'm not familiar with it yet. I'm asking for a very simple task, a >>> way to customize how solr matches a string, does this exist in solr ? *-* *Muhammad Gelbana* http://www.linkedin.com/in/mgelbana On Thu, Apr 24, 2014 at 10:09 PM, Ahmet Arslan >>> wrote: > Hi, > > Please see : https://github.com/flaxsearch/luwak > > Ahmet > > > On Thursday, April 24, 2014 8:40 PM, Muhammad Gelbana < m.gelb...@gmail.com> > wrote: > (Please make sure you reply to my address because I didn't subscribe >> to > this mailing list) > > I'm using Solr 4.6 > > I need to store string masks in Solr. 
By masks, I mean strings that >> can > match other strings. > > Then I need to search for masks that match the string I'm providing >> in >>> my > query. For example, assume the following single-field document stored >>> in > Solr: > > { >"fieldA": "__A__" > } > > I need to be able to find this document if I query the fieldA field >>> with a > string like *12A34*, as the underscore "*_*" matches a single string. >>> The > single string matching mechanism is my strict goal here, multiple >>> string > matching won't be helpful. > > I hope I was clear enough. Please elaborate because I'm not versatile with > solr and I haven't been using it for too long. > Thank you. > > *-* > *Muhammad Gelbana* > http://www.linkedin.com/in/mgelbana > > >>> >>> >>
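For anyone following the MemoryIndex suggestion in the quoted thread, here is a minimal Lucene 4.x sketch of the idea (assumes lucene-memory and lucene-analyzers-common on the classpath; the field name is invented, and it assumes the stored "_" masks are translated into "?" wildcards, since "?" is Lucene's single-character wildcard):

    import org.apache.lucene.analysis.core.KeywordAnalyzer;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.memory.MemoryIndex;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.WildcardQuery;

    public class MaskPercolator {
        public static void main(String[] args) {
            // Index the incoming string as a single in-memory "document"...
            MemoryIndex incoming = new MemoryIndex();
            incoming.addField("fieldA", "12A34", new KeywordAnalyzer());

            // ...then run each stored mask against it as a wildcard query.
            Query mask = new WildcardQuery(new Term("fieldA", "??A??"));
            float score = incoming.search(mask);
            System.out.println(score > 0.0f ? "mask matches" : "no match");
        }
    }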
Re: Indexing Big Data With or Without Solr
Thanks vineet With Regards Aman Tandon On Wed, Apr 23, 2014 at 7:21 PM, Vineet Mishra wrote: > I did it with Tomcat and Zookeeper Ensemble, will mail you the steps > shortly. > > Cheers > > > On Sat, Apr 19, 2014 at 9:09 AM, Aman Tandon >wrote: > > > Vineet please share after you setup for solr cloud > > Are you using jetty or tomcat.? > > > > On Saturday, April 19, 2014, Vineet Mishra > wrote: > > > Thanks Furkan, I will definitely give it a try then. > > > > > > Thanks again! > > > > > > > > > > > > > > > On Tue, Apr 15, 2014 at 7:53 PM, Furkan KAMACI > >wrote: > > > > > >> Hi Vineet; > > >> > > >> I've been using SolrCloud for such kind of Big Data and I think that > you > > >> should consider to use it. If you have any problems you can ask it > here. > > >> > > >> Thanks; > > >> Furkan KAMACI > > >> > > >> > > >> 2014-04-15 13:20 GMT+03:00 Vineet Mishra : > > >> > > >> > Hi All, > > >> > > > >> > I have worked with Solr 3.5 to implement real time search on some > > 100GB > > >> > data, that worked fine but was little slow on complex > queries(Multiple > > >> > group/joined queries). > > >> > But now I want to index some real Big Data(around 4 TB or even > more), > > can > > >> > SolrCloud be solution for it if not what could be the best possible > > >> > solution in this case. > > >> > > > >> > *Stats for the previous Implementation:* > > >> > It was Master Slave Architecture with normal Standalone multiple > > instance > > >> > of Solr 3.5. There were around 12 Solr instance running on different > > >> > machines. > > >> > > > >> > *Things to consider for the next implementation:* > > >> > Since all the data is sensor data hence it is the factor of > duplicity > > and > > >> > uniqueness. > > >> > > > >> > *Really urgent, please take the call on priority with set of > feasible > > >> > solution.* > > >> > > > >> > Regards > > >> > > > >> > > > > > > > -- > > Sent from Gmail Mobile > > >
[ANN] Apache Gora 0.4 Released
Good Afternoon Everyone, > > The Apache Gora team are very proud to announce the immediate release of > Gora 0.4 which is a major release for the project. > > The Apache Gora open source framework provides an in-memory data model and > persistence for big data. Gora supports persisting to column stores, key > value stores, document stores and RDBMSs, and analyzing the data with > extensive Apache Hadoop™ MapReduce support. Gora uses the Apache Software > License v2.0. > > This release addresses no fewer than 60 issues. Major improvements within > the release scope comprise a complete upgrade to Apache Avro 1.7.X and > overhaul of the Gora persistency API (such improvements enable Gora to be > used to map much more expressive and complicated data structures than > previously available), upgrades to Apache HBase 0.94.13, Apache Cassandra > 2.0.X and Apache Accumulo 1.5.X. > Users can also benefit from using Gora + Solr for object-to-datastore > mapping with the addition of the new Solr module which uses Solr 4.X. > > Gora sources tar.gz and .zip release artifacts (along with signatures) can > be obtained from visiting our DOWNLOADS [0] page. We also encourage users > to upgrade their build dependencies using our published Maven Artifacts [1]. > > Please redistribute this email to your project mailing list. > > Thanks > Lewis > (on behalf of the Apache Gora team) > > [0] http://gora.apache.org/downloads.html > [1] http://search.maven.org/#search|ga|1|gora > >
RE: How can I convert xml message for updating a Solr index to a javabin file
Does anyone know a way to do this? Thanks.

-----Original Message-----
From: Elran Dvir
Sent: Thursday, April 24, 2014 4:11 PM
To: solr-user@lucene.apache.org
Subject: RE: How can I convert xml message for updating a Solr index to a javabin file

I want to measure XML vs javabin update message indexing performance.

-----Original Message-----
From: Upayavira [mailto:u...@odoko.co.uk]
Sent: Thursday, April 24, 2014 2:04 PM
To: solr-user@lucene.apache.org
Subject: Re: How can I convert xml message for updating a Solr index to a javabin file

Why would you want to do this? Javabin is used by SolrJ to communicate with Solr. XML is good enough for communicating from the command line/curl, as is JSON. Attempting to use javabin just seems to add an unnecessary complication.

Upayavira

On Thu, Apr 24, 2014, at 10:20 AM, Elran Dvir wrote:
> Hi all,
> Is there a way I can convert an XML Solr update message file to a javabin
> file? If so, how?
> How can I use curl to update Solr with a javabin message file?
>
> Thank you very much.

Email secured by Check Point
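For the timing comparison, a rough SolrJ 4.x sketch that posts the same documents in javabin instead of XML (the URL and field names are placeholders; this does not convert an existing XML file on disk, but it lets the two wire formats be measured against the same data, which curl alone can't easily do for javabin):

    import org.apache.solr.client.solrj.impl.BinaryRequestWriter;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class JavabinUpdate {
        public static void main(String[] args) throws Exception {
            HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
            // SolrJ 4.x posts updates as XML by default; switching the request
            // writer makes it send the same documents as javabin, which the
            // standard /update handler accepts out of the box.
            solr.setRequestWriter(new BinaryRequestWriter());

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-1");
            doc.addField("title", "javabin test");
            solr.add(doc);
            solr.commit();
            solr.shutdown();
        }
    }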