RE: How can I convert xml message for updating a Solr index to a javabin file

2014-04-26 Thread Elran Dvir
Does anyone know a way to do this? Thanks. -Original Message- From: Elran Dvir Sent: Thursday, April 24, 2014 4:11 PM To: solr-user@lucene.apache.org Subject: RE: How can I convert xml message for updating a Solr index to a javabin file I want to measure xml vs javabin update message i

[ANN] Apache Gora 0.4 Released

2014-04-26 Thread Lewis John Mcgibbney
Good Afternoon Everyone, The Apache Gora team are very proud to announce the immediate release of Gora 0.4, which is a major release for the project. The Apache Gora open source framework provides an in-memory data model and persistence for big data. Gora supports persisting to column st

Re: Indexing Big Data With or Without Solr

2014-04-26 Thread Aman Tandon
Thanks vineet With Regards Aman Tandon On Wed, Apr 23, 2014 at 7:21 PM, Vineet Mishra wrote: > I did it with Tomcat and Zookeeper Ensemble, will mail you the steps > shortly. > > Cheers > > > On Sat, Apr 19, 2014 at 9:09 AM, Aman Tandon >wrote: > > > Vineet please share after you setup for sol

Re: Search for a mask that matches the requested string

2014-04-26 Thread Alan Woodward
Hi, I'm the author of luwak. I have a half-finished version sitting in a branch somewhere that pulls all the intervals-fork-specific code out of the library and would run with 4.6. It would need to be integrated into Solr as well, but I have an upcoming project which may well do just that. Fe

Re: DIH issues with 4.7.1

2014-04-26 Thread Mark Miller
bq. due to things like NTP, etc. The full sentence is very important. NTP is not the only way for this to happen - you also have leap seconds, daylight savings time, internet clock sync, a whole host of things that affect currentTimeMillis and not nanoTime. It is without question the way to go

Re: DIH issues with 4.7.1

2014-04-26 Thread Walter Underwood
NTP works very hard to keep the clock positive monotonic. But nanoTime is intended for elapsed time measurement anyway, so it is the right choice. You can get some pretty fun clock behavior by running on virtual machines, like in AWS. And some system real time clocks don't tick during a leap sec

Re: zkCli zkhost parameter

2014-04-26 Thread Mark Miller
Have you tried a comma-separated list or are you going by documentation? It should work.  --  Mark Miller about.me/markrmiller On April 26, 2014 at 1:03:25 PM, Scott Stults (sstu...@opensourceconnections.com) wrote: It looks like this only takes a single host as its value, whereas the zkHost

zkCli zkhost parameter

2014-04-26 Thread Scott Stults
It looks like this only takes a single host as its value, whereas the zkHost environment variable for Solr takes a comma-separated list. Shouldn't the client also take a comma-separated list? k/r, Scott

Re: Optimal setup for multiple tools

2014-04-26 Thread Erick Erickson
Have you considered putting them in the _same_ index? There's not much penalty at all with having sparsely populated fields in a document, so the fact that the two parts of your index had orthogonal fields wouldn't cost you much and would solve the synchronization problem. You can include a type f

Re: SOLR 4 not utilizing multi CPU cores

2014-04-26 Thread Erick Erickson
I suspect your problem is that termfreq is looking at _terms_, not phrases. It has no sense of position, that's a higher-level construct. So "Research Development" is searched as a single _term_, and there are no two-word terms. What use-case are you trying to solve? This seems like an XY problem

Re: DIH issues with 4.7.1

2014-04-26 Thread Mark Miller
My answer remains the same. I guess if you want more precise terminology, nanoTime will generally be monotonic and currentTimeMillis will not be, due to things like NTP, etc. You want monotonicity for measuring elapsed times. --  Mark Miller about.me/markrmiller On April 26, 2014 at 11:25:16 AM,
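A minimal sketch of the elapsed-time point above, in plain Java. The ElapsedTimer class is a hypothetical helper written for illustration; it is not code from Solr or DIH. It assumes only the documented behavior of System.nanoTime(): values are meaningful as differences between two readings, not as wall-clock timestamps.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for illustration only; not taken from Solr or DIH. */
final class ElapsedTimer {
    // Reading from the JVM's elapsed-time source; not related to wall-clock time.
    private final long startNanos = System.nanoTime();

    /** Milliseconds elapsed since this timer was created. */
    long elapsedMillis() {
        // Subtract first, then convert: only the difference of two
        // nanoTime readings carries meaning.
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
    }

    public static void main(String[] args) throws InterruptedException {
        ElapsedTimer timer = new ElapsedTimer();
        Thread.sleep(50); // stand-in for the work being timed
        System.out.println("elapsed ms: " + timer.elapsedMillis());
    }
}
```

The same subtraction done with System.currentTimeMillis() could come out negative or inflated if the system clock were adjusted between the two readings, which is the failure mode the thread is discussing.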

Re: TB scale

2014-04-26 Thread Walter Underwood
I think Hathi Trust has a few terabytes of index. They do full-text search on 10 million books. http://www.hathitrust.org/blogs/Large-scale-Search wunder On Apr 26, 2014, at 8:36 AM, Toke Eskildsen wrote: >> Anyone with experience, suggestions or lessons learned in the 10 -100 TB >> scale th

RE: TB scale

2014-04-26 Thread Toke Eskildsen
> Anyone with experience, suggestions or lessons learned in the 10 -100 TB > scale they'd like to share? > Researching optimum design for a Solr Cloud with, say, about 20TB index. We're building a web archive with a projected index size of 20TB (distributed in 20 shards). Some test results and a

Re: DIH issues with 4.7.1

2014-04-26 Thread Walter Underwood
NTP should slew the clock rather than jump it. I haven't checked recently, but that is how it worked in the 90's when I was organizing the NTP hierarchy at HP. It only does step changes if the clock is really wrong. That is most likely at reboot, when other daemons aren't running yet. wunder O

Re: DIH issues with 4.7.1

2014-04-26 Thread Mark Miller
System.currentTimeMillis can jump around due to NTP, etc. If you are trying to count elapsed time, you don’t want to use a method that can jump around with the results. --  Mark Miller about.me/markrmiller On April 26, 2014 at 8:58:20 AM, YouPeng Yang (yypvsxf19870...@gmail.com) wrote: Hi Rafał

Optimal setup for multiple tools

2014-04-26 Thread Jimmy Lin
Hello, My team has been working with SOLR for the last 2 years. We have two main indices: 1. documents -index and store main text -one record for each document 2. places (all of the geospatial places found in the documents above) -index but don't store main text -

Re: get term frequency, just only keywords search

2014-04-26 Thread Jack Krupansky
You need to use a shingle filter at index time so that pairs of adjacent words get indexed as single terms, then you can do a term frequency for the shingled pair of terms ("Research Development" as a single term). Be sure to manually apply any other filters, such as lower case or stemming. Se
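The idea behind shingling can be sketched in plain Java. This is an illustration of the concept only, not Solr's shingle filter or its termfreq function: the ShingleSketch class and its methods are hypothetical, it splits on whitespace, and it lower-cases both the text and the phrase by hand, mirroring the advice to apply the other filters manually.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical illustration of two-word shingles; not Solr's implementation. */
final class ShingleSketch {

    /** Builds two-word shingles from whitespace-separated, lower-cased text. */
    static List<String> bigrams(String text) {
        String[] words = text.trim().toLowerCase(Locale.ROOT).split("\\s+");
        List<String> shingles = new ArrayList<>();
        for (int i = 0; i + 1 < words.length; i++) {
            // Each pair of adjacent words becomes one term.
            shingles.add(words[i] + " " + words[i + 1]);
        }
        return shingles;
    }

    /** Counts how often the two-word phrase occurs as a single shingled term. */
    static int termFreq(String text, String phrase) {
        // Normalize the phrase the same way the text was normalized.
        String normalized = phrase.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
        int count = 0;
        for (String shingle : bigrams(text)) {
            if (shingle.equals(normalized)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String doc = "Research Development funds research development teams";
        System.out.println(termFreq(doc, "Research Development")); // prints 2
    }
}
```

Because each adjacent pair is stored as one term, a per-term frequency lookup can count the phrase without needing position information.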

Re: DIH issues with 4.7.1

2014-04-26 Thread YouPeng Yang
Hi Rafał Kuć, I got it: the point is that many operating systems measure time in units of tens of milliseconds, and System.currentTimeMillis() is based on the operating system clock. In my case, I just run DIH from a crontab. Is there any possibility of running into that trouble? I really cannot picture w

Re: DIH issues with 4.7.1

2014-04-26 Thread YouPeng Yang
Hi Mark Miller, sorry to drag you into this discussion. I notice that Mark Miller reported this issue in https://issues.apache.org/jira/browse/SOLR-5734 according to https://issues.apache.org/jira/browse/SOLR-5721, but it only happened with ZooKeeper. If I just do DIH with JDBCDataSource, I d

Re: DIH issues with 4.7.1

2014-04-26 Thread Rafał Kuć
Hello! Look at the javadocs for both. The granularity of System.currentTimeMillis() depends on the operating system, so it may happen that calls to that method that are 1 millisecond away from each other still return the same value. This is not the case with System.nanoTime() - http://docs.oracle.c

Re: DIH issues with 4.7.1

2014-04-26 Thread YouPeng Yang
Hi, I have just compared versions 4.6.0 and 4.7.1. I noticed that the time in the getConnection function is taken with System.nanoTime in 4.7.1, while 4.6.0 uses System.currentTimeMillis(). I am curious about the reason for the change and its benefit. Is it necessary? I hav

'0' Status: Communication Error

2014-04-26 Thread Naresh
I've got this problem that I can't solve. Partly because I can't explain it with the right terms. I'm new to this so sorry for this clumsy question. Below you can see an overview of my goal. I'm using Magento CE1.7.0.2 & Solr 4.6.0. I'm using Magentix/Solr extension in Magento CE1.7.0.2 its work

How to sort solr results by foreign id field

2014-04-26 Thread hungctk33
I have documents with the following fields: id, name, parent, color. The parent field is an ID of another document. I want to select all documents where the color is red and sort the results by the name of the parent. Can it be done in Solr?

Re: get term frequency, just only keywords search

2014-04-26 Thread ksmith
Hi Jack, I have the same problem as danielitos85. I want to search for a phrase like "research development", but the termfreq function does not work for it, as per your messages. You said to use phraseFreq, but we can get that from the debug query. My problem is that I want to sort on the "research development" count, higher count document wi

Re: SOLR 4 not utilizing multi CPU cores

2014-04-26 Thread ksmith
Hi Salman, I am having a problem in Solr 4.6. I upgraded from Solr 1.4 to Solr 4.6 because I want to display the search term count, which I get from Solr's term frequency. When I search for a single word it works fine and I get the correct count, but when I search for multiple words within double quo