Re: SOLR 4 not utilizing multi CPU cores

2014-04-26 Thread ksmith
Hi Salman,

I have a problem with Solr 4.6.
I upgraded from Solr 1.4 to Solr 4.6 because I want to display the search
term count, which I get from Solr's term frequency function.
When I search for a single word it works fine and I get the correct count,
but when I search for multiple words inside double quotes it returns a count
of 0.
Below is my code:
termfreq(datafield, "Research")  works fine
termfreq(datafield, "Research Development")  returns 0, although multiple
documents contain the same words.

I have tried different field types (text_gen, text_en_splitting, String),
but I did not get the correct result.
Can you please help with this?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-4-not-utilizing-multi-CPU-cores-tp4105058p4133256.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: get term frequency, just only keywords search

2014-04-26 Thread ksmith
Hi Jack,
I have the same problem as danielitos85.
I want to search for a phrase like "research development", but the termfreq
function does not work for it, as you explained in your messages.
You suggested using phraseFreq, but we can only get that from the debug query
output.
My problem is that I want to sort on the "research development" count, so that
documents with a higher count are displayed first in the list.
How can I sort on that?
Can you please help me as soon as possible?

Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/get-term-frequency-just-only-keywords-search-tp4084510p4133260.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to sort solr results by foreign id field

2014-04-26 Thread hungctk33
I have documents with the following fields:

id
name
parent
color
The parent field is an ID of another document.
I want to select all documents where the color is red and sort the results
by the name of the parent.
Can it be done in solr?



-
I am a student IT
--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-sort-solr-results-by-foreign-id-field-tp4133263.html
Sent from the Solr - User mailing list archive at Nabble.com.


'0' Status: Communication Error

2014-04-26 Thread Naresh
I have a problem that I can't solve, partly because I can't explain it
in the right terms. I'm new to this, so sorry for the clumsy question.

Below is an overview of my setup.

I'm using Magento CE 1.7.0.2 and Solr 4.6.0.

With the Magentix/Solr extension in Magento CE 1.7.0.2 everything works fine
and I get a response within 2 seconds at most, as long as the Solr server is
located close to my Magento installation.

However, I have now moved Solr to a separate server, because I don't want to
run everything on one machine. The extension is configured as follows:

 Enable Search  : Yes
 Enable Index   : Yes
 Host           : IP address of the server where Solr is installed
 Port           : 8983
 Path           : /solr
 Search limit   : 100

The Solr logs show no entries at all, although I would expect some log
details, such as the time taken for re-indexing the data.

The Solr.log file reports: ERR (3): '0' Status: Communication Error.

Did I do anything wrong here?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/0-Status-Communication-Error-tp4133265.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DIH issues with 4.7.1

2014-04-26 Thread YouPeng Yang
Hi,
   I have just compared versions 4.6.0 and 4.7.1 and noticed that the time
in the getConnection function is taken with System.nanoTime() in 4.7.1,
whereas 4.6.0 used System.currentTimeMillis().
   I am curious about the reason for the change and its benefit. Is it
necessary?
   I have read SOLR-5734:
https://issues.apache.org/jira/browse/SOLR-5734
   I also searched for the difference between currentTimeMillis and
nanoTime, but I still cannot figure it out.




2014-04-26 2:24 GMT+08:00 Shawn Heisey :

> On 4/25/2014 11:56 AM, Hutchins, Jonathan wrote:
>
>> I recently upgraded from 4.6.1 to 4.7.1 and have found that the DIH
>> process that we are using takes 4x as long to complete.  The only odd
>> thing I notice is when I enable debug logging for the dataimporthandler
>> process, it appears that in the new version each sql query is resulting in
>> a new connection opened through jdbcdatasource (log:
>> http://pastebin.com/JKh4gpmu).  Were there any changes that would affect
>> the speed of running a full import?
>>
>
> This is most likely the problem you are experiencing:
>
> https://issues.apache.org/jira/browse/SOLR-5954
>
> The fix will be in the new 4.8 version.  The release process for 4.8 is
> underway right now.  A second release candidate was required yesterday.  If
> no further problems are encountered, the release should be made around the
> middle of next week.  If problems are encountered, the release will be
> delayed.
>
> Here's something very important that has been mentioned before:  Solr 4.8
> will require Java 7.  Previously, Java 6 was required.  Java 7u55 (the
> current release from Oracle as I write this) is recommended as a minimum.
>
> If a 4.7.3 version is built, this is a fix that we should backport.
>
> Thanks,
> Shawn
>
>


Re: DIH issues with 4.7.1

2014-04-26 Thread Rafał Kuć
Hello!

Look at the javadocs for both. The granularity of
System.currentTimeMillis() depends on the operating system, so it may
happen that calls to that method that are 1 millisecond apart still
return the same value. This is not the case with
System.nanoTime():
http://docs.oracle.com/javase/7/docs/api/java/lang/System.html

-- 
Regards,
 Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/
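
To illustrate the granularity point, here is a small simulation, written in
Python rather than Java, of a clock that only advances in fixed ticks. The
10 ms tick is an assumed value chosen for illustration, and coarse_clock_ms
is a hypothetical helper, not a JDK or Solr API.

```python
def coarse_clock_ms(true_time_ms, tick_ms=10):
    """Return the time a clock with `tick_ms` granularity would report."""
    return int(true_time_ms // tick_ms) * tick_ms


# Two events that really happen 1 ms apart...
start_ms, end_ms = 1001.0, 1002.0

# ...are reported as the same instant by the coarse clock,
# so the measured elapsed time is 0 instead of 1 ms.
measured_elapsed = coarse_clock_ms(end_ms) - coarse_clock_ms(start_ms)
```

Any interval shorter than one tick can be reported as zero this way, which is
why a coarse clock is a poor choice for timing short operations.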




Re: DIH issues with 4.7.1

2014-04-26 Thread YouPeng Yang
Hi Mark Miller,
  Sorry to pull you into this discussion.
  I noticed that you reported this issue in
https://issues.apache.org/jira/browse/SOLR-5734, following
https://issues.apache.org/jira/browse/SOLR-5721, but that only happened with
ZooKeeper.
  If I only run DIH with JDBCDataSource, I do not think I will hit the
problem.
  Please give me some hints.

  For reference, here is the last mail I sent about the problem:
   I have just compared versions 4.6.0 and 4.7.1 and noticed that the time
in the getConnection function is taken with System.nanoTime() in 4.7.1,
whereas 4.6.0 used System.currentTimeMillis().
   I am curious about the reason for the change and its benefit. Is it
necessary?
   I have read SOLR-5734:
https://issues.apache.org/jira/browse/SOLR-5734
   I also searched for the difference between currentTimeMillis and
nanoTime, but I still cannot figure it out.

Thank you very much.




Re: DIH issues with 4.7.1

2014-04-26 Thread YouPeng Yang
Hi Rafał Kuć,
  I get it: the point is that many operating systems measure time in units of
tens of milliseconds, and System.currentTimeMillis() simply relies on the
operating system.
  In my case I just run DIH from a crontab. Is there any possibility of
running into that trouble? I really cannot picture what situation would lead
to the problem.


Thanks very much.




Re: get term frequency, just only keywords search

2014-04-26 Thread Jack Krupansky
You need to use a shingle filter at index time so that pairs of adjacent 
words get indexed as single terms, then you can do a term frequency for the 
shingled pair of terms ("Research Development" as a single term). Be sure to 
manually apply any other filters, such as lower case or stemming.


See:
http://lucene.apache.org/core/4_7_0/analyzers-common/org/apache/lucene/analysis/shingle/ShingleFilterFactory.html
http://lucene.apache.org/core/4_7_0/analyzers-common/org/apache/lucene/analysis/shingle/ShingleFilter.html
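
As a rough illustration of what shingling buys you, here is a Python sketch.
It is not the Lucene ShingleFilter implementation: analyze and shingles are
hypothetical helpers, and lowercasing plus whitespace splitting stands in for
whatever analysis chain the field really uses.

```python
from collections import Counter


def analyze(text):
    """Toy analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def shingles(tokens, size=2):
    """Join each run of `size` adjacent tokens into one term."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


doc = "Research Development teams fund research development"
tokens = analyze(doc)

# Index both the single words and the two-word shingles as terms.
term_counts = Counter(tokens + shingles(tokens))

# The query phrase has to go through the same analysis by hand.
query_term = " ".join(analyze("Research Development"))
phrase_count = term_counts[query_term]
```

Once the word pair exists as a single indexed term, a plain term lookup finds
it, which is the situation a term frequency function needs.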

But, note that you don't need to do any of this if you simply want to boost 
documents containing a phrase - just use the pf, pf2, and pf3 parameters of 
edismax or explicitly boost the phrase, such as "research development"^20.


-- Jack Krupansky
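
For the phrase-boost route, the request parameters might look roughly like
the Python sketch below. The host and path are hypothetical, the field name
datafield is taken from the original question, and the boost of 20 mirrors
the example above. Note that this boosts the relevance score of phrase
matches; it does not sort strictly by phrase count.

```python
from urllib.parse import urlencode

# Hypothetical Solr location; adjust host, port and core to your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"

params = {
    "defType": "edismax",         # use the extended DisMax query parser
    "q": "research development",  # the user's keywords
    "qf": "datafield",            # field(s) to search
    "pf": "datafield^20",         # boost documents where the words form a phrase
}

request_url = SOLR_SELECT_URL + "?" + urlencode(params)
```

With the default sort by score, documents in which the words appear as a
phrase should then tend to rank ahead of documents that only contain the
individual words.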




Optimal setup for multiple tools

2014-04-26 Thread Jimmy Lin
Hello,

My team has been working with SOLR for the last 2 years.  We have two main
indices:

1. documents
-index and store main text
-one record for each document
2. places (all of the geospatial places found in the documents above)
-index but don't store main text
-one record for each place.  could have thousands in a single
document but the ratio has seemed to come out to 6:1 places to documents

We have several tools that query the above indices.  One is just a standard
search tool that returns documents filtered on keyword, temporal, and
geospatial filters.  Another is a geospatial tool that queries the places
collection.  We now have a requirement to provide document highlighting
when querying in the geospatial tool.

Does anyone have any suggestions/prior experience on how they would set up
two collections that are essentially different "views" of the data?  Also
any tips on how to ensure that these two collections are "in sync" (meaning
any documents indexed into the documents collection are also properly
indexed in places)?

Thanks a lot,

Jimmy Lin


Re: DIH issues with 4.7.1

2014-04-26 Thread Mark Miller
System.currentTimeMillis can jump around due to NTP, etc. If you are trying to 
count elapsed time, you don’t want to use a method that can jump around with 
the results.
-- 
Mark Miller
about.me/markrmiller
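
A toy simulation of that failure mode, in Python for brevity. SimulatedClocks
is a hypothetical model, not a real API: one reading behaves like a wall
clock that can be adjusted from outside, the other like a counter that only
moves forward. The 200 ms adjustment is an arbitrary value for illustration.

```python
class SimulatedClocks:
    """Toy model: a wall clock that can be stepped and a monotonic counter."""

    def __init__(self, start_ms=1_000_000):
        self.wall_ms = start_ms   # like a wall-clock reading
        self.monotonic_ms = 0     # only ever moves forward

    def advance(self, ms):
        """Real time passes: both clocks move forward."""
        self.wall_ms += ms
        self.monotonic_ms += ms

    def step_wall_clock(self, ms):
        """An external adjustment moves only the wall clock."""
        self.wall_ms += ms


clocks = SimulatedClocks()
wall_start, mono_start = clocks.wall_ms, clocks.monotonic_ms

clocks.advance(50)            # 50 ms of real work
clocks.step_wall_clock(-200)  # wall clock is set back by 200 ms
clocks.advance(50)            # another 50 ms of real work

wall_elapsed = clocks.wall_ms - wall_start       # negative: nonsense
mono_elapsed = clocks.monotonic_ms - mono_start  # 100 ms: correct
```

Any timeout or age check based on the wall-clock difference would misbehave
here, while the monotonic difference still reflects the work actually done.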



Re: DIH issues with 4.7.1

2014-04-26 Thread Walter Underwood
NTP should slew the clock rather than jump it. I haven't checked recently, but 
that is how it worked in the '90s when I was organizing the NTP hierarchy at HP.

It only does step changes if the clock is really wrong. That is most likely at 
reboot, when other daemons aren't running yet.

wunder


--
Walter Underwood
wun...@wunderwood.org





RE: TB scale

2014-04-26 Thread Toke Eskildsen
> Anyone with experience, suggestions or lessons learned in the 10 -100 TB 
> scale they'd like to share?
> Researching optimum design for a Solr Cloud with, say, about 20TB index.

We're building a web archive with a projected index size of 20TB (distributed 
in 20 shards). Some test results and a short write-up at 
http://sbdevel.wordpress.com/2013/12/06/danish-webscale/ - feel free to ask for 
more details.

tl;dr: We're saying to hell with RAM for caching and putting it all on SSDs on 
a single big machine. Results so far (some distributed tests with 200GB & 400GB 
indexes, some single tests with a production-index of 1TB) are very promising, 
both for plain keyword-search, grouping and faceting (DocValues rocks).

- Toke Eskildsen


Re: TB scale

2014-04-26 Thread Walter Underwood
I think Hathi Trust has a few terabytes of index. They do full-text search on 
10 million books.

http://www.hathitrust.org/blogs/Large-scale-Search

wunder






Re: DIH issues with 4.7.1

2014-04-26 Thread Mark Miller
My answer remains the same. I guess if you want more precise terminology, 
nanoTime will generally be monotonic and currentTimeMillis will not be, due to 
things like NTP, etc. You want monotonicity for measuring elapsed times.
-- 
Mark Miller
about.me/markrmiller
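
The same pattern, sketched in Python as an analogy only: time.monotonic_ns()
plays roughly the role there that System.nanoTime() plays in Java. The helper
timed_call is hypothetical and is not taken from the DIH code.

```python
import time


def timed_call(func, *args):
    """Run func(*args) and return (result, elapsed_seconds)."""
    start_ns = time.monotonic_ns()  # monotonic: unaffected by clock changes
    result = func(*args)
    elapsed_ns = time.monotonic_ns() - start_ns
    return result, elapsed_ns / 1e9


result, elapsed = timed_call(sum, range(1000))
```

Because both readings come from a monotonic source, the elapsed value cannot
go negative even if the system clock is adjusted between the two calls.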






Re: SOLR 4 not utilizing multi CPU cores

2014-04-26 Thread Erick Erickson
I suspect your problem is that termfreq is looking at _terms_, not
phrases. It has no sense of position, that's a higher-level construct.
So "Research Development" is searched as a single _term_, and there
are no two-word terms.

What use-case are you trying to solve? This seems like an XY problem perhaps..

Best,
Erick
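
A toy model of the difference between a term lookup and a phrase match, in
Python. This is not how Lucene stores its index: analyze, term_freq and
phrase_freq are hypothetical helpers that only illustrate why a two-word
string finds nothing when it is treated as a single term.

```python
def analyze(text):
    """Toy analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def term_freq(tokens, term):
    """Count exact matches of a single indexed term (no notion of position)."""
    return tokens.count(term)


def phrase_freq(tokens, phrase):
    """Count places where the phrase's words occur at adjacent positions."""
    words = analyze(phrase)
    n = len(words)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words)


tokens = analyze("Research Development and more research development")

single = term_freq(tokens, "research")                   # a real term
as_term = term_freq(tokens, "research development")      # no such term exists
as_phrase = phrase_freq(tokens, "Research Development")  # needs positions
```

Here the single word is found twice, the two-word string is found zero times
as a term, and the phrase is found twice once positions are taken into
account.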



Re: Optimal setup for multiple tools

2014-04-26 Thread Erick Erickson
Have you considered putting them in the _same_ index? There's not much
penalty at all with having sparsely populated fields in a document, so
the fact that the two parts of your index had orthogonal fields
wouldn't cost you much and would solve the synchronization problem.

You can include a type field to distinguish between the two and just
include a filter query to keep them separate. Since that'll be cached,
your search performance should be fine.
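
A rough sketch of what such a request could look like, built in plain Java. The q and fq parameters are standard Solr request parameters; the field name "type", its values, and the field "text" are assumptions for illustration, not names from the original setup.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Builds the query string for a search restricted to one document type.
// The "type" field and its values are hypothetical.
class TypeFilterQuery {

    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // q carries the user's query; fq restricts results to one type.
    static String buildQuery(String userQuery, String docType) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", userQuery);
        params.put("fq", "type:" + docType);

        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(e.getKey()).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The geospatial tool would ask only for "place" records,
        // the standard search tool only for "document" records.
        System.out.println(buildQuery("text:harbor", "place"));
        System.out.println(buildQuery("text:harbor", "document"));
    }
}
```

Keeping the type restriction in fq rather than folding it into q is what lets the filter be cached and reused across queries.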

Otherwise you should include the fields you need to sort on in the
index you need to sort. Denormalizes the data, but...

About keeping the two in sync, that's really outside Solr; your
indexing process has to manage that, I'd guess.

Best,
Erick

On Sat, Apr 26, 2014 at 7:24 AM, Jimmy Lin  wrote:
> Hello,
>
> My team has been working with SOLR for the last 2 years.  We have two main
> indices:
>
> 1. documents
> -index and store main text
> -one record for each document
> 2. places (all of the geospatial places found in the documents above)
> -index but don't store main text
> -one record for each place.  could have thousands in a single
> document but the ratio has seemed to come out to 6:1 places to documents
>
> We have several tools that query the above indices.  One is just a standard
> search tool that returns documents filtered on keyword, temporal, and
> geospatial filters.  Another is a geospatial tool that queries the places
> collection.  We now have a requirement to provide document highlighting
> when querying in the geospatial tool.
>
> Does anyone have any suggestions/prior experience on how they would set up
> two collections that are essentially different "views" of the data?  Also
> any tips on how to ensure that these two collections are "in sync" (meaning
> any documents indexed into the documents collection are also properly
> indexed in places)?
>
> Thanks alot,
>
> Jimmy Lin


zkCli zkhost parameter

2014-04-26 Thread Scott Stults
It looks like this only takes a single host as its value, whereas the
zkHost environment variable for Solr takes a comma-separated list.
Shouldn't the client also take a comma-separated list?

k/r,
Scott


Re: zkCli zkhost parameter

2014-04-26 Thread Mark Miller
Have you tried a comma-separated list or are you going by documentation? It 
should work. 
-- 
Mark Miller
about.me/markrmiller

On April 26, 2014 at 1:03:25 PM, Scott Stults 
(sstu...@opensourceconnections.com) wrote:

It looks like this only takes a single host as its value, whereas the  
zkHost environment variable for Solr takes a comma-separated list.  
Shouldn't the client also take a comma-separated list?  

k/r,  
Scott  


Re: DIH issues with 4.7.1

2014-04-26 Thread Walter Underwood
NTP works very hard to keep the clock positive monotonic. But nanoTime is 
intended for elapsed time measurement anyway, so it is the right choice.
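
As a minimal sketch of that point (this is not the DIH code; the class ElapsedTimer is made up for illustration), elapsed time and timeouts can be computed from differences of System.nanoTime() values, which do not depend on the wall clock:

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: measures elapsed time using System.nanoTime(),
// so wall-clock adjustments do not distort the result.
class ElapsedTimer {
    private final long startNanos = System.nanoTime();

    // Milliseconds elapsed since this timer was created.
    long elapsedMillis() {
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
    }

    // True once at least timeoutMillis have elapsed. Only the DIFFERENCE
    // of two nanoTime() readings is meaningful, never the raw value.
    boolean hasExpired(long timeoutMillis) {
        return System.nanoTime() - startNanos >= TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    }

    public static void main(String[] args) throws InterruptedException {
        ElapsedTimer timer = new ElapsedTimer();
        Thread.sleep(50);
        System.out.println("elapsed ms: " + timer.elapsedMillis());
        System.out.println("expired after 10 s? " + timer.hasExpired(10_000));
    }
}
```

The same arithmetic done with System.currentTimeMillis() could give a negative or inflated result if the system clock is stepped between the two readings.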

You can get some pretty fun clock behavior by running on virtual machines, like 
in AWS. And some system real time clocks don't tick during a leap second. And 
Windows system clocks are probably still hopeless.

If you want to run the clock backwards, you don't need NTP; you can set it with 
"date".

wunder

On Apr 26, 2014, at 9:10 AM, Mark Miller  wrote:

> My answer remains the same. I guess if you want more precise terminology, 
> nanoTime will generally be monotonic and currentTimeMillis will not be, due 
> to things like NTP, etc. You want monotonicity for measuring elapsed times.
> -- 
> Mark Miller
> about.me/markrmiller
> 
> On April 26, 2014 at 11:25:16 AM, Walter Underwood (wun...@wunderwood.org) 
> wrote:
> 
> NTP should slew the clock rather than jump it. I haven't checked recently, 
> but that is how it worked in the 90's when I was organizing the NTP hierarchy 
> at HP.  
> 
> It only does step changes if the clocks is really wrong. That is most likely 
> at reboot, when other demons aren't running yet.  
> 
> wunder  
> 
> On Apr 26, 2014, at 7:30 AM, Mark Miller  wrote:  
> 
>> System.currentTimeMillis can jump around due to NTP, etc. If you are trying 
>> to count elapsed time, you don’t want to use a method that can jump around 
>> with the results.  
>> --  
>> Mark Miller  
>> about.me/markrmiller  
>> 
>> On April 26, 2014 at 8:58:20 AM, YouPeng Yang (yypvsxf19870...@gmail.com) 
>> wrote:  
>> 
>> Hi Rafał Kuć  
>> I got it,the point is many operating systems measure time in units of  
>> tens of milliseconds,and the System.currentTimeMillis() is just base on  
>> operating system.  
>> In my case,I just do DIH with a crontable, Is there any possiblity to get  
>> in that trouble?I am really can not picture what the situation may lead to  
>> the problem.  
>> 
>> 
>> Thanks very much.  
>> 
>> 
>> 2014-04-26 20:49 GMT+08:00 YouPeng Yang :  
>> 
>>> Hi Mark Miller  
>>> Sorry to get you in these discussion .  
>>> I notice that Mark Miller report this issure in  
>>> https://issues.apache.org/jira/browse/SOLR-5734 according to  
>>> https://issues.apache.org/jira/browse/SOLR-5721,but it just happened with  
>>> the zookeeper.  
>>> If I just do DIH with JDBCDataSource ,I do not think it will get the  
>>> problem.  
>>> Please give some hints  
>>> 
> Bonus,just post the last mail I send about the problem:  
>>> 
>>> I have just compare the difference between the version 4.6.0 and 4.7.1.  
>>> Notice that the time in the getConnection function is declared with the  
>>> System.nanoTime in 4.7.1 ,while System.currentTimeMillis().  
>>> Curious about the resson for the change.the benefit of it .Is it  
>>> neccessory?  
>>> I have read the SOLR-5734 ,  
>>> https://issues.apache.org/jira/browse/SOLR-5734  
>>> Do some google about the difference of currentTimeMillis and nano,but  
>>> still can not figure out it.  
>>> 
>>> Thank you very much.  
>>> 
>>> 
>>> 2014-04-26 20:31 GMT+08:00 YouPeng Yang :  
>>> 
>>> Hi  
 I have just compare the difference between the version 4.6.0 and  
 4.7.1. Notice that the time in the getConnection function is declared  
 with the System.nanoTime in 4.7.1 ,while System.currentTimeMillis().  
 Curious about the resson for the change.the benefit of it .Is it  
 neccessory?  
 I have read the SOLR-5734 ,  
 https://issues.apache.org/jira/browse/SOLR-5734  
 Do some google about the difference of currentTimeMillis and nano,but  
 still can not figure out it.  
 
 
 
 
 2014-04-26 2:24 GMT+08:00 Shawn Heisey :  
 
 On 4/25/2014 11:56 AM, Hutchins, Jonathan wrote:  
> 
>> I recently upgraded from 4.6.1 to 4.7.1 and have found that the DIH  
>> process that we are using takes 4x as long to complete. The only odd  
>> thing I notice is when I enable debug logging for the dataimporthandler  
>> process, it appears that in the new version each sql query is resulting  
>> in  
>> a new connection opened through jdbcdatasource (log:  
>> http://pastebin.com/JKh4gpmu). Were there any changes that would  
>> affect  
>> the speed of running a full import?  
>> 
> 
> This is most likely the problem you are experiencing:  
> 
> https://issues.apache.org/jira/browse/SOLR-5954  
> 
> The fix will be in the new 4.8 version. The release process for 4.8 is  
> underway right now. A second release candidate was required yesterday. If 
>  
> no further problems are encountered, the release should be made around 
> the  
> middle of next week. If problems are encountered, the release will be  
> delayed.  
> 
> Here's something very important that has been mentioned before: Solr  
> 4.8 will require Java 7. Previously, Java 6 was required. Java 7u55 (the  
> 

Re: DIH issues with 4.7.1

2014-04-26 Thread Mark Miller
bq. due to things like NTP, etc.

The full sentence is very important. NTP is not the only way for this to happen 
- you also have leap seconds, daylight savings time, internet clock sync, a 
whole host of things that affect currentTimeMillis and not nanoTime. It is 
without question the way to go to even hope for monotonicity.
-- 
Mark Miller
about.me/markrmiller


Re: Search for a mask that matches the requested string

2014-04-26 Thread Alan Woodward
Hi, I'm the author of luwak.  I have a half-finished version sitting in a 
branch somewhere that pulls all the intervals-fork-specific code out of the 
library and would run with 4.6.  It would need to be integrated into Solr as 
well, but I have an upcoming project which may well do just that.  Feel free to 
ping me directly!

Alan Woodward
www.flax.co.uk
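
For the original question quoted further down (stored masks such as __A__, where each underscore stands for exactly one character), the matching rule itself can be sketched in plain Java. This is a brute-force illustration, not luwak's API and not a Solr feature; the class MaskMatcher is hypothetical, and it simply tests every stored mask against the incoming string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical brute-force matcher: checks an input string against
// every stored mask. '_' in a mask matches exactly one character.
class MaskMatcher {

    // Translate a mask into a regex; everything except '_' is literal.
    static Pattern compile(String mask) {
        StringBuilder regex = new StringBuilder();
        for (char c : mask.toCharArray()) {
            if (c == '_') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // Return the masks that match the whole input string.
    static List<String> matchingMasks(List<String> masks, String input) {
        List<String> hits = new ArrayList<>();
        for (String mask : masks) {
            if (compile(mask).matcher(input).matches()) {
                hits.add(mask);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> masks = Arrays.asList("__A__", "__B__", "_A_");
        System.out.println(matchingMasks(masks, "12A34"));
    }
}
```

Scanning every mask is linear in the number of stored masks, which is the cost that percolator-style tools are designed to reduce.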


On 26 Apr 2014, at 03:29, Otis Gospodnetic wrote:

> Luwak is not based on the fork of Lucene or rather, the fork you are seeing
> is there only because the Luwak authors needed highlighting.  If you don't
> need highlighting you can probably modify Luwak a bit to use regular
> Lucene.  The Lucene fork you are seeing there will also, eventually, be
> committed to Lucene trunk and then hopefully backported to 4.x.
> 
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
> 
> 
> On Fri, Apr 25, 2014 at 6:46 PM, Muhammad Gelbana wrote:
> 
>> Luwak is based on a fork of solr\lucene which I cannot use. I have to do
>> this using solr 4.6, whether by writing extra code or not. Thanks.
>> 
>> *-*
>> *Muhammad Gelbana*
>> http://www.linkedin.com/in/mgelbana
>> 
>> 
>> On Sat, Apr 26, 2014 at 12:13 AM, Ahmet Arslan  wrote:
>> 
>>> Hi,
>>> 
>>> You don't need to write code for this. Use luwak (I gave the link in my
>>> first e-mail) instead.
>>> 
>>> If your can't get luwak running because its too complicated etc, see a
>>> similar discussion
>>> 
>>> http://find.searchhub.org/document/9411388c7d2de701#36e50082e918b10c
>>> 
>>> where diy-percolator example pointer is given. It is an example to use
>>> memory index.
>>> 
>>> Ahmet
>>> 
>>> 
>>> 
>>> On Saturday, April 26, 2014 1:05 AM, Muhammad Gelbana <
>> m.gelb...@gmail.com>
>>> wrote:
>>> @Jack, I am ready to write custom code to implement such feature but I
>>> don't know what feature in solr should I extend ? Where should I start ?
>> I
>>> believe it should be a very simple task.
>>> 
>>> @Ahmet, how can I use the class you mentioned ? Is there a tutorial for
>> it
>>> ? I'm not sure how the code in the class's description should work, I've
>>> never extended solr before.
>>> 
>>> Thank you all.
>>> 
>>> *-*
>>> *Muhammad Gelbana*
>>> http://www.linkedin.com/in/mgelbana
>>> 
>>> 
>>> 
>>> On Fri, Apr 25, 2014 at 10:38 PM, Ahmet Arslan 
>> wrote:
>>> 
 
 
 Hi,
 
 Your use case is different than ad hoc retrieval. Where you have set of
 documents and varying queries.
 
 In your case it is the reverse, you have a query (string masks) stored
 A?, and incoming documents are percolated against it.
 
 out of the box Solr does not have support for this today.
 
 Please see :
 
 
 
>>> 
>> http://lucene.apache.org/core/4_7_2/memory/org/apache/lucene/index/memory/MemoryIndex.html
 
 By the way wildcard ? matches a single character.
 
 Ahmet
 
 
 On Friday, April 25, 2014 11:02 PM, Muhammad Gelbana <
>>> m.gelb...@gmail.com>
 wrote:
 I have no idea how can this help me. I have been using solr for a few
>>> weeks
 and I'm not familiar with it yet. I'm asking for a very simple task, a
>>> way
 to customize how solr matches a string, does this exist in solr ?
 
 *-*
 *Muhammad Gelbana*
 http://www.linkedin.com/in/mgelbana
 
 
 
 On Thu, Apr 24, 2014 at 10:09 PM, Ahmet Arslan 
>>> wrote:
 
> Hi,
> 
> Please see : https://github.com/flaxsearch/luwak
> 
> Ahmet
> 
> 
> On Thursday, April 24, 2014 8:40 PM, Muhammad Gelbana <
 m.gelb...@gmail.com>
> wrote:
> (Please make sure you reply to my address because I didn't subscribe
>> to
> this mailing list)
> 
> I'm using Solr 4.6
> 
> I need to store string masks in Solr. By masks, I mean strings that
>> can
> match other strings.
> 
> Then I need to search for masks that match the string I'm providing
>> in
>>> my
> query. For example, assume the following single-field document stored
>>> in
> Solr:
> 
> {
>"fieldA": "__A__"
> }
> 
> I need to be able to find this document if I query the fieldA field
>>> with
 a
> string like *12A34*, as the underscore "*_*" matches a single string.
>>> The
> single string matching mechanism is my strict goal here, multiple
>>> string
> matching won't be helpful.
> 
> I hope I was clear enough. Please elaborate because I'm not versatile
 with
> solr and I haven't been using it for too long.
> Thank you.
> 
> *-*
> *Muhammad Gelbana*
> http://www.linkedin.com/in/mgelbana
> 
> 
 
>>> 
>>> 
>> 



Re: Indexing Big Data With or Without Solr

2014-04-26 Thread Aman Tandon
Thanks, Vineet.

With Regards
Aman Tandon


On Wed, Apr 23, 2014 at 7:21 PM, Vineet Mishra wrote:

> I did it with Tomcat and Zookeeper Ensemble, will mail you the steps
> shortly.
>
> Cheers
>
>
> On Sat, Apr 19, 2014 at 9:09 AM, Aman Tandon  >wrote:
>
> > Vineet please share after you setup for solr cloud
> > Are you using jetty or tomcat.?
> >
> > On Saturday, April 19, 2014, Vineet Mishra 
> wrote:
> > > Thanks Furkan, I will definitely give it a try then.
> > >
> > > Thanks again!
> > >
> > >
> > >
> > >
> > > On Tue, Apr 15, 2014 at 7:53 PM, Furkan KAMACI  > >wrote:
> > >
> > >> Hi Vineet;
> > >>
> > >> I've been using SolrCloud for such kind of Big Data and I think that
> you
> > >> should consider to use it. If you have any problems you can ask it
> here.
> > >>
> > >> Thanks;
> > >> Furkan KAMACI
> > >>
> > >>
> > >> 2014-04-15 13:20 GMT+03:00 Vineet Mishra :
> > >>
> > >> > Hi All,
> > >> >
> > >> > I have worked with Solr 3.5 to implement real time search on some
> > 100GB
> > >> > data, that worked fine but was little slow on complex
> queries(Multiple
> > >> > group/joined queries).
> > >> > But now I want to index some real Big Data(around 4 TB or even
> more),
> > can
> > >> > SolrCloud be solution for it if not what could be the best possible
> > >> > solution in this case.
> > >> >
> > >> > *Stats for the previous Implementation:*
> > >> > It was Master Slave Architecture with normal Standalone multiple
> > instance
> > >> > of Solr 3.5. There were around 12 Solr instance running on different
> > >> > machines.
> > >> >
> > >> > *Things to consider for the next implementation:*
> > >> > Since all the data is sensor data hence it is the factor of
> duplicity
> > and
> > >> > uniqueness.
> > >> >
> > >> > *Really urgent, please take the call on priority with set of
> feasible
> > >> > solution.*
> > >> >
> > >> > Regards
> > >> >
> > >>
> > >
> >
> > --
> > Sent from Gmail Mobile
> >
>


[ANN] Apache Gora 0.4 Released

2014-04-26 Thread Lewis John Mcgibbney
Good Afternoon Everyone,
>
> The Apache Gora team are very proud to announce the immediate release of
> Gora 0.4 which is a major release for the project.
>
> The Apache Gora open source framework provides an in-memory data model and
> persistence for big data. Gora supports persisting to column stores, key
> value stores, document stores and RDBMSs, and analyzing the data with
> extensive Apache Hadoop™ MapReduce support. Gora uses the Apache Software
> License v2.0.
>
> This release addresses no fewer than 60 issues. Major improvements within
> the release scope comprise a complete upgrade to Apache Avro 1.7.X and
> overhaul of the Gora persistency API (such improvements enable Gora to be
> used to map much more expressive and complicated data structures than
> previously available), upgrades to Apache HBase 0.94.13, Apache Cassandra
> 2.0.X and Apache Accumulo 1.5.X.
> Users can also benefit from using Gora + Solr for object-to-datastore
> mapping with the addition of the new Solr module which uses Solr 4.X.
>
> Gora sources tar.gz and .zip release artifacts (along with signatures) can
> be obtained from visiting our DOWNLOADS [0] page. We also encourage users
> to upgrade their build dependencies using our published Maven Artifacts [1].
>
> Please redistribute this email to your project mailing list.
>
> Thanks
> Lewis
> (on behalf of the Apache Gora team)
>
> [0] http://gora.apache.org/downloads.html
> [1] http://search.maven.org/#search|ga|1|gora
>
>


RE: How can I convert xml message for updating a Solr index to a javabin file

2014-04-26 Thread Elran Dvir
Does anyone know a way to do this?

Thanks.

-----Original Message-----
From: Elran Dvir 
Sent: Thursday, April 24, 2014 4:11 PM
To: solr-user@lucene.apache.org
Subject: RE: How can I convert xml message for updating a Solr index to a 
javabin file

I want to measure xml vs javabin update message indexing performance.

-----Original Message-----
From: Upayavira [mailto:u...@odoko.co.uk]
Sent: Thursday, April 24, 2014 2:04 PM
To: solr-user@lucene.apache.org
Subject: Re: How can I convert xml message for updating a Solr index to a 
javabin file

Why would you want to do this? Javabin is used by SolrJ to communicate with 
Solr. XML is good enough for communicating from the command line/curl, as is 
JSON. Attempting to use javabin just seems to add an unnecessary complication.

Upayavira

On Thu, Apr 24, 2014, at 10:20 AM, Elran Dvir wrote:
> Hi all,
> Is there a way I can convert an XML Solr update message file to a javabin 
> file? If so, how?
> How can I use curl to update Solr with a javabin message file?
> 
> Thank you very much.
