How to configure solr to not bind at 8983

2015-08-19 Thread Samy Ateia
I changed the solr listen port in the solr.in.sh file in my solr home directory by setting the variable: SOLR_PORT=. But Solr is still trying to also listen on 8983 because it gets started with the -DSTOP.PORT=8983 variable. What is this -DSTOP.PORT variable for and where should I configure

Re: How to find the ordinal for a numeric doc value

2015-08-19 Thread Mikhail Khludnev
Hello, Given the code https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/schema/TrieField.java#L727 it creates NumericDocValuesField only. Try defining the field as multivalued; per the code, it then creates SortedSetDocValuesField. On Wed, Aug 19, 2015 at 11:13 PM, ted

Re: How to find the ordinal for a numeric doc value

2015-08-19 Thread Toke Eskildsen
tedsolr wrote: > I'm sure there is a good reason why SortedDocValues exposes > the backing dictionary and [Sorted]NumericDocValues does not. There is: Numerics does not have a backing dictionary. Instead of storing the values via the intermediate ordinals-map (aka by reference), they are stored

Re: plagiarism Checker with solr

2015-08-19 Thread Roshan Agarwal
Dear Jack, Thank you very much, Roshan Agarwal On Mon, Aug 10, 2015 at 8:38 PM, Jack Krupansky wrote: > The simplest and maybe best approach is to use the edismax query parser and > query all terms using the OR operator and use the PF1, PF2, and PF3 > parameters to boost phrases so that the c

Solr having problems with highlighting when using Jieba analyzer

2015-08-19 Thread Zheng Lin Edwin Yeo
Hi, I'm using the Jieba analyser to index Chinese characters in Solr. The segmentation works fine when using the Analysis page in the Solr Admin UI. However, when I tried to do highlighting in Solr, it is not highlighting in the correct place. For example, when I search for 自然环境与企业本身, it highl

How to Delta-Import to solr by Id(key word)

2015-08-19 Thread fent
I have a table with an Id column, which is an auto-increment attribute, so I want to delta-add new categories to Solr with something like "select * from my_table where Id > '${latest_id}'", where latest_id is the max Id that was added last time. How do I configure data-config.xml, or how do I get the max Id from Solr? Thanks! -

SolrCloud: /live_nodes in ZK shows the server is there, but all cores are down in /clusterstate.json.

2015-08-19 Thread forest_soup
Opened a JIRA - https://issues.apache.org/jira/browse/SOLR-7947 A SolrCloud with 2 Solr nodes in Tomcat servers on 2 VM servers. After restarting one Solr node, the cores on it turn to the "down" state and the logs show the errors below. Logs are in the attachment. solr.zip

Re: How to Fast Bulk Inserting documents

2015-08-19 Thread Troy Edwards
Thank you for taking the time to do the test. I have been doing similar tests using the post Tool (SimplePostTool) with the real data and was able to get to about 10K documents/second. I am considering using multiple files (one per client) ftp'd into a solr node and then use a scheduled job to us

Re: How to Fast Bulk Inserting documents

2015-08-19 Thread Troy Edwards
Are you suggesting that requests come into a service layer that identifies which client is on which solrcloud and passes the request to that cloud? Thank you On Wed, Aug 19, 2015 at 1:13 PM, Toke Eskildsen wrote: > Troy Edwards wrote: > > My average document size is 400 bytes > > Number of doc

Re: Solrcloud node is not coming up

2015-08-19 Thread Erick Erickson
Well, you can use curl instead ;). But at present there's no real collections admin UI akin to the core admin UI, although that's in the works with the new Angular JS based admin UI, but the ETA is not defined quite yet although it shouldn't be all that far away. On Wed, Aug 19, 2015 at 2:48 PM

Re: Cache

2015-08-19 Thread Nagasharath
I will go with {!cache=false}. Can we specify facet method in json nested faceting query? > On 19-Aug-2015, at 7:07 pm, Yonik Seeley wrote: > >> On Wed, Aug 19, 2015 at 8:00 PM, Nagasharath >> wrote: >> Trying to evaluate the performance of queries with and without cache > > Yeah, so to

Re: Cache

2015-08-19 Thread Yonik Seeley
On Wed, Aug 19, 2015 at 8:00 PM, Nagasharath wrote: > Trying to evaluate the performance of queries with and without cache Yeah, so to try and see how much a specific type of query costs, you can use {!cache=false} But I've seen some people trying to benchmark the performance of the *system* wit

Re: Cache

2015-08-19 Thread Walter Underwood
Why? Do you evaluate Unix performance with and without file buffers? wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On Aug 19, 2015, at 5:00 PM, Nagasharath wrote: > Trying to evaluate the performance of queries with and without cache > > > >> On 18

Re: Cache

2015-08-19 Thread Nagasharath
Trying to evaluate the performance of queries with and without cache > On 18-Aug-2015, at 11:30 am, Yonik Seeley wrote: > > On Tue, Aug 18, 2015 at 12:23 PM, naga sharathrayapati > wrote: >> Is it possible to clear the cache through query? >> >> I need this for performance valuation. > > No

Re: Changing Similarity without re-indexing (for example from default to BM25)

2015-08-19 Thread Ahmet Arslan
Hi again, Here is a relevant/past discussion : http://search-lucene.com/m/eHNlTDHKb17MW532 Ahmet On Thursday, August 20, 2015 2:28 AM, Ahmet Arslan wrote: Hi Tom, computeNorm(FieldInvertState) method is the only place where similarity is tied to indexing process. If you want to switch bet

Re: Changing Similarity without re-indexing (for example from default to BM25)

2015-08-19 Thread Ahmet Arslan
Hi Tom, computeNorm(FieldInvertState) method is the only place where similarity is tied to indexing process. If you want to switch between different similarities, they should share the same implementation for the method. For example, subclasses of SimilarityBase can be used without re-indexing.

Re: Reindexing

2015-08-19 Thread Alexandre Rafalovitch
Reload will get the new schema definitions. But all the indexed content will stay as is and will probably start causing problems if you changed analyzer definitions seriously. You probably will have to reindex from scratch/external source. Sorry. Solr Analyzers, Tokenizers, Filters, URPs and

Reindexing

2015-08-19 Thread Azazel K
Hi, We have an over-engineered index that we would like to rework. It's already holding 150M documents with 94GB of index size. We have a high-index/high-query system running Solr 4.5. My question - If we update the schema, can we run a reindex by using the "Reload" action in the CoreAdmin UI? Will that r

Re: Solrcloud node is not coming up

2015-08-19 Thread Merlin Morgenstern
Thank you for the quick answer. I have now learned how to use the Collections API. Is there a "better" way to issue the commands than to enter them into the browser as a URL and get back JSON? 2015-08-19 22:23 GMT+02:00 Erick Erickson : > No, nothing. The graphical view shows collections and the

Re: How to Fast Bulk Inserting documents

2015-08-19 Thread Toke Eskildsen
Toke Eskildsen wrote > Use more than one cloud. Make them fully independent. > As I suggested when you asked 4 days ago. That would > also make it easy to scale: Just measure how much a > single setup can take and do the math. The goal is 250K documents/second. I tried modifying the books.csv-ex
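The "measure a single setup and do the math" step can be sketched roughly as below. This is only an illustration: the 250K documents/second target is the goal stated in this thread, the 10K documents/second per-setup rate is borrowed from the post tool figure reported elsewhere in the thread, and the function name `setups_needed` is hypothetical.

```python
import math


def setups_needed(target_docs_per_sec: int, measured_docs_per_sec: int) -> int:
    """Return how many fully independent setups are needed to reach the target rate.

    Assumes throughput scales linearly across independent setups, which is
    the premise of running several separate clouds.
    """
    if measured_docs_per_sec <= 0:
        raise ValueError("measured rate must be positive")
    # Round up: a fraction of a setup still requires a whole one.
    return math.ceil(target_docs_per_sec / measured_docs_per_sec)


# Target from the thread; the measured rate is an assumed example value.
print(setups_needed(250_000, 10_000))
```

The measured rate should come from a test with real documents on a single setup; the number above is a placeholder.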

Re: How to Fast Bulk Inserting documents

2015-08-19 Thread Susheel Kumar
For indexing 3.5 billion documents, you will run into bottlenecks not only with Solr but also in other places (data acquisition, Solr document object creation, submitting in bulk/batches to Solr). This will require parallelizing the operations at each of the above steps which can get you

Re: Solrcloud node is not coming up

2015-08-19 Thread Susheel Kumar
When you are adding a node, what exactly are you looking for that node to do? Are you adding the node to create a new replica? In that case you will call the ADDREPLICA Collections API. Thanks, Susheel On Wed, Aug 19, 2015 at 3:42 PM, Merlin Morgenstern < merlin.morgenst...@gmail.com> wrote: > I have a

Re: Solrcloud node is not coming up

2015-08-19 Thread Erick Erickson
No, nothing. The graphical view shows collections and the associated replicas. This new node has no replicas that are part of any collection, so it won't show in the graphical view. If you create a new collection that happens to put a replica on the new node, it'll then show up as part of that col

Re: How to find the ordinal for a numeric doc value

2015-08-19 Thread tedsolr
One error (others perhaps?) in my statement ... the code searcher.getLeafReader().getSortedDocValues(field) just returns null for numeric and date fields. That is why they appear to be ignored, not that the ordinals are all absent or equivalent. But my question is still valid I think! -- View

How to find the ordinal for a numeric doc value

2015-08-19 Thread tedsolr
I'm trying to upgrade my custom post filter from Solr 4.9 to 5.2. This filter collapses documents based on a user chosen field set. The key to the whole thing is determining document uniqueness based on a fixed int array of field value ordinals. In 4.9 this worked regardless of the field type. In t

Solrcloud node is not coming up

2015-08-19 Thread Merlin Morgenstern
I have a Solrcloud cluster running with 2 nodes, configured with 1 shard and 2 replicas. Now I have added a node on a new server, registered with the same three zookeepers. The node shows up inside the tree of the Solrcloud admin GUI under "live nodes". Unfortunately the new node is not inside the

Re: Performance issue with FILTER QUERY

2015-08-19 Thread Erick Erickson
If you're committing that rapidly then you're correct, filter caching may not be a good fit. The entire _point_ of filter caching is to increase performance of subsequent executions of the exact same fq clause. But if you're throwing them away every second there's little/no benefit. You really hav

Re: jetty.xml

2015-08-19 Thread Erick Erickson
What's happening on the system when you see this? If you're heavily indexing and NOT using SolrJ's CloudSolrServer/CloudSolrClient, then a lot of threads can be occupied forwarding documents to the other shards. Best, Erick On Wed, Aug 19, 2015 at 6:55 AM, Davis, Daniel (NIH/NLM) [C] wrote: > Jetty includes

Re: Geospatial Predicate Question

2015-08-19 Thread david.w.smi...@gmail.com
Hi Jamie, Your understanding is inverted. The predicates can be read as: . For indexed point data, there is almost no semantic difference between the Within and Intersects predicates. There is if the field is multi-valued and you want to ensure that all of the points for a document are with

Re: How to Fast Bulk Inserting documents

2015-08-19 Thread Erick Erickson
If you're sitting on HDFS anyway, you could use MapReduceIndexerTool. I'm not sure that'll hit your rate, it spends some time copying things around. If you're not on HDFS, though, it's not an option. Best, Erick On Wed, Aug 19, 2015 at 11:36 AM, Upayavira wrote: > > > On Wed, Aug 19, 2015, at 07

Re: How to Fast Bulk Inserting documents

2015-08-19 Thread Upayavira
On Wed, Aug 19, 2015, at 07:13 PM, Toke Eskildsen wrote: > Troy Edwards wrote: > > My average document size is 400 bytes > > Number of documents that need to be inserted 25/second > > (for a total of about 3.6 Billion documents) > > > Any ideas/suggestions on how that can be done? (use a cl

Re: Changing Similarity without re-indexing (for example from default to BM25)

2015-08-19 Thread Upayavira
warning: I'm no expert on other similarities. Having said that, I'm not aware of similarities being used in the indexing process - during indexing term frequency, document frequency, field norms, and so on are all recorded. These are things that the default similarity (TF/IDF) uses to calculate it

Re: How to Fast Bulk Inserting documents

2015-08-19 Thread Toke Eskildsen
Troy Edwards wrote: > My average document size is 400 bytes > Number of documents that need to be inserted 25/second > (for a total of about 3.6 Billion documents) > Any ideas/suggestions on how that can be done? (use a client > or uploadcsv or stream or data import handler) Use more than on

Re: How to Fast Bulk Inserting documents

2015-08-19 Thread Shawn Heisey
On 8/19/2015 11:09 AM, Troy Edwards wrote: > I have a requirement where I have to bulk insert a lot of documents in > SolrCloud. > > My average document size is 400 bytes > Number of documents that need to be inserted 25/second (for a total of > about 3.6 Billion documents) > > Any ideas/sugges

Re: Is it a good query performance with this data size ?

2015-08-19 Thread wwang525
Hi Upayavira, I happened to compose an individual fq for each field, such as: fq=Gatewaycode:(...)&fq=DestCode:(...)&fq=DateDep:(...)&fq=Duration:(...) It is nice to know that I am not creating unnecessary cache entries since the above method results in minimal cardinality as you pointed out. Thank

Re: How to Fast Bulk Inserting documents

2015-08-19 Thread Vineeth Dasaraju
I have been using the solrj client and get speeds of 1000 objects per second. The size of my object is around 4 kb. On Aug 19, 2015 12:09 PM, "Troy Edwards" wrote: > I have a requirement where I have to bulk insert a lot of documents in > SolrCloud. > > My average document size is 400 bytes > Num

How to Fast Bulk Inserting documents

2015-08-19 Thread Troy Edwards
I have a requirement where I have to bulk insert a lot of documents in SolrCloud. My average document size is 400 bytes Number of documents that need to be inserted 25/second (for a total of about 3.6 Billion documents) Any ideas/suggestions on how that can be done? (use a client or uploadcsv

Re: Is it a good query performance with this data size ?

2015-08-19 Thread Upayavira
Yes, you can limit the size of the filter cache, as Erick says, but then, you could just end up with cache churn, where you are constantly re-populating your cache as stuff gets pushed out, only to have to regenerate it again for the next query. Is it possible to decompose these queries into parts

Changing Similarity without re-indexing (for example from default to BM25)

2015-08-19 Thread Tom Burton-West
Hello all, The last time I worked with changing Similarities was with Solr 4.1 and at that time, it was possible to simply change the schema to specify the use of a different Similarity without re-indexing. This allowed me to experiment with several different ranking algorithms without having to

Re: Is it a good query performance with this data size ?

2015-08-19 Thread Erick Erickson
bq: can I limit the size of the three caches so that the RAM usage will be under control That's exactly what the "size" parameter is for. As Upayavira says, the rough size of each entry in the filterCache is maxDocs/8 + (sizeof query string). queryResultCache is much smaller per entry, it's rou
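The per-entry estimate quoted above (maxDocs/8 plus the size of the query string) can be turned into a rough upper bound for the whole filterCache. The sketch below is only that arithmetic; the function names are hypothetical, the example cache size and query length are assumed values, and real memory use may differ, for instance if Solr stores small result sets more compactly.

```python
def filter_cache_entry_bytes(max_doc: int, query: str = "") -> int:
    """Rough size of one filterCache entry: one bit per document plus the query string."""
    bitset_bytes = (max_doc + 7) // 8  # round the bitset up to whole bytes
    return bitset_bytes + len(query.encode("utf-8"))


def filter_cache_upper_bound_bytes(max_doc: int, cache_size: int, avg_query_len: int) -> int:
    """Rough upper bound for a full cache holding `cache_size` entries."""
    return cache_size * ((max_doc + 7) // 8 + avg_query_len)


# 8 million documents give about 1 MB per cached filter, as noted in this thread.
print(filter_cache_entry_bytes(8_000_000))
# Assumed example: "size" set to 512 entries, 50-byte query strings on average.
print(filter_cache_upper_bound_bytes(8_000_000, 512, 50))
```

Multiplying by the configured "size" shows why a large filterCache on a large index can consume a noticeable share of the heap.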

Re: Is it a good query performance with this data size ?

2015-08-19 Thread wwang525
Hi Upayavira, Thank you very much for pointing out the potential design issue The queries will be determined through a configuration by business users. There will be limited number of queries every day, and will get executed by customers repeatedly. However, business users will change the configu

Re: Difficulties in getting Solrcloud running

2015-08-19 Thread Susheel Kumar
Use a command like the one below to create a collection: http:// :/solr/admin/collections?action=CREATE&name=&numShards=2&replicationFactor=2&maxShardsPerNode=2&collection.configName= Susheel On Wed, Aug 19, 2015 at 11:03 AM, Kevin Lee wrote: > Hi, > > Have you created a collection yet? If not, then the
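The CREATE call above can also be assembled programmatically instead of being typed by hand. The following is a minimal sketch that only builds the URL with the parameters shown in the message; the host, port, collection name, and config name are placeholders chosen for illustration, and the helper name `create_collection_url` is hypothetical.

```python
from urllib.parse import urlencode


def create_collection_url(host: str, port: int, name: str, config_name: str,
                          num_shards: int = 2, replication_factor: int = 2,
                          max_shards_per_node: int = 2) -> str:
    """Build a Collections API CREATE URL using the parameters from the message above."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "maxShardsPerNode": max_shards_per_node,
        "collection.configName": config_name,
    }
    return f"http://{host}:{port}/solr/admin/collections?{urlencode(params)}"


# Placeholder values; substitute the real node address and names.
print(create_collection_url("localhost", 8983, "test", "myconf"))
```

The resulting URL can be pasted into a browser or passed to curl.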

Re: Difficulties in getting Solrcloud running

2015-08-19 Thread Kevin Lee
Hi, Have you created a collection yet? If not, then there won’t be a graph yet. It doesn’t show up until there is at least one collection. - Kevin > On Aug 19, 2015, at 5:48 AM, Merlin Morgenstern > wrote: > > HI everybody, > > I am trying to setup solrcloud on ubuntu and somehow the grap

Re: Is it a good query performance with this data size ?

2015-08-19 Thread Upayavira
You say "all of my queries are based upon fq"? Why? How unique are they? Remember, for each fq value, it could end up storing one bit per document in your index. If you have 8m documents, you could end up with a cache usage of 1 MB, for that query alone! Filter queries are primarily designed for qu

Lucene 5.2.1 Spatial Strategy PointVectorStrategy

2015-08-19 Thread Pablo Mincz
Hi, I'm implementing a sort search by distance with a PointVectorStrategy. In the index process I used createIndexableFields from the strategy and makePoint from the context GEO. But when I'm sorting the search I get the error: Java::JavaLang::IllegalStateException: unexpected docvalues type NONE

json facet

2015-08-19 Thread naga sharathrayapati
is it possible to specify facet.method with json nested faceting query? would like to see if there would be a performance improvement using methods

Re: Solr leader and replica version mismatch 4.7.2

2015-08-19 Thread Shawn Heisey
On 8/19/2015 7:52 AM, Jeff Courtade wrote: > We are running SOLR 4.7.2 > SolrCloud with 2 shards > one Leader and one replica per shard. > > the "Version" of the replica and leader differ displayed here as... > > curl http://ps01:8983/solr/admin/cores?action=STATUS |sed 's/>\n > 7753045 > > >

Re: Solr leader and replica version mismatch 4.7.2

2015-08-19 Thread Jeff Courtade
What I am trying to determine is a way to validate, for instance if a leader dies (as in completely unrecoverable), that the data on the replica is an exact match to what the leader had. I need to be able to monitor it and have confidence that it is working as expected. I had assumed the version num

RE: jetty.xml

2015-08-19 Thread Davis, Daniel (NIH/NLM) [C]
Jetty includes a QoSFilter, https://wiki.eclipse.org/Jetty/Reference/QoSFilter, with some changes I think it might be able to throttle the requests coming into Solr from truly outside, e.g. not SolrCloud replication, ZooKeeper etc., so as to make sure that Solr's own work could still get done.

Solr leader and replica version mismatch 4.7.2

2015-08-19 Thread Jeff Courtade
We are running SOLR 4.7.2 SolrCloud with 2 shards one Leader and one replica per shard. the "Version" of the replica and leader differ displayed here as... curl http://ps01:8983/solr/admin/cores?action=STATUS |sed 's/>\n7753045 However the commitTimeMSec lastModified and sizeInBytes matches on

Re: Is it a good query performance with this data size ?

2015-08-19 Thread wwang525
Hi Erick, All my queries are based on fq (filter query). I have to send the randomly generated queries to warm up low level lucene cache. I went to the more tedious way to warm up low level cache without utilizing the three caches by turning off the three caches (set values to zero). Then, I send

Re: jetty.xml

2015-08-19 Thread Shawn Heisey
On 8/18/2015 11:50 PM, William Bell wrote: > We sometimes get a spike in Solr, and we get like 3K of threads and then > timeouts... > > In Solr 5.2.1 the default jetty settings is kinda crazy for threads - since > the value is HIGH! > > What do others recommend? The setting of 1 is so that th

Re: Disable caching

2015-08-19 Thread Jamie Johnson
This was my original thought. We already have the thread local so should be straight fwd to just wrap the Field name and use that as the key. Again thanks, I really appreciate the feedback On Aug 19, 2015 8:12 AM, "Yonik Seeley" wrote: > On Tue, Aug 18, 2015 at 10:58 PM, Jamie Johnson wrote: >

Difficulties in getting Solrcloud running

2015-08-19 Thread Merlin Morgenstern
Hi everybody, I am trying to set up solrcloud on ubuntu and somehow the graph on the admin interface does not show up. It is simply blank. The tree is available. This is a test installation on one machine. There are 3 zookeepers running. I start two solr nodes like this: solr-5.2.1$ bin/solr s

Re: Disable caching

2015-08-19 Thread Yonik Seeley
On Tue, Aug 18, 2015 at 10:58 PM, Jamie Johnson wrote: > Hmm...so I think I have things setup correctly, I have a custom > QParserPlugin building a custom query that wraps the query built from the > base parser and stores the user who is executing the query. I've added the > username to the hashC

Re: Performance issue with FILTER QUERY

2015-08-19 Thread Mikhail Khludnev
Maulin, Did you check performance with segmented filters which I advised recently? On Wed, Aug 19, 2015 at 10:24 AM, Maulin Rathod wrote: > As per my understanding caches are flushed every time when add new > document to collection (we do soft commit at every 1 sec to make newly > added document

Re: Performance issue with FILTER QUERY

2015-08-19 Thread Mikhail Khludnev
Hello, try to experiment with fq={!cache=false}... or fq={!cache=false cost=100}... see https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters On Wed, Aug 19, 2015 at 8:55 AM, Maulin Rathod wrote: > > Hi, > > http://stackoverflow.com/questions/11627427/solr-query-q-or-filter-q
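The local-params syntax suggested above can be built and URL-encoded as in the sketch below. It only constructs the request parameters; the filter clause `type:book` and the helper name `uncached_fq` are hypothetical, and whether disabling the cache helps depends on the commit rate and on how often each filter repeats.

```python
from typing import Optional
from urllib.parse import urlencode


def uncached_fq(clause: str, cost: Optional[int] = None) -> str:
    """Prefix a filter clause with {!cache=false}, optionally adding a cost."""
    local_params = "cache=false" if cost is None else f"cache=false cost={cost}"
    return "{!" + local_params + "}" + clause


# Hypothetical filter clause; the q and fq parameter names are standard Solr.
params = [("q", "*:*"), ("fq", uncached_fq("type:book", cost=100))]
query_string = urlencode(params)
print(query_string)
```

Encoding matters here because the braces, exclamation mark, and space in the local params are not safe to place raw in a URL.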

RE: Performance issue with FILTER QUERY

2015-08-19 Thread Maulin Rathod
As per my understanding, caches are flushed every time we add a new document to the collection (we do a soft commit every 1 sec to make newly added documents available for search). Because of this, the cache is not used effectively and hence it is slow every time in our case. -Original Message- Fro