Re: Indexing issue - index get deleted

2015-06-10 Thread Alessandro Benedetti
Let me try to help you, first of all I would like to encourage people to post more information about their scenario than "This is my log, index deleted, help me" :) This kind of Info can be really useful : 1) Solr version 2) Solr architecture ( Solr Cloud ? Solr Cloud configuration ? Manual Shard

Re: Date Format Conversion Function Query

2015-06-10 Thread Alessandro Benedetti
Erick will correct me if I am wrong, but I don't think this function query exists. It could be a nice contribution, though. It should take a date format and a field as input and return the newly formatted date. It would then be simple to use: fl=id,persian_date:dateFormat("/mm/dd",greg

Re: Assign rich-text document's title name from clustering results

2015-06-10 Thread Alessandro Benedetti
Hi Edwin, let's do this step by step. Clustering is a problem solved by unsupervised machine learning algorithms. The aim of clustering is to group a corpus of documents by similarity, trying to produce groups that are meaningful to a human being. Solr currently provides different approaches for *Query Time

Re: Indexing documents in Chinese

2015-06-10 Thread Zheng Lin Edwin Yeo
I've tried to use solr.HMMChineseTokenizerFactory with the following configurations: The documents can be indexed, but when I search for words, the query matches many other words and not just the ones I searched for. Why is this so? For example, the query ht

Re: AngularJS

2015-06-10 Thread Upayavira
On Wed, Jun 10, 2015, at 05:52 AM, William Bell wrote: > Finding DIH issue with the new AngularJS DIH section, while indexing... > > 1,22613/s ? > > Last Update: 22:50:50 > *Indexing since 0:1:38.204* > Requests: 1, Fetched: 1,22613/s, Skipped: 0, Processed: 1,22613/s > Started: 3 minutes ago

Re: Assign rich-text document's title name from clustering results

2015-06-10 Thread Zheng Lin Edwin Yeo
The main objective here is actually to assign a title to the documents as they are being indexed. We actually found that the cluster labels provide good information on the key points of the documents, but I'm not sure if we can get good cluster labels from a single document. Besides getting

Re: Indexing issue - index get deleted

2015-06-10 Thread Midas A
Hi Alessandro, Please find the answers inline and help me out to figure out this problem. 1) Solr version : *4.2.1* 2) Solr architecture :* Master -slave/ Replication with requestHandler* 3) Kind of data source indexed : *Mysql * 4) What happened to the datasource ? any change in there ? : *No c

Re: Indexing issue - index get deleted

2015-06-10 Thread Upayavira
Note the clean= parameter to the DIH. It defaults to true. It will wipe your index before it runs. Perhaps it succeeded at wiping, but failed to connect to your database. Hence an empty DB? clean=true is, IMO, a very dangerous default option. Upayavira On Wed, Jun 10, 2015, at 10:59 AM, Midas A
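One way to avoid depending on the default is to pass clean explicitly on every import request. The following is a minimal sketch in Python that only builds the request URL. The host, the core name "mycore", and the handler path /dataimport are illustrative assumptions, not details taken from this thread.

```python
from urllib.parse import urlencode


def dih_url(base_url, core, command, clean):
    """Build a Data Import Handler request URL with an explicit clean flag."""
    params = {
        "command": command,                     # e.g. "full-import" or "delta-import"
        "clean": "true" if clean else "false",  # stated explicitly, never left to the default
        "commit": "true",
    }
    return f"{base_url}/{core}/dataimport?{urlencode(params)}"


# Hypothetical host and core name, for illustration only.
print(dih_url("http://localhost:8983/solr", "mycore", "full-import", clean=False))
```

With clean=false stated in the request, a failed database connection would leave the existing index untouched instead of wiping it first.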

Re: Indexing issue - index get deleted

2015-06-10 Thread Alessandro Benedetti
Let me answer in line, to get more info : 2015-06-10 10:59 GMT+01:00 Midas A : > Hi Alessandro, > > Please find the answers inline and help me out to figure out this problem. > > 1) Solr version : *4.2.1* > 2) Solr architecture :* Master -slave/ Replication with requestHandler* > > Where happene

Re: Assign rich-text document's title name from clustering results

2015-06-10 Thread Upayavira
It depends a lot on what the documents are. Some document formats have metadata that stores a title. Perhaps you can just extract that. If not, once you've extracted the content, perhaps you could just have a special field that is the first n words (followed by an ellipsis). If you use a clusteri
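The "first n words followed by an ellipsis" idea can be sketched as below. This is only an illustration in Python, assuming the content has already been extracted to plain text; the function name and the default of 8 words are hypothetical choices.

```python
def fallback_title(content, max_words=8):
    """Return the first max_words words of content, adding an ellipsis if truncated."""
    words = content.split()
    if len(words) <= max_words:
        # Short content is used as-is, without an ellipsis.
        return " ".join(words)
    return " ".join(words[:max_words]) + "..."


print(fallback_title("Solr currently provides different approaches for clustering documents", 4))
```

The result could be stored in a dedicated title field at indexing time, and only used when the document format carries no title metadata.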

Re: Indexing issue - index get deleted

2015-06-10 Thread Alessandro Benedetti
Wow, Upaya, I didn't know that clean was default=true in the delta import as well! I did know it was default in the full import, but I agree with you that having a default to true for delta import is very dangerous ! But assuming the user was using the delta import so far, if cleaning every time,

Re: Indexing issue - index get deleted

2015-06-10 Thread Upayavira
I was only speaking about full import regarding the default of clean=true. However, looking at the source code, it doesn't seem to differentiate especially between a full and a delta in relation to the default of clean=true, which would be pretty crappy. However, I'd need to try it. Upayavira On

Re: Date Format Conversion Function Query

2015-06-10 Thread Upayavira
Another technology that might make more sense is a Doc Transformer. You also specify them in the fl parameter. I would imagine you could specify fl=id,[persian f=gregorian_Date] See here for more cases: https://cwiki.apache.org/confluence/display/solr/Transforming+Result+Documents This does no

Solr date variable resolver is not working with MySql

2015-06-10 Thread abhijit bashetti
I have used Solr 3.3 with the Data Import Handler (DIH) against Oracle, and it works fine for me. Now I am trying the same with MySQL. With the change in database, I have changed the query used in data-config.xml for MySQL. The query has variables which are passed in the HTTP URL. The same thing works fi

Re: Solr date variable resolver is not working with MySql

2015-06-10 Thread Alexandre Rafalovitch
For some reason, your email is completely unreadable, with a lot of nbsp instead of spaces. Maybe it is being sent as broken HTML? You may want to try to reformat the message and resend. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com

Re: Velocity UI and hyperlink

2015-06-10 Thread Sznajder ForMailingList
Hi Erik, When running Solr in simple mode on my laptop, I found the *vm files under server/solr/COLLECTION_NAME/conf. However, when running on my server in cloud mode (with only one node), I do not find this conf/ directory under server. Does it sit in another place? Thanks! On Tue, Jun 9

Re: Velocity UI and hyperlink

2015-06-10 Thread Erik Hatcher
In cloud mode, configurations live in ZooKeeper. By doing the -Dvelocity.template.base.dir=/example/files/conf/velocity/ trick (or baking that into your solrconfig setup for the VelocityResponseWriter) you can have the templates on the file system instead though. — Erik Hatcher, Senior Solutio

Re: TZ & rounding

2015-06-10 Thread jon kerling
Thank you for your reply. So my question is: can I get a time offset if I use NOW/MINUTE and not NOW/DAY rounding? You said "TZ affects what timezone is used when defining the concept of a "day" for the purposes of rounding by day." I understand from this answer that a query like the one I mentione
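The quoted statement about TZ and day rounding can be illustrated with a small Python sketch. It is a conceptual model only, not Solr's date math implementation; the function name, the sample instant, and the fixed UTC+5 offset are assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone


def round_down(instant, unit, tz):
    """Round an instant down to the start of the DAY or MINUTE as seen in tz."""
    local = instant.astimezone(tz)
    if unit == "DAY":
        local = local.replace(hour=0, minute=0, second=0, microsecond=0)
    elif unit == "MINUTE":
        local = local.replace(second=0, microsecond=0)
    else:
        raise ValueError(f"unsupported unit: {unit}")
    # Report the result in UTC so the two zones can be compared directly.
    return local.astimezone(timezone.utc)


now = datetime(2015, 6, 10, 23, 30, 45, tzinfo=timezone.utc)
plus5 = timezone(timedelta(hours=5))  # hypothetical TZ with a whole-hour offset

print(round_down(now, "DAY", timezone.utc))     # day boundary in UTC
print(round_down(now, "DAY", plus5))            # day boundary in UTC+5, a different instant
print(round_down(now, "MINUTE", timezone.utc))  # minute boundary in UTC
print(round_down(now, "MINUTE", plus5))         # same instant as the line above
```

In this model the time zone changes where a "day" starts, so day rounding gives different instants per zone, while minute rounding gives the same instant for any zone whose offset is a whole number of minutes.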

Re: Solr date variable resolver is not working with MySql

2015-06-10 Thread Shawn Heisey
On 6/10/2015 6:43 AM, abhijit bashetti wrote: > >= to_date('[?, '28/05/2015 11:13:50']', 'DD/MM/ HH24:MI:SS') > Anyone knows where is the problem? Why is the variable resolver not working > as expected? > Note : to_date is function written by us in MySql. > I have checked out the solr c

Re: Assign rich-text document's title name from clustering results

2015-06-10 Thread Alessandro Benedetti
I agree with Upayavira, title extraction is an activity independent of Solr. Furthermore, I would say it's easy to extract the title before the Solr indexing stage. By the time the content arrives at the Solr update processors, it is already a String. If you want to do some clever title extraction, fo

How to assign shard to specific node?

2015-06-10 Thread MOIS Martin (MORPHO)
Hello, I have a cluster with 3 nodes (node1, node2 and node3). Now I want to create a new collection with 3 shards using `implicit` routing: http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&router.name=implicit&shards=shard1,shard2,shard3&router.field

Re: How to assign shard to specific node?

2015-06-10 Thread Erick Erickson
Take a look at the collections API CREATE command in more detail here: https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api1 Admittedly this is 5.2 but you didn't mention what version of Solr you're using. In particular the createNodeSet and createNodeSet.shuffle par
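As a rough illustration of the two parameters mentioned here, the Python sketch below builds a CREATE request that lists the target nodes in createNodeSet and sets createNodeSet.shuffle=false so that the node list is used in the given order. The node names are hypothetical placeholders (real values should be taken from the cluster's live nodes), and the router field "shard" follows the follow-up message in this thread.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, shards, nodes, router_field):
    """Build a Collections API CREATE URL that pins replica placement to a node list."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": len(shards),
        "router.name": "implicit",
        "shards": ",".join(shards),
        "router.field": router_field,
        "createNodeSet": ",".join(nodes),
        "createNodeSet.shuffle": "false",  # keep the node order as given
    }
    # Keep commas and colons readable instead of percent-encoding them.
    return f"{base_url}/admin/collections?{urlencode(params, safe=',:')}"


# Hypothetical node names, for illustration only.
nodes = ["node1:8983_solr", "node2:8983_solr", "node3:8983_solr"]
print(create_collection_url("http://localhost:8983/solr", "mycollection",
                            ["shard1", "shard2", "shard3"], nodes, "shard"))
```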

Re: Indexing issue - index get deleted

2015-06-10 Thread Alessandro Benedetti
Just taking a look at the code: " if (requestParams.containsKey("clean")) { clean = StrUtils.parseBool( (String) requestParams.get("clean"), true); } else if (DataImporter.DELTA_IMPORT_CMD.equals(command) || DataImporter.IMPORT_CMD.equals(command)) { clean = false; } else { clean = debug ?
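The visible branches of the quoted code can be restated as a small Python sketch to make the decision order easier to follow. This is not the Solr source: the command strings are assumed values for the two constants, parse_bool is a simplified stand-in for StrUtils.parseBool, and because the snippet is cut off in its last branch, that value is left as a caller-supplied parameter instead of being guessed.

```python
DELTA_IMPORT_CMD = "delta-import"  # assumed value of DataImporter.DELTA_IMPORT_CMD
IMPORT_CMD = "import"              # assumed value of DataImporter.IMPORT_CMD


def parse_bool(text, default):
    """Simplified stand-in for StrUtils.parseBool (an approximation)."""
    lowered = text.strip().lower()
    if lowered in ("true", "yes", "on"):
        return True
    if lowered in ("false", "no", "off"):
        return False
    return default


def resolve_clean(request_params, command, remaining_default):
    """Mirror the visible branches of the quoted snippet."""
    if "clean" in request_params:
        # An explicit clean= parameter always wins.
        return parse_bool(request_params["clean"], True)
    if command in (DELTA_IMPORT_CMD, IMPORT_CMD):
        # No explicit parameter: these two commands do not clean.
        return False
    # The quoted snippet is truncated in this branch, so the value is
    # supplied by the caller here.
    return remaining_default


print(resolve_clean({}, "delta-import", remaining_default=True))
print(resolve_clean({"clean": "true"}, "delta-import", remaining_default=True))
```

Read this way, a delta-import without an explicit clean parameter does not wipe the index, while any command with clean=true does.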

Adding applicative cache to SolrSearcher

2015-06-10 Thread adfel70
I am using RankQuery to implement my applicative scorer that returns a score based on the value of a specific field (let's call it 'score_field') that is stored for every document. The RankQuery creates a collector, and for every collected docId I retrieve the value of score_field, calculate the scor

Re: How to tell when Collector finishes collect loop?

2015-06-10 Thread adfel70
I need to execute close() because the scorer is being opened in a context of a query and caches some data in that scope - of the specific query. The way to clear this cache, which is only relevant for that query, is to call close(). I think this API is not so good, but I assume that the scorer's co

Re: Date Format Conversion Function Query

2015-06-10 Thread Ali Nazemian
Thank you very much. It seems that document transformer is the perfect extension point for this conversion. I will try to implement that. Best regards. On Wed, Jun 10, 2015 at 3:54 PM, Upayavira wrote: > Another technology that might make more sense is a Doc Transformer. > > You also specify the

The best way to exclude "seen" results from search queries

2015-06-10 Thread amid
Hi, We have a Solr index with ~1M documents. We want to give our users the ability to filter results out of queries, meaning they will not be shown again for any query by this specific user (we currently have 10K users). You can think of a scenario like a "recommendation engine" which you don't wa

SolrCloud No Active Slice

2015-06-10 Thread James Webster
I'm having a config issue, I'm posting the error from Solrj which also includes the cluster state JSON: org.apache.solr.common.SolrException: No active slice servicing hash code 2ee4d125 in DocCollection(rfp365)={ "shards":{"shard1":{ "range":"-", "state":"active",

Re: Adding applicative cache to SolrSearcher

2015-06-10 Thread Mikhail Khludnev
Hello, The problem is SlowCompositeReaderWrapper.wrap(searcher.getIndexReader()); you hardly ever need to do this, at least because Solr already does it. DocValues need to be accessed per segment, leaf/atomic/reader/context provided to collector. eg look at DocTermsIndexDocValues.strVal(int) DocT

Re: The best way to exclude "seen" results from search queries

2015-06-10 Thread Mikhail Khludnev
start with negating and bypassing caches by https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermsQueryParser eg fq=-{!terms f=p_id cache=false}1,3,5,already,seen note: Elastic can even store such filters via https://www.elastic.co/guide/en/elasticsearch/reference/current
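A minimal Python sketch of building such a negated terms filter from a user's list of already-seen IDs follows. The field name p_id comes from the example above; the function name and the idea of returning None for an empty list are illustrative assumptions.

```python
def seen_filter(field, seen_ids):
    """Build a negated, uncached terms filter that excludes already-seen IDs."""
    if not seen_ids:
        # Nothing to exclude, so no filter query is needed.
        return None
    joined = ",".join(str(doc_id) for doc_id in seen_ids)
    return "-{!terms f=%s cache=false}%s" % (field, joined)


print(seen_filter("p_id", [1, 3, 5]))
```

The returned string would be sent as the value of an fq parameter, URL-encoded by whatever client issues the request.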

File paths in Zookeeper managed config files

2015-06-10 Thread Peter Scholze
Hi all, I'm using Zookeeper 3.4.6 in the context of SolrCloud 5. When uploading a config file containing the following, I get an "Invalid Path String" error. words="/netapp/dokubase/seeval/dicts/stopwords/stopwords_de.txt" ignoreCase="true"/> leads obviously to Invalid path string \"/con

Re: Adding applicative cache to SolrSearcher

2015-06-10 Thread Chris Hostetter
: : The problem is SlowCompositeReaderWrapper.wrap(searcher.getIndexReader()); : you hardly ever need to do this, at least because Solr already does it. Specifically you should just use... searcher.getLeafReader().getSortedSetDocValues(your_field_name) ...instead of doing all this wrapp

Re: File paths in Zookeeper managed config files

2015-06-10 Thread Shawn Heisey
On 6/10/2015 2:47 PM, Peter Scholze wrote: > I'm using Zookeeper 3.4.6 in the context of SolrCloud 5. When > uploading a config file containing the following, I get an "Invalid > Path String" error. > > words="/netapp/dokubase/seeval/dicts/stopwords/stopwords_de.txt" > ignoreCase="true"/> > > lead

RE: The best way to exclude "seen" results from search queries

2015-06-10 Thread Reitzel, Charles
I don't see any way around storing which recommendations have been delivered to each user. Sounds like a separate collection with the unique ID created from the combination of the user ID and the recommendation ID (with the IDs also available as a separate, searchable and returnable fields).
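The document shape described here could look like the Python sketch below: one document per delivered recommendation, keyed by the user ID combined with the recommendation ID, with both IDs also kept as separate fields. The field names and the underscore separator are hypothetical, and the sketch assumes the separator cannot appear inside a user ID.

```python
def seen_doc(user_id, rec_id):
    """Build one 'delivered recommendation' document with a composite unique ID."""
    return {
        "id": f"{user_id}_{rec_id}",  # composite unique key (hypothetical format)
        "user_id": user_id,           # searchable and returnable on its own
        "rec_id": rec_id,             # searchable and returnable on its own
    }


print(seen_doc("u42", "r1001"))
```

Because the unique key is derived from both IDs, indexing the same delivery twice overwrites the existing document instead of creating a duplicate.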

Re: TZ & rounding

2015-06-10 Thread Chris Hostetter
: So my question is: can I get offset of time if I use NOW/MINUTE and not NOW/DAY rounding? i'm sorry, but your question is still too terse, vague, and ambiguous for me to really make much sense of it; and the example queries you provided really don't have enough context for me to understand

Re: Indexing issue - index get deleted

2015-06-10 Thread Chris Hostetter
: The guys was using delta import anyway, so maybe the problem is : different and not related to the clean. that's not what the logs say. Here's what i see... Log begins with server startup @ "Jun 10, 2015 11:14:56 AM" The DeletionPolicy for the "shopclue_prod" core is initialized at "Jun 10,

Show all fields in Solr highlighting output

2015-06-10 Thread Zheng Lin Edwin Yeo
Hi, Is it possible to list all the fields in the highlighting portion of the output? Currently, even when I *, it only shows fields where highlighting is possible; fields where highlighting is not possible are not shown. I would like to have the output where all the fields, regardless if highli

Re: Index optimize runs in background.

2015-06-10 Thread Modassar Ather
Hi, There are 5 cores and a separate server for indexing on this SolrCloud. Can you please share your suggestions on: How can the indexer know that the optimize has completed, even if the commit/optimize runs in the background, without going to the Solr servers, maybe by using SolrJ or another API? I t

Re: Index optimize runs in background.

2015-06-10 Thread Erick Erickson
If I knew, I would fix it ;). The sub-optimizes (i.e. the ones sent out to each replica) should be sent in parallel and then each thread should wait for completion from the replicas. There is no real "check for optimize", I believe that the return from the call is considered sufficient. If we can t

Re: Index optimize runs in background.

2015-06-10 Thread Walter Underwood
Why would you care when the forced merge (not an “optimize”) is done? Start it and get back to work. Or even better, never force merge and let the algorithm take care of it. Seriously, I’ve been giving this advice since before Lucene was written, because Ultraseek had the same approach for mana

AW: How to assign shard to specific node?

2015-06-10 Thread MOIS Martin (MORPHO)
Thank you for your quick answer. The two parameters createNodeSet and createNodeSet.shuffle seem to solve the problem: http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&router.name=implicit&shards=shard1,shard2,shard3&router.field=shard&createNodeSet=node1,

Re: Index optimize runs in background.

2015-06-10 Thread Upayavira
Until somewhere around Lucene 3.5, you needed to optimise, because the merge strategy used wasn't that clever and left lots of deletes in your largest segment. Around that point, the TieredMergePolicy became the default. Because its algorithm is much more sophisticated, it took away the need to opt