Re: solrcloud "indexing completed" event

2014-07-01 Thread Giovanni Bricconi
Thank you Erick. Fortunately I can modify the data feeding process to start my post-indexing tasks. 2014-06-30 22:13 GMT+02:00 Erick Erickson : > The paradigm is different. In SolrCloud when a client sends an indexing > request to any node in the system, when the response comes back all the

Re: why full-import not work well?

2014-07-01 Thread rulinma
Done. There is a bug; removing something makes it OK. -- View this message in context: http://lucene.472066.n3.nabble.com/why-full-import-not-work-well-tp4142193p4144932.html Sent from the Solr - User mailing list archive at Nabble.com.

Disable all caches in Solr

2014-07-01 Thread vidit.asthana
I want to run some query benchmarks, so I want to disable all types of caches in Solr. I commented out filterCache, queryResultCache and documentCache in solrconfig.xml. I don't care about Result Window Size because numdocs is 10 in all the cases. Are there any other hidden caches which I should know
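
For reference, this is roughly what disabling those three caches looks like in a stock 4.x solrconfig.xml; the class and size values below come from the example config, not from the poster's file:

  <query>
    <!-- Caches disabled for benchmarking: remove the elements entirely
         or comment them out as below. -->
    <!--
    <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
    -->
  </query>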

Re: Disable all caches in Solr

2014-07-01 Thread Alexandre Rafalovitch
Have you also disabled the queries used to initialize searchers after commit? Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Tue, Jul 1, 2014 at 3:53 PM, vidit.asthana wrote: > I want to run some query benchmark

Re: Disable all caches in Solr

2014-07-01 Thread vidit.asthana
Yes, I have also commented out the "newSearcher" and "firstSearcher" queries in solrconfig.xml -- View this message in context: http://lucene.472066.n3.nabble.com/Disable-all-caches-in-Solr-tp4144933p4144935.html Sent from the Solr - User mailing list archive at Nabble.com.
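
Commenting out the warm-up queries means commenting out the QuerySenderListener entries in the <query> section; a sketch of the stock example config with both listeners disabled (the exact warming queries inside them vary per install):

  <!--
  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries"/>
  </listener>
  <listener event="firstSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst><str name="q">static firstSearcher warming in solrconfig.xml</str></lst>
    </arr>
  </listener>
  -->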

Re: CopyField can't copy analyzers and Filters

2014-07-01 Thread benjelloun
Hello, here is my configuration, which doesn't work: schema: config: explicit velocity browse layout Solritas edismax *_ar^2 *_fr^3 *_en^2.2 AllChamp 100% *:* 10 *,

Re: CopyField can't copy analyzers and Filters

2014-07-01 Thread Alexandre Rafalovitch
I believe you were already answered. If you want to have text parsed/analyzed in different ways, you need to have it in separate fields with separate analyzer stacks. Then use disMax/eDisMax to search across those fields. copyField copies the original content and therefore when you search the
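
A minimal sketch of that setup, with hypothetical per-language fields in schema.xml and an edismax handler in solrconfig.xml searching across them (field names are borrowed from later in this thread, field types from the stock example schema, and the boosts mirror the *_ar^2 *_fr^3 *_en^2.2 fragment quoted earlier):

  <field name="ContenuDocument_en" type="text_en" indexed="true" stored="true"/>
  <field name="ContenuDocument_fr" type="text_fr" indexed="true" stored="true"/>
  <field name="ContenuDocument_ar" type="text_ar" indexed="true" stored="true"/>

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="qf">ContenuDocument_en^2.2 ContenuDocument_fr^3 ContenuDocument_ar^2</str>
    </lst>
  </requestHandler>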

Restriction on type of uniqueKey field?

2014-07-01 Thread Alexandre Rafalovitch
Hello, I remember reading somewhere that id field (uniqueKey) must be String. But I cannot find the definitive confirmation, just that it should be non-analyzed. Can I use a single-valued TrieLongField type, with precision set to 0? Or am I going to hit issues? Regards, Alex. Personal website

Re: CopyField can't copy analyzers and Filters

2014-07-01 Thread benjelloun
Hello, I have 300 fields which are copied into "AllChamp". If I want to use separate fields, then I need to create 300 * the number of languages I have, which does not make sense for me. Is there any other solution? Best regards Anass BENJELLOUN 2014-07-01 11:28 GMT+02:00 Alexandre Rafalovitch [via Lucene]

Re: CopyField can't copy analyzers and Filters

2014-07-01 Thread Alexandre Rafalovitch
But aren't you already creating those 300 fields anyway? If you mean you have issues specifying them in eDisMax, I believe the 'qf' parameter allows you to specify a wildcard. Alternatively, you can look at the example used in the Solr In Action book: https://github.com/treygrainger/solr-in-action/tree/mas

Re: CopyField can't copy analyzers and Filters

2014-07-01 Thread benjelloun
I have documents (ar, en, fr). I need to index them while keeping the analyzer and filters for each language. Here are all the fields in the schema, to help you understand my problem:

Re: Disable all caches in Solr

2014-07-01 Thread Toke Eskildsen
On Tue, 2014-07-01 at 10:53 +0200, vidit.asthana wrote: > Are there any other hidden caches which I should know about before running > my tests? Clear the disk cache? - Toke Eskildsen, State and University Library, Denmark

Re: Restriction on type of uniqueKey field?

2014-07-01 Thread Shalin Shekhar Mangar
No, you definitely can have an int or long uniqueKey. A lot of Solr's tests use such a uniqueKey. See solr/core/src/test-files/solr/collection1/conf/schema.xml On Tue, Jul 1, 2014 at 3:20 PM, Alexandre Rafalovitch wrote: > Hello, > > I remember reading somewhere that id field (uniqueKey) must b
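
A sketch of the schema.xml declarations in question, using the "long" type from the stock example schema (a TrieLongField with precisionStep="0", matching what Alex asked about):

  <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>

  <field name="id" type="long" indexed="true" stored="true" required="true"/>
  <uniqueKey>id</uniqueKey>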

Out of Memory when i downdload 5 Million records from sqlserver to solr

2014-07-01 Thread mskeerthi
I have to load my 5 million records from SQL Server into one Solr index. I am getting the exception below after loading 1 million records. Is there any configuration or other way to load from SQL Server into Solr? Below is the exception I am getting in Solr: org.apache.solr.common.SolrExcepti
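
The poster doesn't say how the records are pulled; assuming the DataImportHandler with the Microsoft JDBC driver, a common cause is that the driver buffers the entire result set in memory by default. A hedged data-config.xml sketch (host, database, table and columns are made up) that asks the driver to stream rows instead:

  <dataConfig>
    <dataSource type="JdbcDataSource"
                driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
                url="jdbc:sqlserver://dbhost:1433;databaseName=mydb;responseBuffering=adaptive;selectMethod=cursor"
                user="solr" password="***" batchSize="500"/>
    <document>
      <entity name="records" query="SELECT id, title, body FROM records">
        <field column="id" name="id"/>
      </entity>
    </document>
  </dataConfig>

Giving the JVM more heap (as suggested in the replies below) helps too, but streaming the result set is what keeps memory flat regardless of row count.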

Sharing single indexer for 2 different solr instance

2014-07-01 Thread deepakinniah
Hi, I have a Solr index on a network path and I want to share this index (without replication) between more than one Solr instance. Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/Sharing-single-indexer-for-2-different-solr-instance-tp4144954.html Sent from the

Re: solr dedup on specific fields

2014-07-01 Thread Ali Nazemian
Any suggestion would be appreciated. Regards. On Mon, Jun 30, 2014 at 2:49 PM, Ali Nazemian wrote: > Hi, > I used solr 4.8 for indexing the web pages that come from nutch. I know > that solr deduplication operation works on uniquekey field. So I set that > to URL field. Everything is OK. except

Re: CopyField can't copy analyzers and Filters

2014-07-01 Thread benjelloun
And I use dynamic fields for NomDocument, ContenuDocument, Postit. Example: ContenuDocument_fr, ContenuDocument_en, ContenuDocument_ar NomDocument,ContenuDocument,Postit language_s fr en,fr,ar true Is there any other solution that does not separate fields? Best regards A

Re: Garbage collection issue and RELOADing cores

2014-07-01 Thread François Schiettecatte
Hi Just following up on my previous post about a memory leak when RELOADing cores, I narrowed it down to the SuggestComponent, specifically '...' in solrconfig.xml. Comment that out and the leak goes away. The leak occurs in 4.7, 4.8 and 4.9. It occurs when a core is RELOADed, but not if it is

AUTO: Saravanan Chinnadurai is out of the office (returning 02/07/2014)

2014-07-01 Thread Saravanan . Chinnadurai
I will be out of the office starting 01/07/2014 and will not return until 02/07/2014 Please email itsta...@actionimages.com for any urgent queries. Note: This is an automated response to your message "Strategy for removing an active shard from zookeeper" sent on 7/1/2014 0:45:59. This is the

Re: Restriction on type of uniqueKey field?

2014-07-01 Thread Michael Della Bitta
Alex, maybe you're thinking of constraints put on shard keys? Michael Della Bitta Applications Developer o: +1 646 532 3062 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions | g+: plus.google.com/appinion

Language detection for solr 3.6.1

2014-07-01 Thread Poornima Jay
Hi, Can anyone please let me know how to integrate http://code.google.com/p/language-detection/ in Solr 3.6.1. I want four languages (English, Chinese Simplified, Chinese Traditional, Japanese, and Korean) to be added in one schema, i.e. multilingual search from a single schema file. I tried adding
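
Solr 3.5 and later include a langid contrib that wraps exactly that language-detection library as an update processor. A hedged sketch of the solrconfig.xml wiring (the solr-langid contrib and langdetect jars must be added via <lib> directives; the field names title, body and language_s are made up):

  <updateRequestProcessorChain name="langid" default="true">
    <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
      <str name="langid.fl">title,body</str>
      <str name="langid.langField">language_s</str>
      <str name="langid.whitelist">en,zh-cn,zh-tw,ja,ko</str>
      <str name="langid.fallback">en</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>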

RE: Multiterm analysis in complexphrase query

2014-07-01 Thread Michael Ryan
Thanks. This looks interesting... -Michael -Original Message- From: Allison, Timothy B. [mailto:talli...@mitre.org] Sent: Monday, June 30, 2014 8:15 AM To: solr-user@lucene.apache.org Subject: RE: Multiterm analysis in complexphrase query Ahmet, please correct me if I'm wrong, but the C

Best way to fix "Document contains at least one immense term"?

2014-07-01 Thread Michael Ryan
In LUCENE-5472, Lucene was changed to throw an error if a term is too long, rather than just logging a message. I have fields with terms that are too long, but I don't care - I just want to ignore them and move on. The recommended solution in the docs is to use LengthFilterFactory, but this lim
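
For reference, the LengthFilterFactory route mentioned above looks roughly like this in schema.xml (Lucene's hard limit is 32766 bytes per term; the max below is a made-up character count chosen to stay under it for multi-byte text):

  <fieldType name="text_keyword_safe" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <!-- silently drop terms that would exceed the indexable length -->
      <filter class="solr.LengthFilterFactory" min="1" max="8000"/>
    </analyzer>
  </fieldType>

On 4.8+, solr.TruncateTokenFilterFactory (with a prefixLength parameter) is an alternative that keeps a shortened form of the term instead of dropping it.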

Re: solr dedup on specific fields

2014-07-01 Thread Alexandre Rafalovitch
Well, it's implemented in SignatureUpdateProcessorFactory. Worst case, you can clone that code and add your preserve-field functionality. Could even be a nice contribution. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating
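
For context, this is roughly what the stock SignatureUpdateProcessorFactory configuration looks like in solrconfig.xml (the field names are made up; the preserve-field behaviour discussed in this thread is the part that would need the custom code):

  <updateRequestProcessorChain name="dedupe">
    <processor class="solr.processor.SignatureUpdateProcessorFactory">
      <bool name="enabled">true</bool>
      <str name="signatureField">signature</str>
      <bool name="overwriteDupes">true</bool>
      <str name="fields">url,title,content</str>
      <str name="signatureClass">solr.processor.Lookup3Signature</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>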

Re: Restriction on type of uniqueKey field?

2014-07-01 Thread Alexandre Rafalovitch
I wasn't thinking of shard keys, but may have been confused in the reading. Thank you everyone, the long key is working just fine for me. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Tue, Ju

Re: Integrating solr with Hadoop

2014-07-01 Thread Erick Erickson
Should be fine. Things to watch: 1> solrconfig.xml has to have the HdfsDirectoryFactory enabled. 2> You probably want to configure ZooKeeper stand-alone; although it's possible to run embedded ZK, it's just awkward since you can't really bounce Solr nodes running embedded ZK at wil
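
A hedged sketch of point 1>, the HdfsDirectoryFactory section of solrconfig.xml (the namenode address and paths are placeholders; the lock type lives in <indexConfig>):

  <directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
    <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
    <str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
    <bool name="solr.hdfs.blockcache.enabled">true</bool>
  </directoryFactory>

  <indexConfig>
    <lockType>hdfs</lockType>
  </indexConfig>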

Re: Restriction on type of uniqueKey field?

2014-07-01 Thread Koji Sekiguchi
In addition, KeywordTokenizer can seemingly be used, but it should be avoided for the unique key field. One of my customers used it and got an OOM during long-running indexing. As it was difficult to find the problem, I'd like to share my experience. Koji -- http://soleami.com/blog/comparing

Re: CopyField can't copy analyzers and Filters

2014-07-01 Thread Erick Erickson
OK, back up a bit and consider alternative indexing schemes. For instance, do you really need all those fields? Could you get away with one field where you indexed the field _name_ + associated value? (you'd have to be very careful with your analysis chain, but...) Something like: C67_val_value1 a
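
A minimal sketch of that alternative scheme, assuming a single hypothetical catch-all field in schema.xml whose tokens combine the field name and its value, so a search becomes e.g. field_values:C67_val_value1:

  <field name="field_values" type="string" indexed="true" stored="false" multiValued="true"/>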

Re: Restriction on type of uniqueKey field?

2014-07-01 Thread Erick Erickson
non-String fields have historically popped out in weird places. I think at one point, for instance, QueryElevationComponent barfed on non-string types. So, there may still be edge cases in which this can be a problem. IMO, they're all bugs though. Erick On Tue, Jul 1, 2014 at 7:43 AM, Koji Seki

Throwing Error "Missing Mandatory uniquekey field id"

2014-07-01 Thread mskeerthi
I mentioned id as a string in schema.xml and I copied the csv into the example docs folder. I used the below command to load the data: " Java -Dtype=application/csv -jar post.jar import.csv" It's throwing the below error. Please help in this regard. ERROR - 2014-07-01 19:57:43.902; org.apache.solr.

Re: CopyField can't copy analyzers and Filters

2014-07-01 Thread benjelloun
Hello Erick, unfortunately I can't modify the schema. My team and I analyzed the problem carefully, so all the fields you see are required in the schema. Now I just tested using different fields; maybe it could work if I knew the edismax syntax: and in the config this is the SearchHandler but I

Re: CopyField can't copy analyzers and Filters

2014-07-01 Thread Daniel Collins
Ok, firstly, saying you need to fix your problem but can't modify the schema doesn't really help. If the schema is set up badly, then no amount of help at search time will ever get you the results you want... Secondly, from what I can see in the schema, there is no AllChamp_fr, AllChamp_en, et

Re: CopyField can't copy analyzers and Filters

2014-07-01 Thread benjelloun
Hello, for Cx_val, there are some fields which are multivalued :) For AllChamp_fr, AllChamp_en..., I just added them to the schema to test whether edismax works. 2014-07-01 17:13 GMT+02:00 Daniel Collins [via Lucene] < ml-node+s472066n4145024...@n3.nabble.com>: > Ok, firstly to say you need to fi

Re: Out of Memory when i downdload 5 Million records from sqlserver to solr

2014-07-01 Thread Aman Tandon
You can try giving some more memory to Solr. On Jul 1, 2014 4:41 PM, "mskeerthi" wrote: > I have to download my 5 million records from sqlserver to solr into one > index. I am getting below exception after downloading 1 Million records. Is > there any configuration or another to download from sqlser

RE: Multiterm analysis in complexphrase query

2014-07-01 Thread Allison, Timothy B.
If there's enough interest, I might get back into the code and throw a standalone src (and jar) of the SpanQueryParser and the Solr wrapper onto github. That would make it more widely available until there's a chance to integrate it into Lucene/Solr. If you'd be interested in this, let me know

Re: Out of Memory when i downdload 5 Million records from sqlserver to solr

2014-07-01 Thread IJ
We faced similar problems on our side. We found it more reliable to have a mechanism to extract all data from the Database into a flat file - and then use a JAVA program to bulk index into Solr from the file via SolrJ API. -- View this message in context: http://lucene.472066.n3.nabble.com/Out-

Does Solr move documents between shards when the value of the shard key is updated ?

2014-07-01 Thread IJ
Let's say I create a Solr Collection with multiple shards (say 2 shards) and set the value of "router.field" to a field called "CompanyName". Now - we all know that during indexing Solr will compute a hash on the value indexed into "CompanyName" and route the document to the appropriate shard. Let's say I in

Re: Throwing Error "Missing Mandatory uniquekey field id"

2014-07-01 Thread Chris Hostetter
: I mentioned id as string in schema.xml and i copied the csv into example docs : folder. I used the below commaand to download the data " Java : -Dtype=application/csv -jar post.jar import.csv" : : it's throwing the below error.Please help in this regard. : : ERROR - 2014-07-01 19:57:43.902; o

Re: Disable all caches in Solr

2014-07-01 Thread Chris Hostetter
: I want to run some query benchmarks, so I want to disable all type of caches Just to be clear: disabling all internal caching because you want to run a benchmark means you're probably going to wind up running a useless benchmark. Solr's internal caching is a key component of its performance

Confusion about location of + and - ?

2014-07-01 Thread Brett Hoerner
Can anyone explain the difference between these two queries? text:(+"happy") AND -user:("123456789") = numFound 2912224 But text:(+"happy") AND user:(-"123456789") = numFound 0 Now, you may just say "then just put - in front of your field, duh!" Well, text:(+"happy") = numFound 2912224

MLT weird behaviour in Solrcloud

2014-07-01 Thread Shamik Bandopadhyay
Hi, I'm trying to use "mlt" request handler in a Solrcloud cluster. Apparently, its showing some weird behavior. I'm getting response randomly, it's able to return results randomly for the same query. I'm using Solrj client which in turn communicates the cluster using zookeeper ensemble. Here's

Re: Confusion about location of + and - ?

2014-07-01 Thread Jack Krupansky
Yeah, there's a known bug that a negative-only query within parentheses doesn't match properly - you need to add a non-negative term, such as "*:*". For example: text:(+"happy") AND user:(*:* -"123456789") -- Jack Krupansky -Original Message- From: Brett Hoerner Sent: Tuesday, July

Re: Confusion about location of + and - ?

2014-07-01 Thread Brett Hoerner
Interesting, is there a performance impact to sending the *:*? On Tue, Jul 1, 2014 at 2:53 PM, Jack Krupansky wrote: > Yeah, there's a known bug that a negative-only query within parentheses > doesn't match properly - you need to add a non-negative term, such as > "*:*". For example: > > text:

Re: Confusion about location of + and - ?

2014-07-01 Thread Brett Hoerner
Also, does anyone have the Solr or Lucene bug # for this? On Tue, Jul 1, 2014 at 3:06 PM, Brett Hoerner wrote: > Interesting, is there a performance impact to sending the *:*? > > > On Tue, Jul 1, 2014 at 2:53 PM, Jack Krupansky > wrote: > >> Yeah, there's a known bug that a negative-only quer

Re: Confusion about location of + and - ?

2014-07-01 Thread Jack Krupansky
No, that's what Solr would do if the bug were fixed. Matching all documents (*:*) is a "constant score query", so it takes no significant amount of resources. Personally, I consider this a bug in Lucene, but try convincing them of that! The issue was filed as: SOLR-3744 - "Solr LuceneQParser o

Continue indexing doc after error

2014-07-01 Thread tedsolr
I need to index documents from a csv file that will have 1000s of rows and 100+ columns. To help the user loading the file I must return useful errors when indexing fails (schema violations). I'm using SolrJ to read the files line by line, build the document, and index/commit. This approach allows

Re: Continue indexing doc after error

2014-07-01 Thread Tomás Fernández Löbbe
I think what you want is what’s described in https://issues.apache.org/jira/browse/SOLR-445 This has not been committed because it still doesn’t work with SolrCloud. Hoss gave me the hint to look at DistributingUpdateProcessorFactory to solve the problem described in the last comments, but I haven’

Re: Continue indexing doc after error

2014-07-01 Thread tedsolr
Thank you. That's a useful link. Maybe not quite what I'm looking for, as it appears to deal with bulk loads of docs - returning an error for each bad doc. My question is more about getting all the errors for a single doc. I'm probably taking a performance hit by adding docs one at a time. I haven't

Re: Restriction on type of uniqueKey field?

2014-07-01 Thread Jack Krupansky
My vague recollection is that at least at one time there was a limitation somewhere in SolrCloud, but whether that is still true, I don't know. -- Jack Krupansky -Original Message- From: Alexandre Rafalovitch Sent: Tuesday, July 1, 2014 9:48 AM To: solr-user@lucene.apache.org Subject:

Re: Best way to fix "Document contains at least one immense term"?

2014-07-01 Thread Jack Krupansky
You could develop an update processor to skip or trim long terms as you see fit. You can even code a script in JavaScript using the stateless script update processor. Can you tell us more about the nature of your data? I mean, sometimes analyzer filters strip or fold accented characters anywa
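
A hedged sketch of the solrconfig.xml wiring for that scripted approach (the script name is hypothetical; the .js file would live in conf/ and decide whether to drop or truncate oversized values before they reach the analyzer):

  <updateRequestProcessorChain name="trim-long-terms">
    <processor class="solr.StatelessScriptUpdateProcessorFactory">
      <str name="script">trim-long-terms.js</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>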

Re: Does Solr move documents between shards when the value of the shard key is updated ?

2014-07-01 Thread Erick Erickson
You would end up with duplicate docs on the two shards. Solr does its doc-id lookup on the shard the document is routed to, not on the other shards. Routing takes place before this step, so you're going to have two docs. Best, Erick On Tue, Jul 1, 2014 at 9:42 AM, IJ wrote: > Lets say I create a Solr Collection with mu

RE: Best way to fix "Document contains at least one immense term"?

2014-07-01 Thread Michael Ryan
In this particular case, the fields are just using KeywordTokenizerFactory. I have other fields that are tokenized, but they use tokenizers with a short maxTokenLength. I'm not even all that concerned about my own data, but more curious if there's a general solution to this problem. I imagine t

Understanding fieldNorm differences between 3.6.1 and 4.9 solrs

2014-07-01 Thread Aaron Daubman
In trying to determine some subtle scoring differences (causing occasionally significant ordering differences) among search results, I wrote a parser to normalize debug.explain.structured JSON output. It appears that every score that is different comes down to a difference in fieldNorm, where the

Re: MLT weird behaviour in Solrcloud

2014-07-01 Thread Pramod Negi
Why is there no comma (,) between "text" and "language" in title,textlanguage,caaskey? On Wed, Jul 2, 2014 at 12:42 AM, Shamik Bandopadhyay wrote: > Hi, > > I'm trying to use "mlt" request handler in a Solrcloud cluster. > Apparently, its showing some weird behavior. I'm getting response randomly, >

Re: How to integrate nlp in solr

2014-07-01 Thread Aman Tandon
Any help here? With Regards Aman Tandon On Mon, Jun 30, 2014 at 11:00 PM, Aman Tandon wrote: > Hi Alex, > > I was trying to learn from these tutorials > http://www.slideshare.net/teofili/natural-language-search-in-solr & > https://wiki.apache.org/solr/OpenNLP: this one is kinda bit explain

Re: How to integrate nlp in solr

2014-07-01 Thread Alexandre Rafalovitch
Not from me, no. I don't have any real examples for this ready. I suspect the path beyond the basics is VERY dependent on your data and your business requirements. I would start from thinking how would YOU (as a human) do that match. Where does the 'blue' and 'color' and 'college' and 'bags' come

Memory Leaks in solr 4.8.1

2014-07-01 Thread Aman Tandon
Hi, When I am shutting down Solr I am getting the memory leak error in the logs. Jul 02, 2014 10:49:10 AM org.apache.catalina.loader.WebappClassLoader > checkThreadLocalMapForLeaks > SEVERE: The web application [/solr] created a ThreadLocal with key of type > [org.apache.solr.schema.DateField.Thr

Re: How to integrate nlp in solr

2014-07-01 Thread Aman Tandon
Hi Alex, Thanks Alex. One more thing I want to ask: do we need to add the extra fields for those entities, e.g. "Item" (bags), "color" (blue), etc.? If somehow I manage to implement this NLP then I will definitely publish it on my blog :) With Regards Aman Tandon On Wed, Jul 2, 2014 at

Re: MLT weird behaviour in Solrcloud

2014-07-01 Thread shamik
Sorry, that's a typo from when I copied the mlt definition from my solrconfig, but there's a comma in my test environment. It's not the issue. -- View this message in context: http://lucene.472066.n3.nabble.com/MLT-weird-behaviour-in-Solrcloud-tp4145066p4145145.html Sent from the Solr - User mailing l

Is term~ effect available as a eDisMax param or a TokenFilter?

2014-07-01 Thread Alexandre Rafalovitch
Hello, I am trying to match names. In the UI, I can do it by entering name~ or name~2, but I can't expect users to do that, and I don't want to do pre-tokenization in the middleware to inject that. Also, only specific fields are names; people can also enter phone numbers, which I don't want to fuzz wh