New article on ZK "Poison Packet"

2015-05-08 Thread steve
While very technical and unusual, a very interesting view of the world of Linux and ZooKeeper Clusters... http://www.pagerduty.com/blog/the-discovery-of-apache-zookeepers-poison-packet/

Re: SolrCloud indexing

2015-05-08 Thread Vincenzo D'Amore
I have just added a comment to the CWiki. Thanks again for your prompt answer Erick. Best, Vincenzo On Fri, May 8, 2015 at 12:39 AM, Erick Erickson wrote: > bq: ...forwards the index notation to itself and any replicas... > > That's just odd phrasing. > > All that means is that the document sen

Re: indexing java byte code in classes / jars

2015-05-08 Thread Mark
Erik, Thanks for the pretty much OOTB approach. I think I'm going to just try a range of approaches, and see how far I get. The "IDE does this suggestion" would be worth looking into as well. On 8 May 2015 at 22:14, Mark wrote: > > https://searchcode.com/ > > looks really interesting, howe

Re: indexing java byte code in classes / jars

2015-05-08 Thread Mark
https://searchcode.com/ looks really interesting, however I want to crunch as many searchable aspects as possible out of jars sitting on a classpath or under a project structure... Really early days, so I'm open to any suggestions. On 8 May 2015 at 22:09, Mark wrote: > To answer why bytecode - because mos

Re: indexing java byte code in classes / jars

2015-05-08 Thread Mark
To answer why bytecode: mostly because the use case I have is to index as much detail as possible from jars/classes: extract class names, method names, signatures, packages / imports. I am considering using ASM in order to generate an analysis view of the class. The sort of use cases I have would be met

RE: indexing java byte code in classes / jars

2015-05-08 Thread Reitzel, Charles
There are a number of reverse compilers for Java. Some are quite good and very detailed, so long as the byte code has not been deliberately obfuscated. Of course the original sources would be better for picking up comments. But, then you'd need a java parser (the compiler front end), of wh

Re: indexing java byte code in classes / jars

2015-05-08 Thread Erik Hatcher
Oh, and sorry, I omitted a couple of details: # creating the “java” core/collection bin/solr create -c java # I ran this from my Solr source code checkout, so that SolrLogFormatter.class just happened to be handy Erik > On May 8, 2015, at 4:11 PM, Erik Hatcher wrote: > > What kin

Re: Fuzzy phrases + weighting at query level or do I need to program?

2015-05-08 Thread Tomasz Borek
Best I found so far is: +place:(+word1~ +word2~ +word3~) pozdrawiam, LAFK 2015-04-26 3:20 GMT+02:00 Tomasz Borek : > Ave! > > How do I make fuzzy search on lengthy names? As in "La Riviera Montana de > los Diablos" or "Unified Mega Corp Super Dwelling"? Across all queries? > > My query has 3 le
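The query shape quoted above can be generated from a multi-word name. This is a minimal sketch, assuming whitespace-separated words and a field called `place` as in the post; the helper name `fuzzy_all_words` is hypothetical, and the sketch does not escape query-parser special characters.

```python
def fuzzy_all_words(field: str, phrase: str) -> str:
    """Build a query that requires a fuzzy match on every word of the phrase."""
    # Each word becomes a mandatory (+) fuzzy (~) clause inside the field group.
    clauses = " ".join(f"+{word}~" for word in phrase.split())
    return f"+{field}:({clauses})"


print(fuzzy_all_words("place", "La Riviera Montana"))
```

A bare `~` leaves the edit distance at the parser's default; append a number (for example `~1`) if a tighter match is wanted.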

Re: indexing java byte code in classes / jars

2015-05-08 Thread Erik Hatcher
What kinds of searches do you want to run? Are you trying to extract class names, method names, and such and make those searchable? If that’s the case, you need some kind of “parser” to reverse engineer that information from .class and .jar files before feeding it to Solr, which would happen
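As a rough illustration of the "extract first, then feed Solr" idea, the sketch below lists class names from a jar using only the archive's entry names. It is not a bytecode parser: method names and signatures would still need a tool such as ASM, as discussed elsewhere in the thread. The helper names and the Solr field names (`jar_s`, `class_s`, `package_s`) are hypothetical.

```python
import zipfile


def class_names_in_jar(jar) -> list:
    """Return fully qualified class names found in a jar (path or file object)."""
    with zipfile.ZipFile(jar) as zf:
        return sorted(
            name[: -len(".class")].replace("/", ".")
            for name in zf.namelist()
            if name.endswith(".class")
        )


def to_solr_docs(jar_name: str, class_names: list) -> list:
    """Turn class names into simple dicts that could be posted to Solr as JSON."""
    return [
        {
            "id": f"{jar_name}!{cn}",             # hypothetical unique key scheme
            "jar_s": jar_name,                    # hypothetical field names
            "class_s": cn,
            "package_s": cn.rpartition(".")[0],   # empty for the default package
        }
        for cn in class_names
    ]
```

The resulting list can be serialized with `json.dumps` and sent to a collection's update handler.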

Re: Solr Exception "The remote server returned an error: (400) Bad Request."

2015-05-08 Thread Tomasz Borek
Short answer: wget skips body on 400 assuming you didn't want error page stored. Long answer: get your error page with additional wget params, like so: ✗ wget -Sd http://10.0.3.113:8080/solr/collection1/vitas\?q\=coreD%3A25 DEBUG output created by Wget 1.15 on linux-gnu. URI encoding = `UTF-8' --

Re: How to handle special characters in fuzzy search query

2015-05-08 Thread Tomasz Borek
FWIW you may also want to drop the boolean ops in favour of + and - (OR being default) pozdrawiam, LAFK 2015-05-08 18:59 GMT+02:00 Erick Erickson : > Steven: > > They're listed on the ref guide I posted. Not a concise list, but > you'll see && || and other "interesting" bits. > > On Fri, May 8,

Re: indexing java byte code in classes / jars

2015-05-08 Thread Mike Drob
What do the various Java IDEs use for indexing classes for field/type/variable/method usage search? I imagine it's got to be bytecode. On Fri, May 8, 2015 at 2:40 PM, Tomasz Borek wrote: > Out of curiosity: why bytecode? > > pozdrawiam, > LAFK > > 2015-05-08 21:31 GMT+02:00 Mark : > > > I lookin

Re: indexing java byte code in classes / jars

2015-05-08 Thread Tomasz Borek
Out of curiosity: why bytecode? pozdrawiam, LAFK 2015-05-08 21:31 GMT+02:00 Mark : > I am looking to use Solr search over the byte code in Classes and Jars. > > Does anyone know or have experience of Analyzers, Tokenizers, and Token > Filters for such a task? > > Regards > > Mark >

indexing java byte code in classes / jars

2015-05-08 Thread Mark
I am looking to use Solr search over the byte code in Classes and Jars. Does anyone know or have experience of Analyzers, Tokenizers, and Token Filters for such a task? Regards Mark

Re: and stopword in user query is being change to q.op=AND

2015-05-08 Thread Rajesh Hazari
Thanks Show and Hoss. Just added lowercaseOperators=false to my edismax config and everything seems to be working. *Thanks,* *Rajesh,* *(mobile) : 8328789519.* On Mon, Apr 27, 2015 at 11:53 AM, Rajesh Hazari wrote: > I did go through the documentation of edismax (solr 5.1 documentation), > that

Re: Not able to Add docValues in Solr

2015-05-08 Thread pras.venkatesh
Never mind.. used the zkcli.sh that comes with solr to accomplish the firewall -- View this message in context: http://lucene.472066.n3.nabble.com/Not-able-to-Add-docValues-in-Solr-tp4204405p4204579.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: How to handle special characters in fuzzy search query

2015-05-08 Thread Erick Erickson
Steven: They're listed on the ref guide I posted. Not a concise list, but you'll see && || and other "interesting" bits. On Fri, May 8, 2015 at 9:20 AM, Steven White wrote: > Hi Erick, > > Is there a documented list of all operators (AND, OR, NOT, etc.) that also > need to be escaped? Are there

Re: determine "big" documents in the index?

2015-05-08 Thread Erick Erickson
Oops, this may be a better link: http://lucidworks.com/blog/indexing-with-solrj/ On Fri, May 8, 2015 at 9:55 AM, Erick Erickson wrote: > bq: has 30'860'099 terms. Is this "too much" > > Depends on how you indexed it. If you used shingles, then maybe, maybe > not. If you just do normal text analys

Re: determine "big" documents in the index?

2015-05-08 Thread Erick Erickson
bq: has 30'860'099 terms. Is this "too much" Depends on how you indexed it. If you used shingles, then maybe, maybe not. If you just do normal text analysis, it's suspicious to say the least. There are about 300K words in the English language and you have 100X that. So either 1> you have a lot of

Best way to backup and restore an index for a cloud setup in 4.6.1?

2015-05-08 Thread John Smith
All, With a cloud setup for a collection in 4.6.1, what is the most elegant way to back up and restore an index? We are specifically looking into applying this when doing a full reindex, with the idea of building an index on one set of servers, backing up the index, and then restoring that ba

Re: Limit the documents for each shard in solr cloud

2015-05-08 Thread Jilani Shaik
Hi, Actually we are facing a lot of issues with Solr shards in our environment. Our environment is fully loaded with around 150 million documents, where each document has 50+ stored fields with multiple values. We also have a lot of custom components in this environment which are

Re: How to get the docs id after commit

2015-05-08 Thread Erick Erickson
Not that I know of. "newest doc id" is pretty ambiguous. If I transmit a batch of 100 docs then commit, they're all committed at once. Which one, then, is "newest"? And consider what happens if (in SolrCloud mode) I send updates to two separate nodes. The docs are forwarded to the leader for the s

Re: How to handle special characters in fuzzy search query

2015-05-08 Thread Steven White
Hi Erick, Is there a documented list of all operators (AND, OR, NOT, etc.) that also need to be escaped? Are there more beside the 3 I listed? Thanks Steve On Fri, May 8, 2015 at 11:47 AM, Erick Erickson wrote: > Each of the characters you identified are characters that have meaning > to the

Re: Queries on SynonymFilterFactory

2015-05-08 Thread Zheng Lin Edwin Yeo
Thank you for your suggestions. I can't do proper testing on that yet, as I'm currently using a normal PC with 4GB of RAM, and all of these probably require more RAM than I have. I've tried running the setup with 20 synonym files, and the system ran out of memory before I could test anything.

Re: Slow highlighting on Solr 5.0.0

2015-05-08 Thread Matt Hilt
I've been looking into this again. The phrase highlighter is much slower than the default highlighter, so you might be able to add hl.usePhraseHighlighter=false to your query to make it faster. Note that the web interface will NOT help here, because that param is true by default, and the checkbox is ba
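A minimal sketch of adding that parameter to a request by hand instead of through the admin UI. The base URL, collection name and field name in the usage line are assumptions for illustration, and `highlight_query_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def highlight_query_url(base_url: str, query: str, field: str) -> str:
    """Build a /select URL with highlighting on and the phrase highlighter off."""
    params = {
        "q": query,
        "hl": "true",
        "hl.fl": field,
        "hl.usePhraseHighlighter": "false",  # the parameter suggested in the post
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


# Assumed local Solr URL and field name, for illustration only.
print(highlight_query_url("http://localhost:8983/solr/collection1", "solr cloud", "text"))
```

Comparing response times with and without the parameter on the same query is the simplest way to see whether the phrase highlighter is the bottleneck.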

Re: How to handle special characters in fuzzy search query

2015-05-08 Thread Erick Erickson
Each of the characters you identified are characters that have meaning to the query parser, '+' is a mandatory clause, '-' is a NOT operator and * is a wildcard. To get through the query parser, these (and a bunch of others, see below) must be escaped. Personally, though, I'd pre-scrub the data. D
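For cases where pre-scrubbing is not an option, escaping can be sketched as below. The character set is an assumption based on the commonly documented special characters of the classic Lucene/Solr query parser; check it against the reference guide for your version. The sketch does not handle the word operators AND, OR and NOT.

```python
import re

# Assumed set of query-parser syntax: single characters plus the two-character
# operators && and ||. Verify against the reference guide for your Solr version.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/]|&&|\|\|)')


def escape_query_term(term: str) -> str:
    """Prefix each piece of query syntax with a backslash so it matches literally."""
    return _SPECIAL.sub(r"\\\1", term)


print(escape_query_term("C++ (beta)"))
```

If fuzzy search is needed, the `~` operator has to be appended after escaping, otherwise it is escaped along with everything else.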

SolrCloud 4.8.0 - Snapshots directory take a lot of space

2015-05-08 Thread Vincenzo D'Amore
Hi All, Looking at the data directory in my SolrCloud cluster, I have found a lot of old snapshot directories, like these: snapshot.20150506003702765 snapshot.20150506003702760 snapshot.20150507002849492 snapshot.20150507002849473 snapshot.20150507002849459 or even a month older. These directories ke

AW: determine "big" documents in the index?

2015-05-08 Thread Clemens Wyss DEV
One of my fields (the "phrase suggestion" field) has 30'860'099 terms. Is this "too much"? Another field (the "single word suggestion") has 2'156'218 terms. -Original Message- From: Clemens Wyss DEV [mailto:clemens...@mysign.ch] Sent: Friday, 8 May 2015 15:54 To: solr-us

Re: JSON Facet & Analytics API in Solr 5.1

2015-05-08 Thread Frank li
Hi Yonik, Any update for the question? Thanks in advance, Frank On Thu, May 7, 2015 at 2:49 PM, Frank li wrote: > Is there any book to read so I won't ask such dummy questions? Thanks. > > On Thu, May 7, 2015 at 2:32 PM, Frank li wrote: > >> This one does not have problem, but how do I inclu

Re: Queries on SynonymFilterFactory

2015-05-08 Thread Alessandro Benedetti
This is quite a big synonym corpus! If it's not feasible to have only 1 big synonym file (I haven't checked, so I assume the 1 MB limit is true, even if strange), I would do an experiment: 1) test query time with a classic Solr config 2) use an ad hoc Solr core to manage synonyms (in this way

determine "big" documents in the index?

2015-05-08 Thread Clemens Wyss DEV
Context: Solr/Lucene 5.1. Is there a way to determine documents that occupy a lot of "space" in the index? As I don't store any fields that have text, it must be the terms extracted from the documents that occupy the space. So my question is: which documents occupy the most space in the inverted index?

Re: New core on Solr Cloud

2015-05-08 Thread shacky
Thank you very much Erick. Bye 2015-05-06 17:06 GMT+02:00 Erick Erickson : > That should have put one replica on each machine, if it did you're fine. > > Best, > Erick > > On Wed, May 6, 2015 at 3:58 AM, shacky wrote: >> Ok, I found out that the creation of new core/collection on Solr 5.1 >> is m

Re: solr.war built from solr 4.7.2 not working

2015-05-08 Thread Shawn Heisey
On 5/7/2015 11:52 PM, Rahul Singh wrote: > ERROR - 2015-05-08 11:15:25.738; org.apache.solr.common.SolrException; > null:java.lang.IllegalArgumentException: You cannot set an index-time bo > ost on an unindexed field, or one that omits norms This seems to be the problem. You are trying to set an

Re: ZooKeeperException: Could not find configName for collection

2015-05-08 Thread shacky
Thank you Erick for your answer! I just tried to restart the first node and now the error is no longer there! Sorry for my too-early email :-) Bye! 2015-05-06 17:05 GMT+02:00 Erick Erickson : > Have you looked around at your directories on disk? I'm _not_ talking > about the admin UI here. The defaul

Re: Queries on SynonymFilterFactory

2015-05-08 Thread Zheng Lin Edwin Yeo
So does it mean that having more than 10 or 20 synonym files locally will still be faster than accessing an external service? I found out that ZooKeeper only allows the synonym.txt file to be a maximum of 1MB, and as my potential synonym file is more than 20MB, I'll need to split the file into more than
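Splitting a large synonym file into pieces under a size limit can be sketched as follows. The 1,000,000-byte default mirrors the 1MB figure mentioned in the post and is an assumption, not a verified ZooKeeper setting. `split_lines_by_size` is a hypothetical helper that keeps each synonym rule on a single line.

```python
def split_lines_by_size(lines, max_bytes=1_000_000):
    """Group lines into chunks whose UTF-8 size (with newlines) stays within max_bytes."""
    chunks, current, size = [], [], 0
    for line in lines:
        needed = len(line.encode("utf-8")) + 1  # +1 for the trailing newline
        # Start a new chunk when the next line would push this one over the limit.
        if current and size + needed > max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.append(line)
        size += needed
    if current:
        chunks.append(current)
    return chunks


print(split_lines_by_size(["aaaa", "bbbb", "cc"], max_bytes=10))
```

Each chunk can then be written to its own file. A single line longer than the limit still ends up alone in an oversized chunk, so it is worth checking for that case separately.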

Re: Queries on SynonymFilterFactory

2015-05-08 Thread Alessandro Benedetti
Accessing an external service (such as a thesaurus website) for each query can slow down your system a lot. Having the synonyms locally, with the Solr integration, is much better. Cheers 2015-05-08 11:46 GMT+01:00 Zheng Lin Edwin Yeo : > The document seems to point to using AutoPhrasingTokenFilter

Re: Queries on SynonymFilterFactory

2015-05-08 Thread Zheng Lin Edwin Yeo
The document seems to point to using AutoPhrasingTokenFilter, putting an underscore in the multi-term, or changing to index-time synonyms. I'm also thinking of putting the synonyms into a database or querying some thesaurus website when the user enters the search key, instead of using the SynonymFilte

Re: Solr 5.1.0 Cloud and Zookeeper

2015-05-08 Thread Christos Manios
Hello Shacky, I have recently performed a manual installation of a Zookeeper ensemble (3 zookeepers) in the same machine. I used the upstart init script from official .deb configuration and modified it in order to

Re: Solr Multilingual Indexing with one field- Guidance

2015-05-08 Thread Alessandro Benedetti
Is it possible to know a little bit more about the nature of that multi-lingual field ? I can see the keywordTokenizer and then a lot of grams calculated from that token . What is that field used for ? 2015-05-07 19:23 GMT+01:00 Kuntal Ganguly : > Our current production index size is 1.5 TB with

Re: Queries on SynonymFilterFactory

2015-05-08 Thread Alessandro Benedetti
I found this very interesting article that I think can help in better understanding the problem : http://lucidworks.com/blog/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/ And this : http://opensourceconnections.com/blog/2013/10/27/why-is-multi-term-synonyms-so

Re: Proximity searching in percentage

2015-05-08 Thread Zheng Lin Edwin Yeo
Hi Alessandro, Thank you so much for the info. Will try that out. Regards, Edwin On 8 May 2015 17:27, "Alessandro Benedetti" wrote: > 2015-05-08 10:14 GMT+01:00 Zheng Lin Edwin Yeo : > > > Hi Alessandro, > > > > I'm using Solr 5.0.0, but it is still able to work. Actually I found this > > to be

Re: Queries on SynonymFilterFactory

2015-05-08 Thread Zheng Lin Edwin Yeo
Thanks for explaining the information. Currently I'm only using the comma-separated list of words and only using the synonym filter at query time. I find that when I set expand = true, quite a number of irrelevant results come back, and this didn't happen when I set expand = false. I

Re: Proximity searching in percentage

2015-05-08 Thread Alessandro Benedetti
2015-05-08 10:14 GMT+01:00 Zheng Lin Edwin Yeo : > Hi Alessandro, > > I'm using Solr 5.0.0, but it is still able to work. Actually I found this > to be better than ~1 or ~2, as it can automatically detect > and allow the 20% error rate that I want. > I don't think that the "double" param is suppor

Re: Proximity searching in percentage

2015-05-08 Thread Zheng Lin Edwin Yeo
Hi Alessandro, I'm using Solr 5.0.0, but it is still able to work. Actually I found this to be better than ~1 or ~2, as it can automatically detect and allow the 20% error rate that I want. For this ~1 or ~2, does it mean that I'll have to manually detect how many characters I entered, before I

Re: Queries on SynonymFilterFactory

2015-05-08 Thread Alessandro Benedetti
Let's explain a little bit better here: First of all, the SynonymFilter is a Token Filter, and being a Token Filter it can be part of an analysis pipeline at index time and query time. As the different types of analysis explicitly explain when the filtering happens, let's go to the details of the syn

Re: Queries on SynonymFilterFactory

2015-05-08 Thread Zheng Lin Edwin Yeo
Just an update, the tokenizer class which I'm using is StandardTokenizerFactory, and I'm using Solr 5.0. On 8 May 2015 16:24, "Zheng Lin Edwin Yeo" wrote: > Hi, > > Will like to check, for the SynonymFilterFactory, I have the following in > my synonyms.txt: > > Titanium Dioxides, titanium oxide,

Re: Proximity searching in percentage

2015-05-08 Thread Alessandro Benedetti
Hi Zheng, actually that version of the fuzzy search is deprecated! Currently the fuzzy search syntax is: ~1 or ~2. The ~ (tilde) param is the number of edits allowed when generating all the expanded queries to run. Can I ask which version of Solr you are using? This article from 2011 shows the bi
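One way to approximate a percentage-based tolerance with the ~1 / ~2 syntax is to derive the edit count from the term length on the client side. This is a minimal sketch, assuming the 20% figure from the thread; the helper name `fuzzy_term` is hypothetical. The cap of 2 reflects the maximum edit distance Lucene's fuzzy queries support.

```python
MAX_EDITS = 2  # Lucene fuzzy queries support at most 2 edits


def fuzzy_term(term: str, error_percent: int = 20) -> str:
    """Append ~N to a term, where N is error_percent of its length, capped at 2."""
    # Integer arithmetic avoids floating-point rounding surprises.
    edits = min(MAX_EDITS, len(term) * error_percent // 100)
    return f"{term}~{edits}" if edits > 0 else term


for word in ("abc", "pigment", "dictionary"):
    print(fuzzy_term(word))
```

Terms shorter than five characters get no fuzzy operator at 20%, and anything of ten characters or more is capped at two edits, so long terms end up with a tolerance below the requested percentage.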

Queries on SynonymFilterFactory

2015-05-08 Thread Zheng Lin Edwin Yeo
Hi, I would like to check something about the SynonymFilterFactory. I have the following in my synonyms.txt: Titanium Dioxides, titanium oxide, pigment pigment, colour, colouring material If I set expand=false and I search for q=pigment, I will get results that match pigment, Titanium Dioxides and titanium
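The effect of the expand setting on comma-separated rules can be modelled roughly as below. This is a simplified sketch of the documented behaviour (with expand=true every term maps to all terms on its line; with expand=false every term maps to the first term on its line), not Solr's actual implementation. It ignores tokenization, case handling and multi-word matching, and `build_synonym_map` is a hypothetical helper.

```python
def build_synonym_map(lines, expand=True):
    """Map each term to its replacement terms under a simplified expand model."""
    mapping = {}
    for line in lines:
        terms = [t.strip() for t in line.split(",") if t.strip()]
        # expand=True: every term on the line; expand=False: only the first one.
        targets = terms if expand else terms[:1]
        for term in terms:
            bucket = mapping.setdefault(term, [])
            for target in targets:
                if target not in bucket:
                    bucket.append(target)
    return mapping


rules = [
    "Titanium Dioxides, titanium oxide, pigment",
    "pigment, colour, colouring material",
]
print(build_synonym_map(rules, expand=False)["pigment"])
print(build_synonym_map(rules, expand=True)["pigment"])
```

Because "pigment" appears on both lines, it picks up targets from both rules in either mode, which is one reason a query for it can match more than expected.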

How to get the docs id after commit

2015-05-08 Thread 李文
Hi Solr developers, I want to get the newest committed docs in the postcommit event, then notify the other server which data can be used, but I cannot find any way to get the newest docs after the commit. Is there any way to do this? Thank you. Wen Li