Re: Issue with large html indexing

2013-10-24 Thread Raheel Hasan
ok. see this: http://s23.postimg.org/yck2s5k1n/html_indexing.png On Wed, Oct 23, 2013 at 10:45 PM, Erick Erickson wrote: > Attachments and images are often eaten by the mail server, your image is > not visible at least to me. Can you describe what you're seeing? Or post > the image somewhere an

Re: Minor bug with CloudSolrServer and collection-alias.

2013-10-24 Thread Thomas Egense
Thanks to both of you for fixing the bug. Impressive response time for the fix (7 hours). Thomas Egense On Wed, Oct 23, 2013 at 7:16 PM, Mark Miller wrote: > I filed https://issues.apache.org/jira/browse/SOLR-5380 and just > committed a fix. > > - Mark > > On Oct 23, 2013, at 11:15 AM, Shawn H

RE: New shard leaders or existing shard replicas depends on zookeeper?

2013-10-24 Thread Hoggarth, Gil
Absolutely, the scenario I'm seeing does _sound_ like I've not specified the number of shards, but I think I have - the evidence is: - DnumShards=24 defined within the /etc/sysconfig/solrnode* files - DnumShards=24 seen on each 'ps' line (two nodes listed here): " tomcat 26135 1 5 09:51 ?

Re: Terms function join with a Select function ?

2013-10-24 Thread Bruno Mannina
Dear All, OK, I have an answer to the first question (limit): it's the terms.limit parameter. But I can't find how to apply a Terms request to a query result. Any idea? Bruno On 23/10/2013 23:19, Bruno Mannina wrote: Dear Solr users, I use the Terms function to see the frequen

SolrCloud: optimizing a core triggers optimizations of all cores in that collection?

2013-10-24 Thread michael.boom
Hi! I have a SolrCloud setup on two servers: 3 shards, replicationFactor=2. Today I triggered the optimization on core *shard2_replica2*, which only contained 3M docs (2.7G). The sizes of the other shards were shard3=2.7G and shard1=48G (the routing is implicit but after some update deadlocks and

Re: Spellcheck with Distributed Search (sharding).

2013-10-24 Thread Luis Cappa Banda
Any idea? 2013/10/23 Luis Cappa Banda > More info: > > When executing the Query to a single Solr server it works: > http://solr1:8080/events/data/suggest?q=m&wt=json > > { > >- responseHeader: >{ > - status: 0,

Proposal for new feature, cold replicas, brainstorming

2013-10-24 Thread yriveiro
I've been wondering for some time whether it's possible to have replicas of a shard that stay synchronized but are in a state where they accept only updates, not queries. A replica in this "replication" mode would only wake up to accept queries if it were the last live replica, and would go back to replication mode when another replica becomes al

Query & result caching with custom functions

2013-10-24 Thread Mathias Lux
Hi all! Got a question on the Solr cache :) I've written a custom function, which is able to provide a distance based on some DocValues to re-sort result lists. This basically works great, but we've got the problem that if I don't change the query, but the function parameters, Solr delivers a cac

Solr subset searching in 100-million document index

2013-10-24 Thread Sandeep Gupta
Hi, We have a Solr index of around 100 million documents, each assigned a region id, growing at a rate of about 10 million documents per month; the average document size is around 10KB of pure text. The total number of region ids is itself in the range of 2.5 million. We w

Re: Terms function join with a Select function ?

2013-10-24 Thread Erik Hatcher
That would be called faceting :) http://wiki.apache.org/solr/SimpleFacetParameters On Oct 24, 2013, at 5:23 AM, Bruno Mannina wrote: > Dear All, > > Ok I have an answer concerning the first question (limit) > It's the terms.limit parameters. > > But I can't find how to apply a Terms re
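A minimal sketch of Erik's suggestion, building the request in Python: facet counts are computed only over the documents that match `q` (and any `fq`), which the Terms component cannot do. The host, port, `ti:snowboard` query and `ap` field are taken from Bruno's later messages in this thread; the `facet.limit` value of 20 is an arbitrary illustration.

```python
from urllib.parse import urlencode


def facet_request_url(base_url, query, field, limit):
    """Build a select URL that returns facet counts for `field`,
    restricted to the documents matching `query`."""
    params = {
        "q": query,
        "rows": 0,             # skip the hit list; only the counts are wanted
        "facet": "true",
        "facet.field": field,  # count this field's terms within the result set
        "facet.limit": limit,  # number of top terms to return (illustrative)
    }
    return base_url + "?" + urlencode(params)


url = facet_request_url("http://localhost:2727/solr/select", "ti:snowboard", "ap", 20)
print(url)
```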

Basic query process question with fl=id

2013-10-24 Thread Manuel Le Normand
Hi, Any distributed lookup is basically composed of two stages: the first collects all the matching documents from every shard, and the second fetches additional information about specific ids (i.e. stored fields, termVectors). This can be seen in the logs of each shard (isShard=true), where first requ

RE: Spellcheck with Distributed Search (sharding).

2013-10-24 Thread Dyer, James
Is it that your request handler is named "/suggest" but you are setting "shards.qt" to "/suggestion" ? James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: Luis Cappa Banda [mailto:luisca...@gmail.com] Sent: Thursday, October 24, 2013 6:22 AM To: solr-user@lucene.apa
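A sketch of the consistency James is pointing at: in a distributed request, `shards.qt` names the handler each shard uses for its sub-request, so it has to match the handler that carries the spellcheck configuration. The `/suggest` handler and the `solr1` URL come from Luis's message; the second shard `solr2:8080/events/data` is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def distributed_suggest_url(core_url, handler, shards, q):
    params = {
        "q": q,
        "wt": "json",
        "shards": ",".join(shards),  # shard addresses, without the http:// prefix
        "shards.qt": handler,        # handler used for the per-shard sub-requests
    }
    return core_url + handler + "?" + urlencode(params)


def handler_matches_shards_qt(url):
    """True when the handler in the URL path is the one named in shards.qt."""
    parts = urlsplit(url)
    shards_qt = parse_qs(parts.query).get("shards.qt", [""])[0]
    return bool(shards_qt) and parts.path.endswith(shards_qt)


url = distributed_suggest_url(
    "http://solr1:8080/events/data",
    "/suggest",
    ["solr1:8080/events/data", "solr2:8080/events/data"],  # solr2 is hypothetical
    "m",
)
print(handler_matches_shards_qt(url))
```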

Re: Proposal for new feature, cold replicas, brainstorming

2013-10-24 Thread Toke Eskildsen
On Thu, 2013-10-24 at 13:27 +0200, yriveiro wrote: > The motivation for this is simple: I want to have replication, but I don't want > to have n active replicas with full resources allocated (caches and so on). > This is useful in environments where replication is needed but a high query > throughput is no

Searching on special characters

2013-10-24 Thread johnmunir
Hi, How should I setup Solr so I can search and get hit on special characters such as: + - && || ! ( ) { } [ ] ^ " ~ * ? : \ My need is, if a user has text like so: Doc-#1: "(Solr)" Doc-#2: "Solr" And they type "(solr)" I want a hit on "(solr)" only in document #1, with the brackets match
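Independently of the schema, the characters listed in the question are query-parser syntax, so they have to be escaped (or the input quoted) before they can be matched literally. A minimal Python sketch follows; SolrJ users can call `ClientUtils.escapeQueryChars` instead. Escaping only gets the brackets past the query parser: whether "(solr)" and "Solr" are still distinguished depends on the field's analysis chain, since a typical tokenizer drops the brackets at index time.

```python
# Query-parser syntax characters from the question, plus "/" (regex
# delimiter in Solr 4 and later). "&&" and "||" are covered by
# escaping each "&" and "|".
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\/')


def escape_query_chars(text):
    """Backslash-escape each special character so it is matched literally."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)


print(escape_query_chars("(solr)"))  # \(solr\)
```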

Re: Searching on special characters

2013-10-24 Thread Jack Krupansky
Have two or three copies of the text, one field could be raw string and boosted heavily for exact match, a second could be text using the keyword tokenizer but with lowercase filter also heavily boosted, and the third field general, tokenized text with a lower boost. You could also have a copy
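One way to query the three copies Jack describes is the edismax parser with per-field boosts in `qf`. The field names `text_exact` (string), `text_kw` (KeywordTokenizer plus lowercase filter) and `text` (general tokenized text), and the boost values, are hypothetical; each copy would be filled via `copyField` in schema.xml. The user input is wrapped in a phrase so the brackets reach the fields intact. Note that this ranks the exact "(solr)" match first rather than excluding documents that only contain "Solr".

```python
from urllib.parse import urlencode

# Hypothetical fields holding copies of the same text, most exact first.
QF = "text_exact^10 text_kw^5 text^1"


def as_phrase(user_input):
    """Quote the input; inside a phrase only backslash and double quote
    need escaping, so characters like ( ) reach the analyzers unchanged."""
    escaped = user_input.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def exact_first_params(user_input):
    return {"q": as_phrase(user_input), "defType": "edismax", "qf": QF}


url = "http://localhost:8080/solr/select/?" + urlencode(exact_first_params("(solr)"))
print(url)
```

The `defType` and `qf` parameters can also be placed in the request handler's `defaults` in solrconfig.xml, in which case the `/select?q=` call itself would not need to change.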

Re: Spellcheck with Distributed Search (sharding).

2013-10-24 Thread Luis Cappa Banda
It's just a typo in my message, sorry about that! The request handler name is spelled correctly and it still doesn't work. 2013/10/24 Dyer, James > Is it that your request handler is named "/suggest" but you are setting > "shards.qt" to "/suggestion" ? > > James Dyer > Ingram Content Group > (615) 213-4311 > > > -Or

Re: Searching on special characters

2013-10-24 Thread johnmunir
I'm not sure what you mean. Based on what you are saying, is there an example of how I can setup my schema.xml to get the result I need? Also, the way I execute a search is using http://localhost:8080/solr/select/?q= Does your solution require me to change this? If so, in what way? It wou

Re: Issue with large html indexing

2013-10-24 Thread Shawn Heisey
On 10/24/2013 2:11 AM, Raheel Hasan wrote: > ok. see this: > http://s23.postimg.org/yck2s5k1n/html_indexing.png A recap. You said your index analysis chain is this: HTMLStripCharFilterFactory WhitespaceTokenizerFactory (create tokens) StopFilterFactory WordDelimiterFilterFactory ICUFoldingFilter

Re: Query & result caching with custom functions

2013-10-24 Thread Shawn Heisey
On 10/24/2013 5:35 AM, Mathias Lux wrote: > I've written a custom function, which is able to provide a distance > based on some DocValues to re-sort result lists. This basically works > great, but we've got the problem that if I don't change the query, but > the function parameters, Solr delivers a

Re: Solr subset searching in 100-million document index

2013-10-24 Thread Joel Bernstein
Sandeep, This type of operation can often be expressed as a PostFilter very efficiently. This is particularly true if the region id's are integer keys. Joel On Thu, Oct 24, 2013 at 7:46 AM, Sandeep Gupta wrote: > Hi, > > We have a Solr index of around 100 million documents with each document >
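To illustrate the idea behind Joel's suggestion: a post filter does not build a filter set over the whole index; it sits in the collector chain and is only asked about documents that already matched the main query and the cached filters. In Solr this means implementing the `PostFilter` interface, whose `getFilterCollector` returns a `DelegatingCollector`, and sending the filter with `cache=false` and a `cost` of 100 or more. The Python below is only an analogy of that collector chain, not Solr code; `region_of` stands in for a per-document region id lookup (for example from the FieldCache or DocValues), and all names and ids are made up.

```python
class HitCollector:
    """End of the chain: records the documents that survive."""

    def __init__(self):
        self.hits = []

    def collect(self, doc):
        self.hits.append(doc)


class RegionPostFilter:
    """Analogy of a DelegatingCollector: forwards a doc only if its
    region id is in the allowed set (integer keys make this a cheap lookup)."""

    def __init__(self, delegate, allowed_regions, region_of):
        self.delegate = delegate
        self.allowed = frozenset(allowed_regions)
        self.region_of = region_of  # hypothetical doc -> region id lookup

    def collect(self, doc):
        if self.region_of[doc] in self.allowed:
            self.delegate.collect(doc)


region_of = {0: 7, 1: 42, 2: 7, 3: 99}  # made-up doc -> region ids
sink = HitCollector()
chain = RegionPostFilter(sink, {7, 99}, region_of)
for doc in (0, 1, 2, 3):                 # docs that matched the main query
    chain.collect(doc)
print(sink.hits)  # [0, 2, 3]
```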

Re: Query & result caching with custom functions

2013-10-24 Thread Joel Bernstein
Mathias, I'd have to do a close review of the function sort code to be sure, but I suspect if you implement the equals() method on the ValueSource it should solve your caching issue. Also implement hashCode(). Joel On Thu, Oct 24, 2013 at 10:35 AM, Shawn Heisey wrote: > On 10/24/2013 5:35 AM,
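Why equals() and hashCode() matter here can be shown with a Python analogy, where `__eq__`/`__hash__` play the roles of Java's `equals()`/`hashCode()`: a result cache keyed on the query plus the sort function can only tell two sorts apart if the function's equality includes its parameters. `DistanceFunction`, the `hist` field and the target values are hypothetical, and the cache is a plain dict, assuming (as Joel suspects) that Solr's cache key compares the ValueSource via equals().

```python
class DistanceFunction:
    """Hypothetical stand-in for a custom ValueSource whose output
    depends on the parameter `target`."""

    def __init__(self, field, target):
        self.field = field
        self.target = tuple(target)

    # Every parameter that changes the output must take part in equality
    # and hashing; leave `target` out and two different sorts collide.
    def __eq__(self, other):
        return (isinstance(other, DistanceFunction)
                and (self.field, self.target) == (other.field, other.target))

    def __hash__(self):
        return hash((self.field, self.target))


result_cache = {}


def cached_search(query, sort_fn, compute):
    key = (query, sort_fn)  # cache key: query plus sort function
    if key not in result_cache:
        result_cache[key] = compute()
    return result_cache[key]


a = cached_search("*:*", DistanceFunction("hist", [1, 2]), lambda: "ranking for [1, 2]")
b = cached_search("*:*", DistanceFunction("hist", [3, 4]), lambda: "ranking for [3, 4]")
print(a, "|", b)
```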

Re: Solr not indexing everything from MongoDB

2013-10-24 Thread Michael Della Bitta
That's typical for an index that receives updates to the same document. Are you sure your keys are unique? Michael Della Bitta Applications Developer o: +1 646 532 3062 | c: +1 917 477 7906 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appin

RE: New shard leaders or existing shard replicas depends on zookeeper?

2013-10-24 Thread Hoggarth, Gil
I think my question is simpler, because I believe the problem below was caused by the very first startup of the 'ldwa01' collection/'ldwa01cfg' zk collection name, which didn't specify the number of shards (and so defaulted to 1). So, how can I change the number of shards for an existing collection/zk col

Re: New shard leaders or existing shard replicas depends on zookeeper?

2013-10-24 Thread Daniel Collins
Ah yes, I was about to mention that: -DnumShards is only actually used when the collection is created for the first time. After that point (i.e. once the collection exists in ZK), passing it on the command line is redundant (Solr won't actually read it). I know the preferred mechanism of cre
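For reference, one way to fix the shard count explicitly at creation time is the Collections API `CREATE` action, sketched here as a URL builder. The collection and config names (`ldwa01`, `ldwa01cfg`) and `numShards=24` come from Gil's messages; the host, port and `replicationFactor=1` are placeholders.

```python
from urllib.parse import urlencode


def create_collection_url(solr_base, name, num_shards, replication_factor, config_name):
    """Collections API CREATE request; numShards takes effect only here,
    when the collection is first created in ZooKeeper."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "collection.configName": config_name,  # config set already uploaded to ZK
    }
    return solr_base + "/admin/collections?" + urlencode(params)


url = create_collection_url("http://localhost:8080/solr", "ldwa01", 24, 1, "ldwa01cfg")
print(url)
```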

Re: Query & result caching with custom functions

2013-10-24 Thread Mathias Lux
That's a possibility; I'll try that and report on the effects. Thanks, Mathias On 24.10.2013 16:52, "Joel Bernstein" wrote: > Mathias, > > I'd have to do a close review of the function sort code to be sure, but I > suspect if you implement the equals() method on the ValueSource it should > sol

Re: Proposal for new feature, cold replicas, brainstorming

2013-10-24 Thread Yago Riveiro
With a shard in a "listening" state, plus some logic in the mechanism that load-balances between replicas, we could achieve the goal. The SPLITSHARD action creates replicas from the original shard in an "inactive" state; these shards buffer the updates, and when the operation ends,

[ANNOUNCE] Apache Solr 4.5.1 released.

2013-10-24 Thread Mark Miller
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 October 2013, Apache Solr™ 4.5.1 available The Lucene PMC is pleased to announce the release of Apache Solr 4.5.1 Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful

Re: [ANNOUNCE] Apache Solr 4.5.1 released.

2013-10-24 Thread Jack Park
Download redirects to 4.5.0. Is there a typo in the server path? On Thu, Oct 24, 2013 at 9:14 AM, Mark Miller wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA1 > > October 2013, Apache Solr™ 4.5.1 available > > The Lucene PMC is pleased to announce the release of Apache Solr 4.5.1 > > Solr

Re: [ANNOUNCE] Apache Solr 4.5.1 released.

2013-10-24 Thread Jack Park
Using a different server than the default gets 4.5.1. On Thu, Oct 24, 2013 at 9:35 AM, Jack Park wrote: > Download redirects to 4.5.0 > Is there a typo in the server path? > > On Thu, Oct 24, 2013 at 9:14 AM, Mark Miller wrote: >> -BEGIN PGP SIGNED MESSAGE- >> Hash: SHA1 >> >> October 2013, Apa

Re: Changing indexed property on a field from false to true

2013-10-24 Thread Aloke Ghoshal
Upayavira - Nice idea pushing in a nominal update when all fields are stored, and it does work. The nominal update could be sent to a boolean type dynamic field, that's not to be used for anything other than maybe identifying documents that are done re-indexing. On Wed, Oct 23, 2013 at 7:47 PM,
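A sketch of the nominal update Aloke describes, as a JSON atomic-update body: Solr rebuilds each document from its stored fields and re-indexes it, so a field whose `indexed` flag was just switched to true gets indexed. This assumes Solr 4.x atomic updates (which need the update log enabled), a uniqueKey named `id`, and a hypothetical boolean dynamic field `reindexed_b`; the body would be POSTed to `/update` with `Content-Type: application/json`.

```python
import json


def nominal_update_body(doc_ids, flag_field="reindexed_b"):
    """JSON atomic-update body that only sets a marker field on each doc.
    flag_field is a hypothetical boolean dynamic field (*_b)."""
    return json.dumps([{"id": doc_id, flag_field: {"set": True}} for doc_id in doc_ids])


body = nominal_update_body(["doc1", "doc2"])
print(body)
```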

Re: New query-time multi-word synonym expander

2013-10-24 Thread Otis Gospodnetic
Jack - watch https://issues.apache.org/jira/browse/SOLR-5379 - comments from the author are there. Markus - ah, yes. I see I even managed to (re)name SOLR-5379 *exactly* the same as SOLR-4381 :) But the author of SOLR-5379 points out its advantages over SOLR-4381. Would be great if people could

Re: Changing indexed property on a field from false to true

2013-10-24 Thread Upayavira
Where this gets interesting is if we had batch atomic updates. Imagine you could do indexCount++ for all docs matching the query category:sport. Could be really useful. /dreaming. Upayavira On Thu, Oct 24, 2013, at 05:40 PM, Aloke Ghoshal wrote: > Upayavira - Nice idea pushing in a nominal update

Re: Multiple facet fields in "defaults" section of a Request Handler

2013-10-24 Thread Chris Hostetter
: Now a client wants to use multi select faceting. He calls the following API: : http://localhost:8983/solr/collection1/search?q=*:*&facet.field={!ex=foo}category&fq={!tag=foo}category : :"cat" : Putting the facet definitions in "appends" causes it to facet category 2 : times. : : Is there a way

Re: Solr subset searching in 100-million document index

2013-10-24 Thread Sandeep Gupta
Hi Joel, Thanks a lot for the information. I haven't worked with PostFilters before, but I found an example at http://java.dzone.com/articles/custom-security-filtering-solr. I will try it over the next few days and come back if I still have questions. Thanks again! Keep Walking, ~ Sandeep On Thu

Re: Terms function join with a Select function ?

2013-10-24 Thread Bruno Mannina
Dear all, hmm, I don't know how to use it... I tried: my query: ti:snowboard (3095 results). I would like to have, at the end of my XML, the Terms statistics for the field AP (applicant field (patent notice)), but I don't get them... Please help, Bruno /select?q=ti%Asnowboard&version=2.2&s

Re: Terms function join with a Select function ?

2013-10-24 Thread Bruno Mannina
Hmm, facet performance is very bad (Solr 3.6.0). My index is around 87 000 000 docs (4 dual-core processors, 24G RAM). I thought facets would work only on the result set, but it seems that's not the case. My request: http://localhost:2727/solr/select?q=ti:snowboard&rows=0&facet=true&facet.field=ap&facet.l

Re: Terms function join with a Select function ?

2013-10-24 Thread Bruno Mannina
One more detail: Solr went down after running my URL :( so bad... On 24/10/2013 22:04, Bruno Mannina wrote: humm facet perfs are very bad (Solr 3.6.0) My index is around 87 000 000 docs. (4 * Proc double core, 24G Ram) I thought facets will work only on the result but it seems it's no

Join Query Behavior

2013-10-24 Thread Andy Pickler
We're attempting to upgrade from Solr 4.2 to 4.5 but are finding that 4.5 is not "honoring" this join query: ... & fq={!join from=project_id_i to=project_id_im}user_id_i:65615 -role_id_i:18 type:UserRole & On our Solr 4.2 instance adding/removing that query gives us different (and expected) resu

Post filter cache question

2013-10-24 Thread Eric Grobler
Hi, If I run this query it is very fast (<10 ms) because it uses a "TopList" filter: q=*:* fl=adr_geopoint,adr_city,filterflags fq=(filterflags:TopList) and the number of relevant documents is 3000 out of 7 million. If I run the same query but add a spatial filter with cost: q=*:* fl=adr_geopo

Re: measure result set quality

2013-10-24 Thread Chris Hostetter
: As a first approach I will evaluate (manually :( ) hits that are out of the : intersection set for every query in each system. Anyway I will keep FYI: LucidWorks has a "Relevancy Workbench" tool that serves as a simple UI designed explicitly for the purpose of comparing the result sets of fro

Re: difference between apache tomcat vs Jetty

2013-10-24 Thread Jonathan Rochkind
This is good to know, and I find it welcome advice; I would recommend making sure this advice is clearly highlighted in the relevant Solr docs, such as any getting-started docs. I'm not sure everyone realizes this, and some go down the Tomcat route without realizing the Solr committers recommend j

Re: Post filter cache question

2013-10-24 Thread Chris Hostetter
: Could it be a problem with my cache settings in solrconfig.xml (solr 3.1) : or is my query wrong? 3.1? ouch ... PostFilter wasn't even added until 3.4... https://wiki.apache.org/solr/CommonQueryParameters#Caching_of_filters ...so your spatial filter is definitely being applied to the entire in
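A small sketch of the local-params form of such a filter, for after the upgrade: according to Solr's documentation on filter caching, a filter only runs as a post filter when it has `cache=false` and a `cost` of 100 or more and its query type implements `PostFilter`. The `adr_geopoint` field comes from Eric's query; the point and distance are placeholders, and whether `geofilt` on a given spatial field type actually supports post filtering should be checked for the Solr version in use.

```python
def local_params_filter(parser, **params):
    """Build an fq value like {!parser k1=v1 k2=v2}."""
    inner = " ".join(f"{key}={value}" for key, value in params.items())
    return "{!" + parser + " " + inner + "}"


fq = local_params_filter(
    "geofilt",
    sfield="adr_geopoint",
    pt="52.52,13.40",  # placeholder point
    d=10,              # placeholder distance (km)
    cache="false",     # post filters must not be cached ...
    cost=100,          # ... and need cost >= 100
)
print(fq)
```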

Re: difference between apache tomcat vs Jetty

2013-10-24 Thread Tim Vaillancourt
I agree with Jonathan (and Shawn on the Jetty explanation), I think the docs should make this a bit more clear - I notice many people choosing Tomcat and then learning these details after, possibly regretting it. I'd be glad to modify the docs but I want to be careful how it is worded. Is it fair

Re: Reclaiming disk space from (large, optimized) segments

2013-10-24 Thread Chris Hostetter
I didn't dig into the details of your mail too much, but a few things jumped out at me... : - At some time in the past, a manual force merge / optimize with : maxSegments=2 was run to troubleshoot high disk i/o and remove "too many Have you tried a simple commit using expungeDeletes=true? It s
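Hoss's suggestion as a request, in a sketch with a placeholder core URL: a commit with `expungeDeletes=true` asks the merge policy to merge away segments carrying deleted documents, without forcing the whole index down to a fixed segment count the way an optimize does. With TieredMergePolicy, only segments whose share of deletions exceeds its `forceMergeDeletesPctAllowed` threshold are merged.

```python
from urllib.parse import urlencode


def expunge_deletes_url(core_url):
    """Update-handler URL for a commit that also expunges deleted docs."""
    return core_url + "/update?" + urlencode({"commit": "true", "expungeDeletes": "true"})


url = expunge_deletes_url("http://localhost:8983/solr/collection1")
print(url)
```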

Problem with glassfish and zookeeper 3.4.5

2013-10-24 Thread kaustubh147
Hi, Glassfish 3.1.2.2, Solr 4.5, Zookeeper 3.4.5. We have set up a SolrCloud with 4 Solr nodes and 3 ZooKeeper instances. It seems to be working fine from the Solr admin page, but I run into problems when I try to connect to it from a web application using SolrJ 4.5. I am creating my Solr Cloud Server as suggested on the w

Re: Problem with glassfish and zookeeper 3.4.5

2013-10-24 Thread Shawn Heisey
On 10/24/2013 4:30 PM, kaustubh147 wrote: Glassfish 3.1.2.2 Solr 4.5 Zookeeper 3.4.5 We have set up a SolrCloud with 4 Solr nodes and 3 zookeeper instances. It seems to be working fine from Solr admin page. but when I am trying to connect it to web application using Solrj 4.5. I am creating my

Re: difference between apache tomcat vs Jetty

2013-10-24 Thread Anshum Gupta
Thought you may want to have a look at this: https://issues.apache.org/jira/browse/SOLR-4792 P.S: There are no timelines for 5.0 for now, but it's the future nevertheless. On Fri, Oct 25, 2013 at 3:39 AM, Tim Vaillancourt wrote: > I agree with Jonathan (and Shawn on the Jetty explanation), I

Re: difference between apache tomcat vs Jetty

2013-10-24 Thread Tim Vaillancourt
Hmm, that's an interesting move. I'm on the fence about that one, but it surely simplifies some things. Good info, thanks! Tim On 24 October 2013 16:46, Anshum Gupta wrote: > Thought you may want to have a look at this: > > https://issues.apache.org/jira/browse/SOLR-4792 > > P.S: There are no timel

Re: Post filter cache question

2013-10-24 Thread Eric Grobler
Hi Chris Thank you for your response. I will try to migrate to Solr 4.4 first! Best regards On Thu, Oct 24, 2013 at 10:44 PM, Chris Hostetter wrote: > > : Could it be a problem with my cache settings in solrconfig.xml (solr 3.1) > : or is my query wrong? > > 3.1? ouch ... PostFilter wasn't ev

First test cloud error question...

2013-10-24 Thread Jack Park
Background: all testing done on a Win7 platform. This is my first migration from a single Solr server to a simple cloud. Everything is configured exactly as specified in the wiki. I created a simple 3-node client, all localhost with different server URLs, and a lone external zookeeper. The online

Solr 4.5.1 and Illegal to have multiple roots (start tag in epilog?). (perhaps SOLR-4327 bug?)

2013-10-24 Thread Michael Tracey
Hey Solr-users, I've got a single solr 4.5.1 node with 96GB ram, a 65GB index (105 million records) and a lot of daily churn of newly indexed files (auto softcommit and commits). I'm trying to bring another matching node into the mix, and am getting these errors on the new node: org.apache.so

Solr indexing on email mime body and attachment

2013-10-24 Thread neerajp
Hi, I am integrating the Solr search engine with my email clients. I am sending POST requests to Solr using REST. I am able to post the emails' To, From, Subject, etc. headers to Solr for indexing. Since an email can have MIME-type bodies and attachments, I am not able to understand how to pos
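One possible approach, sketched in Python under stated assumptions: parse the message on the client with the standard `email` package, send the headers and the text body as ordinary fields, and handle the binary attachments separately, for example by posting each one to Solr Cell's `/update/extract` handler so Tika extracts its text. The field names (`from`, `to`, `subject`, `body`, `attachment_names`) are hypothetical and would need matching fields in schema.xml.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


def email_to_fields(raw_bytes):
    """Split a raw MIME message into hypothetical Solr fields."""
    msg = BytesParser(policy=policy.default).parsebytes(raw_bytes)
    body_part = msg.get_body(preferencelist=("plain", "html"))  # best text body
    return {
        "from": str(msg["From"] or ""),
        "to": str(msg["To"] or ""),
        "subject": str(msg["Subject"] or ""),
        "body": body_part.get_content() if body_part is not None else "",
        # Attachment bytes (part.get_content()) would go to /update/extract.
        "attachment_names": [part.get_filename() for part in msg.iter_attachments()],
    }


# Build a small test message with one PDF attachment.
message = EmailMessage()
message["From"] = "alice@example.com"
message["To"] = "bob@example.com"
message["Subject"] = "Quarterly report"
message.set_content("Please find the report attached.")
message.add_attachment(b"%PDF-1.4 ...", maintype="application", subtype="pdf",
                       filename="report.pdf")

fields = email_to_fields(message.as_bytes())
print(fields["subject"], fields["attachment_names"])
```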

Re: Reclaiming disk space from (large, optimized) segments

2013-10-24 Thread Otis Gospodnetic
Only skimmed your email, but purge every 4 hours jumped out at me. Would it make sense to have time-based indices that can be periodically dropped instead of being purged? Otis Solr & ElasticSearch Support http://sematext.com/ On Oct 23, 2013 10:33 AM, "Scott Lundgren" wrote: > *Background:* > >
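A sketch of Otis's idea, assuming one index (core or collection) per day with a hypothetical `events_YYYYMMDD` naming scheme; a shorter period would work the same way. Instead of purging documents, the indices past the retention window are dropped whole, which frees their disk space without any delete-and-merge cycle. Only the bookkeeping is shown; actually unloading or deleting the cores is left out.

```python
from datetime import date, timedelta


def index_name(prefix, day):
    """Hypothetical naming scheme: one index per day, e.g. events_20131024."""
    return f"{prefix}_{day:%Y%m%d}"


def indices_to_drop(prefix, existing_days, today, max_age_days):
    """Names of the per-day indices more than max_age_days days old."""
    cutoff = today - timedelta(days=max_age_days)
    return [index_name(prefix, day) for day in sorted(existing_days) if day < cutoff]


days = [date(2013, 10, 19) + timedelta(days=i) for i in range(6)]  # Oct 19..24
print(indices_to_drop("events", days, today=date(2013, 10, 24), max_age_days=3))
```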

Solr search in case the first keyword are not index?

2013-10-24 Thread dtphat
I have a problem with Solr search: I have a keyword, for example "apache solr reference", and I index "apache", "solr", "reference". When I search with the keyword lists below: - apache solr reference -> OK - apache -> OK - solr -> OK - the same. But when the first keyword is not indexed, and other keywords are

Re: Multiple facet fields in "defaults" section of a Request Handler

2013-10-24 Thread Varun Thacker
I think you have explained perfectly how the tag exclusion makes it a different facet field, and that no logic of defaults/invariants/appends would be able to solve this. I went with the custom component approach. A very hacky alternative, though, could be defining this in defaults: {!ex=foo}cate