Re: Solr Basic Configuration - Highlight - Beginner

2015-12-16 Thread Erick Erickson
"fl": "id, author, content", >> > > > > "wt": "json", >> > > > > "hl.simple.pre": "", >> > > > > "_": "1450262674102" >>

Re: Append fields to a document

2015-12-16 Thread Erick Erickson
The only way to do this currently is with Atomic Updates, which require all fields to be stored except the destinations of copyField directives. see: https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents Best, Erick On Wed, Dec 16, 2015 at 7:09 AM, Jamie Johnson wrote: >
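A minimal sketch of the JSON body for such an atomic update. It assumes `id` is the uniqueKey and uses a hypothetical multiValued field `tags`. The body would be POSTed to the collection's `/update` handler, and Solr rebuilds the rest of the document from its stored fields.

```python
import json


def atomic_add(doc_id, field, value):
    """Return a Solr JSON atomic-update command appending value to field."""
    # "add" appends to a (multiValued) field; "set" would overwrite it instead.
    # Every other field is rebuilt by Solr from its stored value, which is
    # why atomic updates need all (non-copyField-destination) fields stored.
    return [{"id": doc_id, field: {"add": value}}]


# Hypothetical document id and field name.
payload = json.dumps(atomic_add("doc1", "tags", "new-tag"))
print(payload)  # body to POST to /solr/<collection>/update
```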

Re: Solr cloud instance does not read cores from Zookeeper whilst connected

2015-12-16 Thread Erick Erickson
At a random guess, how are you starting Zookeeper and Solr? Is it possible that you're running the Zookeeper embedded in Solr but have an external Zookeeper running also? In that scenario you might be seeing one Zookeeper in the admin UI and another when trying to create the collection. Could you

Re: SolrCloud 4.8.1 - commit wait

2015-12-16 Thread Erick Erickson
rocessor - [catalogo_shard3_replica1] > webapp=/solr path=/update > params={update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from= > http://192.168.101.118:8080/solr/catalogo_shard2_replica3/&commit_end_point=true

Re: JVM error v ~StubRoutines::jbyte_disjoint_arraycopy

2015-12-16 Thread Erick Erickson
https://wiki.apache.org/lucene-java/JavaBugs See the last entry in the OpenJDK section; you're using one of the Java versions that has issues. So the first thing I'd try is upgrading my JVM. Best, Erick On Wed, Dec 16, 2015 at 2:01 PM, abhayd wrote: > hi > > I have more than 50Gb in /tmp index

Re: Solr Basic Configuration - Highlight - Beginner

2015-12-16 Thread Erick Erickson
I think you're still missing the critical bit. Highlighting is completely separate from searching. In other words, you can search on one field and highlight another. What field is searched is governed by the "qf" parameter when using edismax and by the "df" parameter configured in your request
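As a sketch of that separation, here are request parameters where the searched fields (`qf`) and the highlighted fields (`hl.fl`) are chosen independently. The field names `title` and `content` are illustrative, and the standard highlighter assumes the highlighted field is stored.

```python
from urllib.parse import urlencode, parse_qs

# Illustrative field names: search "title" and "content", highlight only "content".
params = {
    "q": "nietava",
    "defType": "edismax",
    "qf": "title content",  # fields that are searched (edismax)
    "hl": "true",
    "hl.fl": "content",     # fields that are highlighted, chosen independently of qf
    "fl": "id,author",
    "wt": "json",
}
query_string = urlencode(params)
parsed = parse_qs(query_string)
print(query_string)
```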

Re: Solr Basic Configuration - Highlight - Beginner

2015-12-16 Thread Erick Erickson
alysis, it brings below information: > > ST > text | raw_bytes | start | end | positionLength | type | position > nietava | [6e 69 65 74 61 76 61] | 0 | 7 | 1 | | 1 > SF > text | raw_bytes | start | end | positionLength | type | position > nietava | [6e 69 65 74 61 76 61] | 0 | 7 | 1 | | 1 > LCF > text | raw_bytes | start | end | positionLength | ty

Re: Strange debug output for a slow query

2015-12-16 Thread Erick Erickson
Hmmm, take a look at the individual queries on a shard, i.e. peek at the Solr logs and see if the fq clause comes through cleanly when you see &distrib=false. I suspect this is just a glitch in assembling the debug response. If it is, it probably deserves a JIRA. In fact it deserves a JIRA in eithe

Re: Problem with Solr indexing "non-searchable" pdf files

2015-12-17 Thread Erick Erickson
Not sure how much help I can be, I have no clue what DSpace is doing with Solr. If you're willing to try to index straight to Solr, you can always use SolrJ to parse the files, it's actually not very hard. Here's an example: https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/ some databas

Re: Solr Basic Configuration - Highlight - Beginner

2015-12-17 Thread Erick Erickson
;> > >> > I tried as well with the standard configuration, did it all over, >> reindexed >> > a couple times... and still did not work. >> > >> > Also, >> > >> > Using the Analysis, it brings below information: >> > >>

Re: SolrCloud 4.8.1 - commit wait

2015-12-17 Thread Erick Erickson
Glad to hear it's solved! The suggester stuff is way cool, but can surprise you! Erick On Thu, Dec 17, 2015 at 2:54 AM, Vincenzo D'Amore wrote: > Great!!! Great Erick! It was a buildOnCommit. > > Many thanks for your help. > > > > On Wed, Dec 16, 2015 at 6

Re: Strange debug output for a slow query

2015-12-17 Thread Erick Erickson
lear from those logs. And certainly you may simply have outgrown your hardware Best, Erick On Thu, Dec 17, 2015 at 8:49 AM, Shawn Heisey wrote: > On 12/16/2015 9:08 PM, Erick Erickson wrote: >> Hmmm, take a look at the individual queries on a shard, i.e. peek at >> the Solr

Re: Trying to index document in Solr with solr-spark library

2015-12-17 Thread Erick Erickson
Looks like your Spark job is not connecting to the same Zookeeper as your Solr nodes. Or, I suppose, the Solr nodes aren't started. You might get more information on the Cloudera help boards Best, Erick On Wed, Dec 16, 2015 at 11:58 PM, Guillermo Ortiz wrote: > I'm getting some errors when I t

Re: Expected mime type application/octet-stream but got text/html

2015-12-17 Thread Erick Erickson
Andrej: Indeed, it's a doc problem. A long time ago in a Solr far away, there was a bunch of effort to use the "default" collection (collection1). When that was changed, this documentation didn't get updated. We'll update it in a few, thanks for reporting! Erick On Thu, Dec 17, 2015 at 1:39 AM,

Re: Load-balancing Solr instances

2015-12-18 Thread Erick Erickson
You're over-complicating it, the complexity is already in Solr ;)... First, if you're using a SolrJ client (assuming you're accessing Solr from your app), use the CloudSolrClient class. This takes a ZK ensemble and does its own load balancing via a software load balancer. If you're not using SolrJ

Re: Issues when indexing PDF files

2015-12-18 Thread Erick Erickson
This could also simply be that your browser isn't set up to display UTF-8; the characters may be just fine. Best, Erick On Fri, Dec 18, 2015 at 12:58 AM, Zheng Lin Edwin Yeo wrote: > Thanks for all your replies. > > I did chance upon this question from stackoverflow which it says is able to > solve t

Re: SolR 5.3.1 deletes index files

2015-12-18 Thread Erick Erickson
Andreas: Let me see if I understand correctly: You have two Solr instances pointing at the _same_ NFS-mounted directory. The lock type of "single" implies this. And you're totally and absolutely sure that only _one_ Solr instance writes to that directory _ever_, right? It's not even the case that

Re: Admin Optimize

2015-12-18 Thread Erick Erickson
Right, the whole optimize thing is in a bit of a state of flux. For indexes that change quite regularly, it's something of a trap as making one big segment gets in the way of the merging algorithm. It'll still work, but it's not all that useful. For static indexes there's anecdotal evidence that i

Re: Permutations of entries in a multivalued field

2015-12-18 Thread Erick Erickson
The other thing to check is the ComplexPhraseQueryParser, see: https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser It uses the Span queries to build up the query... Best, Erick On Fri, Dec 18, 2015 at 11:23 AM, Allison, Timothy B. wrote: > Hi Joh

Re: Please add me to the ContributorsGroup

2015-12-18 Thread Erick Erickson
Done, thanks for contributing! On Fri, Dec 18, 2015 at 11:58 AM, Alan Thompson wrote: > I would like to contribute to the Solr wiki. I am user "AlanThompson > " > > Thanks, > > Alan > > > - > > Con

Re: Admin Optimize

2015-12-18 Thread Erick Erickson
so if it is missing, it could > be a bug. Is it missing on the old, or new, or both UIs? > > Thanks! > > Upayavira > > On Fri, Dec 18, 2015, at 07:08 PM, Erick Erickson wrote: >> Right, the whole optimize thing is in a bit of a state of flux. For >> indexes that chang

Re: Auto-optimization for Solr indexes

2015-12-20 Thread Erick Erickson
Much depends on how often the index is updated. If your index only changes, say, once a day then it's probably a good idea. If you're constantly updating your index, then I'd recommend that you do _not_ optimize. Optimizing will create one large segment. That segment will be unlikely to be merged

Re: Auto-optimization for Solr indexes

2015-12-20 Thread Erick Erickson
s that have to be taken into > consideration too? > > Regards, > Edwin > > > On 21 December 2015 at 12:12, Erick Erickson > wrote: > >> Much depends on how often the index is updated. If your index only >> changes, say, once a day then it's probably a good

Re: Auto-optimization for Solr indexes

2015-12-20 Thread Erick Erickson
n it is not > optimized. We dont have any quantitative measurement for the same. Its just > an observation. Is this correct observation? > > If we optimize it every day, the indexes will not be skewed right? > > Please let me know if my understanding is correct. > > Regard

Re: Numerous problems with SolrCloud

2015-12-21 Thread Erick Erickson
ZK isn't pushed all that heavily, although all things are possible. Still, for maintenance putting Zk on separate machines is a good idea. They don't have to be very beefy machines. Look in your logs for LeaderInitiatedRecovery messages. If you find them then _probably_ you have some issues with t

Re: TPS with Solr Cloud

2015-12-21 Thread Erick Erickson
8,000 TPS almost certainly means you're firing the same (or same few) requests over and over and hitting the queryResultCache, look in the adminUI>>core>>plugins/stats>>cache>>queryResultCache. I bet you're seeing a hit ratio near 100%. This is what Toke means when he says your tests are too lightw
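To illustrate why replaying the same few requests inflates TPS, here is a toy LRU cache model (not Solr's own cache implementation) fed a hypothetical load test that cycles through five queries. Only the first lookup of each query misses, so the hit ratio ends up close to 100%.

```python
from collections import OrderedDict


class ToyLRUCache:
    """A toy LRU cache that counts lookups and hits (not Solr's implementation)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.lookups = 0
        self.hits = 0

    def get(self, key, compute):
        self.lookups += 1
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        value = compute(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return value

    def hit_ratio(self):
        return self.hits / self.lookups if self.lookups else 0.0


# Hypothetical load test: the same 5 queries, fired 8,000 times in rotation.
cache = ToyLRUCache(capacity=512)
queries = ["q=apple", "q=banana", "q=cherry", "q=date", "q=elder"]
for i in range(8000):
    cache.get(queries[i % len(queries)], lambda q: f"results for {q}")
print(f"hit ratio: {cache.hit_ratio():.4f}")
```

A realistic test would replay a large, varied set of queries (for example, taken from production logs) so the cache cannot absorb nearly every request.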

Re: solrcloud used a lot of memory and memory keep increasing during long time run

2015-12-21 Thread Erick Erickson
>> >> 2,815 instances of "org.apache.lucene.index.StandardDirectoryReader", loaded >> by "org.apache.catalina.loader.WebappClassLoader @ 0x67d8ed978" occupy >> 970,614,912 (16.63%) bytes. These instances are referenced from one instance >> of "java.la

Re: Numerous problems with SolrCloud

2015-12-21 Thread Erick Erickson
> Turning to the Solr logs, a quick sweep revealed a lot of "Caused by: > java.net.SocketException: Connection reset" lines, but this isn't very > explicit. I suppose I'll have to cross-check on the concerned server(s). > > Anyway, I'll have a try at the updated setting and I

Re: Re: Some problems when upload data to index in cloud environment

2015-12-21 Thread Erick Erickson
Jianer: Getting your head around the configs is, indeed, "exciting" at times. I just wanted to caution you that using ExtractingRequestHandler puts the Tika parsing load on the Solr server, which doesn't scale as the same machine that's serving queries and indexing is _also_ parsing potentially v

Re: solrcloud used a lot of memory and memory keep increasing during long time run

2015-12-21 Thread Erick Erickson
support NRT search in other > ways. Can > you give me some advices? > > The value of maxWarmingSearchers is copied from some example configs I > think, > I’ll try to set it back to 2. > > What can we benefit from set maxWarmingSearchers to a larger value? I > don't

Re: Solr 5.4 leader selection

2015-12-24 Thread Erick Erickson
I wouldn't worry about it. I doubt you could even measure the change in overall cluster performance even if all three leaders were on the same node. The REBALANCELEADERS stuff was put in for cases of having 100s of leaders on the same machine. It won't hurt, but is almost certainly completely unn

Re: Unable to extract images content (OCR) from PDF files using Solr

2015-12-24 Thread Erick Erickson
Here's an example of what Upayavira is talking about. https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/ It has some RDBMS bits, but you can take those out. Best, Erick On Wed, Dec 23, 2015 at 1:27 AM, Upayavira wrote: > If your needs of Tika fall outside of those provided by the embed

Re: [Highlight] Storing one field, highlight with different analysers

2015-12-24 Thread Erick Erickson
Well, actually points 2 and 3 _do_ depend on the stored data. It's certainly true that the stored data won't have to be re-analyzed if you're using FVH, but the original text still needs to be present to highlight anything that would make sense (consider stopwords, stemming and all that). The user reall

Re: Solr - facet fields that contain other facet fields

2015-12-28 Thread Erick Erickson
bq: so I cannot copy this field to a text field with a keywordtokenizer or strfield 1> There is no restriction on whether a field is analyzed or not as far as faceting is concerned. You can freely facet on an analyzed field or String field or KeywordTokenized field. As Binoy says, though, facetin

Re: SolrMeter is still a feasible tool for measuring performances?

2015-12-28 Thread Erick Erickson
SolrMeter has some pretty cool features, one of which is to extract queries from existing Solr logs. If the Solr logging patterns have changed, which they do, that may require some fixing up... Let us know... Erick On Mon, Dec 28, 2015 at 12:25 AM, Binoy Dalal wrote: > Hi Gian > We've using sol

Re: Facet shows deleted values...

2015-12-29 Thread Erick Erickson
Let's be sure we're using terms similarly. That article is from 2010, so it's unreliable in the 5.2 world; I'd ignore it. First, facets should always reflect the latest commit, regardless of expungeDeletes or optimizes/forcemerges. _commits_ are definitely recommended. Optimize/forcemerge (or

Re: Solr - facet fields that contain other facet fields

2015-12-29 Thread Erick Erickson
terms into a special field, although I > would like the terms to be highlighted (or have some type of position so I > can highlight it). > > Regards, > > Kevin > > On Mon, Dec 28, 2015 at 12:49 PM, Erick Erickson > wrote: > >> bq: so I cannot copy this field t

Re: Solr index segment level merge

2015-12-29 Thread Erick Erickson
Could you simply add the new documents to the current index? That aside, merging does not need to create a new core or a new folder. The form: mergeindexes&core=core0&indexDir=/opt/solr/core1/data/index&indexDir=/opt/solr/core2/data/index Should merge the indexes from the two directories into th
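A sketch that builds the same CoreAdmin MERGEINDEXES request, with one `indexDir` parameter per source directory. The base URL `http://localhost:8983/solr` is an assumption; the core and directory names are the ones from the example above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def merge_indexes_url(base_url, target_core, index_dirs):
    """Build a CoreAdmin MERGEINDEXES request merging index_dirs into target_core."""
    params = [("action", "MERGEINDEXES"), ("core", target_core)]
    # indexDir is repeated once per source index directory.
    params += [("indexDir", d) for d in index_dirs]
    return f"{base_url}/admin/cores?{urlencode(params)}"


url = merge_indexes_url(
    "http://localhost:8983/solr",  # assumed host and port
    "core0",
    ["/opt/solr/core1/data/index", "/opt/solr/core2/data/index"],
)
print(url)
```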

Re: multi term analyzer error

2015-12-30 Thread Erick Erickson
Right, you may be one of the few people to actually implement your own multiTerm analyzer function despite the fact that this has been in the code for years! If you look at the factories and see if they implement the "MultiTermAwareComponent" interface, and PatternReplaceCharFilterFactory does _no

Re: Testing Solr configuration, schema, and other fields

2015-12-30 Thread Erick Erickson
Yeah, the notion of DTDs has gone around several times but always founders on the fact that you can, say, define your own Filter with its own set of parameters etc. Sure, you can make a generic DTD that accommodates this, but then it becomes so general as to be little more than a syntax checker.

Re: Add me to the Solr ContributorsGroup

2015-12-30 Thread Erick Erickson
Done On Wed, Dec 30, 2015 at 5:36 PM, Saïd Radhouani wrote: > Hi - I'd appreciate if you could add me to the Contributor Group. Here are > my account info : > > - Name: Saïd Radhouani > - User name: radhouani > - email: said.radhou...@gmail.com > > Thanks, > -Saïd

Re: Teiid with Solr - using any other engine except the SolrDefaultQueryEngine

2015-12-31 Thread Erick Erickson
In addition, and depending on your time-frame, you may want to work with Solr 6.0 and the "ParallelSQL" option. NOTE: this is _very_ new. People are using it but it'll probably have some rough edges for a while, not to mention you're using an unreleased version of Solr. BTW, Solr 6.0 is also curre

Re: Testing Solr configuration, schema, and other fields

2015-12-31 Thread Erick Erickson
Hmmm, a couple of things: the bin/solr script could be used as a model in this scenario for how to automate a lot of this. I'm thinking you can skip all the argument parsing and that and just see how the SolrCLI jar file is used to spin up collections, upload configs and the like. In fact, assumin

Re: SOLR 5.4.0?

2015-12-31 Thread Erick Erickson
Ere: Can you help with testing the patch if it's important to you? Ramkumar is working on it... Best, Erick On Wed, Dec 30, 2015 at 11:07 PM, Ere Maijala wrote: > Well, for us SOLR-8418 is a major issue. I haven't encountered other issues, > but that one was sort of a show-stopper. > > --Ere >

Re: Memory Usage increases by a lot during and after optimization .

2016-01-02 Thread Erick Erickson
If you happen to be looking at "top" or the like, you might be seeing virtual memory, see Uwe's excellent blog here: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html Best, Erick On Fri, Jan 1, 2016 at 11:46 PM, Shawn Heisey wrote: > On 12/31/2015 8:03 PM, Zheng Lin Edwin Y

Re: Multiple solr instances on one server

2016-01-04 Thread Erick Erickson
Right, that's the most common reason to run multiple JVMs. You must be running multiple replicas on each box though to make that viable. By running, say, 2 JVMs, you're essentially going from hosting, say, 4 replicas in one JVM to 2 replicas in each of 2 JVMs. You'll incur some overhead due to the s

Re: [Manual Sharding] Solr distrib search cause thread exhaustion

2016-01-04 Thread Erick Erickson
How many threads are you allocating for the servlet container? 10,000 is the "usual" number. Best, Erick On Mon, Jan 4, 2016 at 5:21 AM, Alessandro Benedetti wrote: > Hi guys, > this is the scenario we are studying : > > Solr 4.10.2 > 16 shards, a solr instance aggregating the results running a

Re: how to search miilions of record in solr query

2016-01-04 Thread Erick Erickson
Best of luck with that ;). 250ms isn't bad at all for "searching millions of IDs". Frankly, I'm not at all sure where I'd even start. With millions of search terms, I'd have to profile the application to see where it was spending the time before even starting. Best, Erick On Mon, Jan 4, 2016 at 5

Re: MapReduceIndexerTool Indexing

2016-01-04 Thread Erick Erickson
Yes it does. MRIT is intended for initial bulk loads. It takes whatever it's pointed at and indexes it. Additionally, it does not update documents. If the same document (by ID) is indexed twice, you'll wind up with two copies in your results. Best, Erick On Mon, Jan 4, 2016 at 5:00 AM, vidya wr

Re: shard lost - solr5.3

2016-01-04 Thread Erick Erickson
There's no reason to shut down your node. You should be able to issue a REBALANCELEADERS command, see: https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-RebalanceLeaders on a currently-running cluster and all your preferred leaders (assuming the nodes are up) should b
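A minimal sketch of that REBALANCELEADERS call against a running cluster, assuming a base URL of `http://localhost:8983/solr` and a hypothetical collection named `mycollection`.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def rebalance_leaders_url(base_url, collection):
    """Collections API call asking leadership to move to preferredLeader replicas."""
    return f"{base_url}/admin/collections?" + urlencode(
        {"action": "REBALANCELEADERS", "collection": collection}
    )


url = rebalance_leaders_url("http://localhost:8983/solr", "mycollection")
print(url)
```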

Re: Hard commits, soft commits and transaction logs

2016-01-04 Thread Erick Erickson
As far as I know. If you see anything different, let me know and we'll see if we can update it. Best, Erick On Mon, Jan 4, 2016 at 1:34 AM, Clemens Wyss DEV wrote: > [Happy New Year to all] > > Is all herein > https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-c

Re: Facet shows deleted values...

2016-01-04 Thread Erick Erickson
bq: And I also read somewhere that explicit commit is not recommended in SolrCloud mode. Not quite, it's just easy to have too many commits happen too frequently from multiple indexing clients. It's also rare that the benefits of the clients issuing commits outweigh the chance of getting it wrong

Re: Solr suggest, auto complete & spellcheck

2016-01-04 Thread Erick Erickson
Here's a writeup on suggester: https://lucidworks.com/blog/2015/03/04/solr-suggester/ The biggest difference is that spellcheck returns individual _terms_ whereas suggesters can return entire fields. Neither is "a function of the UI" any more than searching is a function of the UI. In both cases

Re: how to search miilions of record in solr query

2016-01-05 Thread Erick Erickson
So still use Ere's suggestion. There's no reason at all to search all million every time. If start=0, just search the first N (say 1,000). Keep doing that until you don't get docs, then add more docs. Or fire off the first query and then, when you know there is going to be pagination, fire off the ful
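A minimal sketch of the "first N IDs" idea: split the ID list into batches and OR one batch at a time into a single clause. The `id` field, the `doc…` IDs, and the batch size are assumptions. A batch of 1,000 stays under the usual `maxBooleanClauses` setting of 1024 in the example solrconfig.xml. The IDs are assumed to need no query-syntax escaping.

```python
def id_batches(ids, batch_size=1000):
    """Yield successive slices of ids, batch_size at a time."""
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]


def id_filter(batch, field="id"):
    """OR together one batch of (assumed already-escaped) ids into one clause."""
    return f"{field}:(" + " OR ".join(batch) + ")"


all_ids = [f"doc{i}" for i in range(2500)]  # hypothetical ids
batches = list(id_batches(all_ids))
first_page_fq = id_filter(batches[0])  # for start=0, query only the first 1,000 ids
print(len(batches), [len(b) for b in batches])
```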

Re: enable disable filter query caching based on statistics

2016-01-05 Thread Erick Erickson
Matteo: Let's see if I understand your problem. Essentially you want Solr to analyze the filter queries and decide through some algorithm which ones to cache. I have a hard time thinking of any general way to do this; certainly there's nothing in Solr that does this automatically. As Binoy mention

Re: SOLR replicas performance

2016-01-05 Thread Erick Erickson
What version of Solr? Prior to 5.2 the replicas were doing lots of unnecessary work/being blocked, see: https://lucidworks.com/blog/2015/06/10/indexing-performance-solr-5-2-now-twice-fast/ Best, Erick On Tue, Jan 5, 2016 at 6:09 AM, Matteo Grolla wrote: > Hi Luca, > not sure if I understoo

Re: How to use DocValues with TextField

2016-01-05 Thread Erick Erickson
Assuming (and it wasn't clear from your problem statement) that you need to search tokens in your field, this approach should be fine. I think Markus' comment was assuming that you did _not_ need to search the field. If you do, a copyField seems best. Do be aware, though, that this will make for a

Re: Data migration from one collection to the other collection

2016-01-05 Thread Erick Erickson
What changes? You simply have "hot" and "cold" collections. When it comes time to index data you: 1> create a collection 2> index to it. 3> use the Collections API to point your "active" collection to this new one 4> do whatever you want with the old one. The setup is, of course, that your hot and
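A sketch of steps 1> and 3> as Collections API URLs: CREATE a new collection, then CREATEALIAS to point the alias at it (issuing CREATEALIAS for an existing alias name replaces that alias). The base URL, collection name, alias name, and config name below are all hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "http://localhost:8983/solr"  # assumed Solr base URL


def collections_url(**params):
    """Build a Collections API URL from keyword parameters."""
    return f"{BASE_URL}/admin/collections?{urlencode(params)}"


def rollover_requests(new_collection, alias, config_name, num_shards):
    """Steps 1 and 3 of the hot/cold rollover, as Collections API URLs."""
    create = collections_url(
        action="CREATE",
        name=new_collection,
        numShards=num_shards,
        **{"collection.configName": config_name},
    )
    # Step 2 (index into new_collection) happens between these two calls.
    point_alias = collections_url(
        action="CREATEALIAS", name=alias, collections=new_collection
    )
    return [create, point_alias]


urls = rollover_requests("logs_20160105", "active", "logs_conf", 2)
for url in urls:
    print(url)
```

Because clients only ever query the alias, the switch in step 3> is invisible to them, and the old collection can be kept, shrunk, or deleted afterwards.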

Re: solr 5.2.0 need to build high query response

2016-01-05 Thread Erick Erickson
It sounds like you're not doing proper autowarming, which you'd need to do either with hard or soft commits that open new searchers. see: https://wiki.apache.org/solr/SolrCaching#Cache_Warming_and_Autowarming In particular, you should have a newSearcher event that facets on the fields you expect

Re: MapReduceIndexerTool Indexing

2016-01-05 Thread Erick Erickson
MRIT is not designed for that scenario, so you simply can't. What people usually do is have a process whereby, after the initial bulk load, there is some way their system-of-record "knows" what new docs have been added since and indexes only those. Flume is sometimes used if you have access. Best

Re: enable disable filter query caching based on statistics

2016-01-05 Thread Erick Erickson
cache > > I think it's because the last two filters are not so selective, they are > resolved to two bitset which are then anded together > and this is less efficient than leapfrogging since the first filter has > just one or two results. > Does it make sense to you? > >

Re: solr 5.2.0 need to build high query response

2016-01-05 Thread Erick Erickson
> > > *or may be here too.* > > static firstSearcher warming in > solrconfig.xml > > > > > Thanks, > Novin > > On Tue, 5 Jan 2016 at 16:22 Erick Erickson wrote: > >> It sounds like you'r

Re: enable disable filter query caching based on statistics

2016-01-05 Thread Erick Erickson
ics I thought this kind > of problem were general enough that I could find a plugin already built > > 2016-01-05 19:17 GMT+01:00 Erick Erickson : > >> >> &fq={!cache=false}n_rea:xxx&fq={!cache=false}provincia:,fq={!cache=false}type: >> >> You
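A sketch that builds requests in that style, adding the `{!cache=false}` local param to each `fq` unless its field is listed as worth caching. The field names come from the thread; the values and the choice of which field to cache are hypothetical.

```python
from urllib.parse import urlencode, parse_qs


def filter_query_string(q, filters, cached_fields=()):
    """Build q plus one fq per (field, value); uncached unless field is in cached_fields."""
    params = [("q", q)]
    for field, value in filters:
        local_params = "" if field in cached_fields else "{!cache=false}"
        params.append(("fq", f"{local_params}{field}:{value}"))
    return urlencode(params)


qs = filter_query_string(
    "*:*",
    [("n_rea", "12345"), ("provincia", "MI"), ("type", "A")],  # hypothetical values
    cached_fields=("type",),
)
print(parse_qs(qs)["fq"])
```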

Re: how to search miilions of record in solr query

2016-01-05 Thread Erick Erickson
so why do you need to ask for all of them when the user is (presumably) paging? My point is that you'll have to get creative to meet your requirements, Solr is unlikely to meet them. Best, Erick On Tue, Jan 5, 2016 at 8:25 PM, Mugeesh Husain wrote: > @Erick Erickson thanks for reply, &g

Re: Solr server not starting

2016-01-06 Thread Erick Erickson
I doubt we'll be much help, it's probably best to talk to the echoprint people, assuming any are still available. I took a quick look at the project and the Solr implementation is from 4+ years ago... Best, Erick On Wed, Jan 6, 2016 at 8:11 AM, agonn Qurdina wrote: > Hi, > > > > I am using Solr

Re: core,Collection,Shard,Replication

2016-01-06 Thread Erick Erickson
bq: But when indexing a document in one shard,it gets reflected in every shard of that collection This is a misunderstanding (and I'm being a bit pedantic here). Each shard contains a portion of the entire corpus. Say you have 1M docs and 2 shards. Each shard will have very close to 500K documents

Re: Cleanup solr cloud after failure in collection creation

2016-01-06 Thread Erick Erickson
The mail server is quite aggressive about removing attachments, none of yours came through. Perhaps put them somewhere else and provide a link? Best, Erick On Wed, Jan 6, 2016 at 3:22 AM, Gian Maria Ricci - aka Alkampfer < alkamp...@nablasoft.com> wrote: > I’ve issued a command to create some co

Re: solr 5.2.0 need to build high query response

2016-01-06 Thread Erick Erickson
peed goes to 1 sec to 1.2 sec. I actually needed around 500 ms. > > On Tue, 5 Jan 2016 at 18:24 Erick Erickson wrote: > >> Yep. Do note what's happening here. You're executing a query >> that potentially takes 10 seconds to execute (based on your >> earlier pos

Re: I cannot create replica in Solr

2016-01-06 Thread Erick Erickson
It looks like you haven't uploaded the configset to Zookeeper so it can be found by the create command. See: https://cwiki.apache.org/confluence/display/solr/Using+ZooKeeper+to+Manage+Configuration+Files. Best, Erick On Wed, Jan 6, 2016 at 1:33 PM, persoy wrote: > Hi > I'm using Solr clouds. I c

Re: When there will be clusterstate.json after solr v5.3?

2016-01-07 Thread Erick Erickson
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-MigrateClusterState On Wed, Jan 6, 2016 at 10:43 PM, forest_soup wrote: > We using solr 5.3.1 with solrcloud mode. > > As the document said, the state of newly created collections is stored in > the state.json of ea

Re: I cannot create replica in Solr

2016-01-07 Thread Erick Erickson
Those files have nothing to do with an uploaded config! You _should_ be seeing files like "solrconfig.xml", "schema.xml" (or "managed-schema"), perhaps files like "stopwords.txt" and the like. Essentially, everything that you would expect to find in the "conf" directory for Solr. Somewhere, you mu

Re: Manage schema.xml via Solrj?

2016-01-07 Thread Erick Erickson
I'd ask first what the high-level problem you're trying to solve is, this could be an XY problem. That said, there's the Schema API you can use, see: https://cwiki.apache.org/confluence/display/solr/Schema+API You can access it from the SolrJ library, see SchemaRequest.java. For examples of using

Re: Manage schema.xml via Solrj?

2016-01-08 Thread Erick Erickson
First, Daniel nailed the XY problem, but this isn't that... You're correct that hand-editing the schema file is error-prone. The managed schema API is your friend here. There are several commercial front-ends that already do this. The managed schema API is all just HTTP, so there's nothing preclu
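Since the Schema API is plain HTTP plus JSON, here is a minimal sketch of an `add-field` command body that would be POSTed to `/solr/<collection>/schema`. It assumes the collection uses a mutable managed schema. The field name `sku` is hypothetical, and `string` is assumed to be a field type defined in that schema.

```python
import json


def add_field_command(name, field_type, stored=True, indexed=True, multi_valued=False):
    """Schema API command adding one field; POST it to /solr/<collection>/schema."""
    return json.dumps({
        "add-field": {
            "name": name,
            "type": field_type,
            "stored": stored,
            "indexed": indexed,
            "multiValued": multi_valued,
        }
    })


body = add_field_command("sku", "string")  # hypothetical field
print(body)
```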

Re: Solr search and index rate optimization

2016-01-08 Thread Erick Erickson
Here's a longer form of Toke's answer: https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/ BTW, on the surface, having 5 ZK nodes isn't doing you any real good. Zookeeper isn't really involved in serving queries or handling updates, it's pur

Re: date difference faceting

2016-01-08 Thread Erick Erickson
I'm going to side-step your primary question and say that it's nearly always best to do your calculations up-front during indexing to make queries more efficient and thus serve more requests on the same hardware. This assumes that the stat you're interested in is predictable of course... Best, Eri

Re: solrcloud -How to delete a doc at a specific shard

2016-01-08 Thread Erick Erickson
This simply shouldn't be the case if by "duplicate" you mean it has the same id (i.e. the field defined as the uniqueKey in schema.xml). If you do have docs in different shards with the same ID, then something is very strange about your setup. What version of Solr BTW? Assuming you mean "same con

Re: Specifying a different txn log directory

2016-01-09 Thread Erick Erickson
Please show us exactly what you did and exactly what you saw that makes you say it "does not seem to work". Best, Erick On Fri, Jan 8, 2016 at 7:47 PM, KNitin wrote: > Hi, > > How do I specify a different directory for transaction logs? I tried using > the updatelog entry in solrconfig.xml and reloaded t

Re: solrcloud -How to delete a doc at a specific shard

2016-01-09 Thread Erick Erickson
I don't really know unless there's _something_ different about the docs, and you could delete by _query_, something like id:XXX AND (condition unique to the doc you want to remove). I'm more concerned about how there got to be duplicate entries in the first place. There really shouldn't be any wit
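A minimal sketch of the JSON delete-by-query body for that approach, to be POSTed to the collection's `/update` handler and followed by a commit before the change shows up in searches. The `XXX` id is the placeholder from the thread, `source:legacy` is a hypothetical distinguishing condition, and the id is assumed to contain no double quotes.

```python
import json


def delete_by_query_body(doc_id, extra_condition):
    """JSON update command deleting docs whose id matches AND extra_condition holds."""
    query = f'id:"{doc_id}" AND ({extra_condition})'
    return json.dumps({"delete": {"query": query}})


body = delete_by_query_body("XXX", "source:legacy")  # hypothetical condition
print(body)
```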

Re: SolrCloud: Setting/finding node names for deleting replicas

2016-01-09 Thread Erick Erickson
For some reason, "slice" is the preferred term in the _code_, while "shard" is preferred in docs FWIW Erick On Fri, Jan 8, 2016 at 3:51 PM, Jeff Wartes wrote: > > Honestly, I have no idea which is "old". The solr source itself uses slice > pretty consistently, so I stuck with that when I st

Re: Querying only replica's

2016-01-09 Thread Erick Erickson
bq: is it best/good to get the CLUSTERSTATUS via the collection API and explicitly send queries to a replica to ensure I don't send queries to the leaders of my collection In a word _no_. SolrCloud is vastly different than the old master/slave. In SolrCloud, each and every node (leader and replica

Re: Configuring cores to persist in the event of Solr restart

2016-01-10 Thread Erick Erickson
The key here is whether you are connecting to the same Zookeeper, an internal or an external one. So if you use the -c option without providing a -z option, you use the embedded Zookeeper. If you later start with a -z option, that's a _different_ zookeeper. And, btw, Zookeeper defaults to keepin

Re: Querying only replica's

2016-01-10 Thread Erick Erickson
t; > > On 09/01/16 23:44, Erick Erickson wrote: >> >> bq: is it best/good to get the CLUSTERSTATUS via the collection API >> and explicitly send queries to a replica to ensure I don't send >> queries to the leaders of my collection >> >> In a wo

Re: Solr search and index rate optimization

2016-01-10 Thread Erick Erickson
y machines compared to your Solr data nodes since they are just > doing light-weight orchestration. But yea, for a 2 data node system one > might be willing to go with a 3 node ensemble to tolerate a single ZK > node dying, just depends on how much cash you are willing to spend and > availabi

Re: Selective Replication from master to slave

2016-01-10 Thread Erick Erickson
This is really confusing. You say: bq: Basically I am going with master slave approach OK, classic Solr master/slave? Or are you using this in a different context? bq: the application pushing data to master will need to preview the search and if the search is deemed useful/appropriate I need the

Re: Change leader in SolrCloud

2016-01-11 Thread Erick Erickson
Shawn is spot-on, here's a little bit of "color commentary" bq: all new documents to index will be routed to the same machine, thus indexing load is not subdivided This is something of a misconception. Indexing is always done on all nodes, leaders and replicas alike in SolrCloud. The leader is r

Re: Change leader in SolrCloud

2016-01-11 Thread Erick Erickson
You have to assign the preferredLeader role first. You can do that node-by-node via ADDREPLICAPROP or have the system do it for you with BALANCESHARDUNIQUE. As I said before, in SolrCloud the leader forwards the raw document to each follower. There is no pre-processing, analysis anything else done
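A sketch of both ways to assign the role, built as Collections API URLs. The base URL and the collection, shard, and replica names are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "http://localhost:8983/solr"  # assumed Solr base URL


def collections_url(**params):
    """Build a Collections API URL from keyword parameters."""
    return f"{BASE_URL}/admin/collections?{urlencode(params)}"


# Option 1: mark one specific replica (names are hypothetical).
add_prop = collections_url(
    action="ADDREPLICAPROP",
    collection="mycollection",
    shard="shard1",
    replica="core_node2",
    property="preferredLeader",
    **{"property.value": "true"},
)

# Option 2: let Solr spread preferredLeader across nodes, one replica per shard.
balance = collections_url(
    action="BALANCESHARDUNIQUE", collection="mycollection", property="preferredLeader"
)

for url in (add_prop, balance):
    print(url)
```

Once the property is assigned, a REBALANCELEADERS call on the collection asks leadership to move to the marked replicas.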

Re: collection configuration stored in Zoo Keeper with solrCloud

2016-01-11 Thread Erick Erickson
Do be a little careful though. The sample zookeeper config that comes with an Apache install of Zookeeper defaults to storing the data in /tmp/zookeeper which is _not_ a place you want persistent data on *nix systems. Note, this is _not_ the default for embedded Zookeeper in Solr. And the othe

Re: Problems using MapReduceIndexerTool with multiple reducers

2016-01-11 Thread Erick Erickson
Hmm, it looks like you created your collection with the "implicit" router. Does the same thing happen when you use the default compositeId router? Note, this should be OK with either, this is just to gather more info. Other questions: 1> Are you running MRIT over Solr indexes that are actually ho

Re: indexing rich data with solr 5.3

2016-01-11 Thread Erick Erickson
Looks like a bad file. Do you have any success using DIH on any files? What happens if you just send that particular file through the ExtractingRequestHandler? Best, Erick On Mon, Jan 11, 2016 at 3:51 PM, kostali hassan wrote: > such files msword and pdf donsnt indexing using *dataimoprt i have
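To try "just send that particular file through the ExtractingRequestHandler", the raw bytes can be POSTed to `/update/extract` with the file's content type, taking DIH out of the picture. This is a minimal sketch that only builds the request. The core URL, document id and file name are hypothetical, and `literal.id` and `commit` are the standard extract-handler parameters.

```python
from urllib.parse import urlencode
from urllib.request import Request


def extract_request(core_url, body, doc_id, content_type):
    """Build a POST of raw file bytes to Solr Cell (/update/extract)."""
    params = {"literal.id": doc_id, "commit": "true"}
    return Request(
        f"{core_url}/update/extract?{urlencode(params)}",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

Usage would look like `urllib.request.urlopen(extract_request("http://localhost:8983/solr/mycore", Path("problem.docx").read_bytes(), "doc1", "application/vnd.openxmlformats-officedocument.wordprocessingml.document"))`. If Tika fails here too, the file itself is the likely culprit rather than the DIH setup.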

Re: Change leader in SolrCloud

2016-01-11 Thread Erick Erickson
bq: It seems to me a huge wasting of resources. How else would you guarantee consistency? Especially taking into account Lucene's write-once segments? Master/Slave sidesteps the problem by moving entire, closed segments to the slave, but as Shawn says if the master goes down the slaves don't hav

Re: multiple solr-config.xml files per core

2016-01-11 Thread Erick Erickson
bq: Can Solr Server have different/multiple solr-config.xml file per core? Yes. Each separate core can (and usually does) have its own configs, solrconfig.xml, schema and the like. Your question could be interpreted as asking if you can have multiple solrconfig.xml files in the _same_ core, the

Re: Warning in SolrCloud logs

2016-01-11 Thread Erick Erickson
Just show us the solrconfig.xml file, particularly anything referring to replication, it's easier than talking past each other. Best, Erick. On Mon, Jan 11, 2016 at 12:18 PM, Gian Maria Ricci - aka Alkampfer wrote: > Actually that is a collection I've created uploading into Zookeeper a > confi

Re: Solr 5.3.1 ArrayIndexOutOfBoundsException while running a query

2016-01-11 Thread Erick Erickson
The Solr logs should have a much more complete stack trace if you can locate them. 1G of memory is very little for any serious Solr. I'm assuming you restarted Solr after the OOM, but Java isn't entirely reliable after an OOM. FWIW, Erick On Mon, Jan 11, 2016 at 6:34 PM, Kelly, Frank wrote: >

Re: solrcloud -How to delete a doc at a specific shard

2016-01-11 Thread Erick Erickson
OK, what exactly do you mean you "changed zookeeper"? If you went in and reassigned IP addresses to nodes then all bets are off. So do you have just a single (or a few) docs that are dups or lots? And by "lots", I'm thinking if all the duplicate IDs are documents that have been indexed since you "

Re: realtime get requirements

2016-01-12 Thread Erick Erickson
Right, the suggester had some bad behavior where it rebuilt on startup despite the flag being set _not_ to do that. Some details here: https://lucidworks.com/blog/2015/03/04/solr-suggester/ Best, Erick On Tue, Jan 12, 2016 at 8:12 AM, Matteo Grolla wrote: > ok, > suggester was responsible

Re: SolrCloud, DIH, and XPathEntityProcessor

2016-01-12 Thread Erick Erickson
Yeah, that's essentially the nature of open source, someone gets frustrated enough with current behavior and fixes it ;)... There's never any harm in opening a JIRA, all you need to do is register. It's not a bad idea to open one as you _start_ writing the code, even providing very early versions o

Re: It's possible up and debug solr in eclipse IDE?

2016-01-12 Thread Erick Erickson
And a neater way to debug than attaching to a remote Solr is often to step through the JUnit tests that exercise the code you need to work on. This is usually much faster than the compile/start Solr/attach cycle. Of course some problems don't fit that process, but I th

Re: solrcloud -How to delete a doc at a specific shard

2016-01-12 Thread Erick Erickson
bq: it is too hard understand,what do you mean "lots"? I mean that if you have one or two duplicate docs it's worth looking at things like leading or trailing spaces in the ID leading to IDs that look identical but aren't. If it's hundreds or thousands of docs, then it's probably indicative of so
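The "leading or trailing spaces in the ID" case is easy to check for once you have a list of the suspect IDs, for example from a query or an export. The following minimal sketch groups IDs that become identical after stripping whitespace. How you pull the IDs out of Solr is left open here.

```python
from collections import defaultdict


def whitespace_collisions(ids):
    """Group IDs that differ only by leading/trailing whitespace."""
    groups = defaultdict(set)
    for doc_id in ids:
        groups[doc_id.strip()].add(doc_id)
    # Keep only stripped keys that map to more than one distinct raw ID.
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


for key, variants in whitespace_collisions(["a1", "a1 ", " b2", "c3"]).items():
    print(key, [repr(v) for v in variants])
```

Printing with `repr` makes the otherwise invisible spaces show up.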

Re: indexing rich data with solr 5.3

2016-01-12 Thread Erick Erickson
penxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:203) > at > org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:673) > at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:274) > at > org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:73) >

Re: Boost does not appear in solr debug explain debug

2016-01-12 Thread Erick Erickson
You won't necessarily find both if those values are NOT in the particular document. If you have a document you know contains both but doesn't appear in your results list, consider using explainOther to see how the doc of interest is actually scored. Best, Erick On Tue, Jan 12, 2016 at 1:54 AM, Vi
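explainOther is passed alongside `debugQuery=true`. Its value is a query that selects the document(s) you want explained even though they did not make the result list. This minimal sketch builds such a request. The core URL, the main query `title:solr` and the target `id:doc42` are hypothetical examples.

```python
from urllib.parse import urlencode


def explain_other_url(core_url, query, other_query):
    """Build a /select URL asking Solr to explain how `other_query` docs score."""
    params = {
        "q": query,
        "debugQuery": "true",         # explanations are part of the debug output
        "explainOther": other_query,  # e.g. a query on the uniqueKey field
        "wt": "json",
    }
    return f"{core_url}/select?{urlencode(params)}"


print(explain_other_url("http://localhost:8983/solr/mycore", "title:solr", "id:doc42"))
```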
