Re: DataImportHandler SolrEntityProcessor configuration for local copy

2020-02-06 Thread Mikhail Khludnev
Hello, Karl. Please check these: https://lucene.apache.org/solr/guide/6_6/pagination-of-results.html#constraints-when-using-cursors https://lucene.apache.org/solr/guide/6_6/uploading-structured-data-store-data-with-the-data-import-handler.html#solrentityprocessor cursorMark="true" Good luck. On
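For readers landing here, the cursor protocol behind cursorMark can be sketched in a few lines of Python. This is a minimal sketch, not SolrEntityProcessor's internals. `fetch` is a hypothetical callable that sends the params to a Solr `/select` handler and returns the parsed JSON, and `unique_key="id"` assumes the schema's uniqueKey is `id`.

```python
from typing import Callable, Dict, Iterator, List

# Hypothetical transport: takes request params, returns the parsed JSON response.
Fetch = Callable[[Dict[str, str]], dict]


def iter_cursor_pages(fetch: Fetch, query: str, sort: str,
                      rows: int = 500, unique_key: str = "id") -> Iterator[List[dict]]:
    """Yield pages of documents using Solr's cursorMark deep-paging protocol."""
    # Cursors require the sort to include the uniqueKey field as a tie-breaker.
    sort_fields = [c.strip().split()[0] for c in sort.split(",") if c.strip()]
    if unique_key not in sort_fields:
        raise ValueError(f"sort must include the uniqueKey field {unique_key!r}")

    cursor = "*"  # the first request always uses cursorMark=*
    while True:
        params = {"q": query, "sort": sort, "rows": str(rows), "cursorMark": cursor}
        resp = fetch(params)
        docs = resp["response"]["docs"]
        if docs:
            yield docs
        next_cursor = resp["nextCursorMark"]
        # Solr signals the end by returning the same cursorMark that was sent.
        if next_cursor == cursor:
            return
        cursor = next_cursor
```

Note there is no `start` parameter: with cursors, paging state lives entirely in the mark, which is why deep pages stay cheap.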

Re: DataImportHandler SolrEntityProcessor configuration for local copy

2020-02-06 Thread Karl Stoney
I cannot believe how much of a difference that cursorMark and sort order made. Previously it died at about 800k docs; now we're at 1.2m without any slowdown. Thank you so much. On 06/02/2020, 08:14, "Mikhail Khludnev" wrote: Hello, Karl. Please check these: https://eur03.safelinks.pro

Re: DataImportHandler SolrEntityProcessor configuration for local copy

2020-02-06 Thread Karl Stoney
Spoke too soon, it looks like it leaks memory. After about 1.3m docs, old-gen GC times went through the roof and Solr was almost unresponsive; we had to abort. We're going to write our own implementation, running outside of Solr, to copy data from one core to another. On 06/02/2020, 09:57, "Karl Stoney"
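If the copy does move outside of Solr, one way to keep the client's memory flat is to hand each page to the target as soon as it is read instead of accumulating documents. A minimal sketch, assuming `pages` is any iterable of document pages (for example from a cursorMark loop) and `post` is a hypothetical callable that sends one batch to the target core's update handler. Dropping `_version_` is also an assumption: copied version values would otherwise engage Solr's optimistic-concurrency check on the target.

```python
from typing import Callable, Iterable, List, Sequence


def copy_pages(pages: Iterable[List[dict]],
               post: Callable[[List[dict]], None],
               drop_fields: Sequence[str] = ("_version_",)) -> int:
    """Forward each page to the target as soon as it is read; return the doc count."""
    total = 0
    for page in pages:
        # Only the current page is held in memory; nothing is accumulated.
        batch = [{k: v for k, v in doc.items() if k not in drop_fields} for doc in page]
        post(batch)
        total += len(batch)
    return total
```

Only fields that come back in query results (stored, or returned from docValues) can be copied this way.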

Re: Bug? Documents not visible after successful commit - chaos testing

2020-02-06 Thread Michael Frank
Hi Chris, thank you for your detailed answer! We are aware that SolrCloud is eventually consistent, and in our application that's fine in most cases. However, what is really important for us is that we get "Read Your Writes" at a clear point in time, which in our understanding should be after har

Re: DataImportHandler SolrEntityProcessor configuration for local copy

2020-02-06 Thread Mikhail Khludnev
Egor, would you mind sharing some best practices regarding cursorMark in SolrEntityProcessor? On Thu, Feb 6, 2020 at 1:04 PM Karl Stoney wrote: > Spoke too soon, looks like it memory leaks. After about 1.3m the old gc > times went through the root and solr was almost unresponsive, had to > abo

migrating my application

2020-02-06 Thread Carmen Márquez Vázquez
Hello, I am migrating my application that uses Solr 4.4.0 to use Solr 8.2.0. I have the following code that I am unable to migrate. Can you help me? new ChainedFilter(filters.toArray(new Filter[filters.size()]), ChainedFilter.OR); Thanks in advance.

Storage/Volume type for Kubernetes Solr POD?

2020-02-06 Thread Susheel Kumar
Hello, what type of storage/volume is recommended for running Solr in a Kubernetes pod? I know that in the past Solr had issues storing its indexes on NFS, and NFS was not recommended. https://kubernetes.io/docs/concepts/storage/volumes/ Thanks, Susheel

Re: Checking in on Solr Progress

2020-02-06 Thread Erick Erickson
When you say “look”, where are you looking from? HTTP requests? SolrJ? The admin UI? ZooKeeper is always the keeper of the state, so when the replica is “active” _AND_ the replica’s node is in the “live_nodes” hive it’s up. The Collections API CLUSTERSTATUS can help here if you’re not using Sol
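Erick's rule (replica `active` _AND_ its node in `live_nodes`) can be applied to a parsed CLUSTERSTATUS response. A sketch, assuming the Solr 8 response layout `cluster → collections → shards → replicas`, with each replica carrying `state` and `node_name`. Fetching the JSON (e.g. from `/solr/admin/collections?action=CLUSTERSTATUS`) is left out.

```python
from typing import List, Tuple


def replica_health(status: dict) -> List[Tuple[str, str, str, bool]]:
    """Return (collection, shard, replica, is_up) for every replica in the response."""
    cluster = status["cluster"]
    live = set(cluster.get("live_nodes", []))
    rows = []
    for coll_name, coll in cluster.get("collections", {}).items():
        for shard_name, shard in coll.get("shards", {}).items():
            for replica_name, replica in shard.get("replicas", {}).items():
                # "Up" means both conditions hold, not just state == "active".
                is_up = replica.get("state") == "active" and replica.get("node_name") in live
                rows.append((coll_name, shard_name, replica_name, is_up))
    return rows
```

Polling this every few seconds is a crude way to watch replicas move from `recovering` to up, short of the dedicated tooling that doesn't exist yet.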

Re: NoClassDefFoundError - Faceting on 8.2.0

2020-02-06 Thread Erick Erickson
My first guess is that you have multiple or out-of-date jars in your classpath on those machines. Best, Erick > On Feb 5, 2020, at 5:53 PM, Joe Obernberger > wrote: > > Hi All - getting this error intermittently on a solr cloud cluster. > Sometimes the heatmap generation works, sometimes no
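To check this guess, one quick test is to list the jars in each node's lib directories and look for the same artifact in more than one version. A minimal sketch; it assumes Maven-style file names like `lucene-core-8.2.0.jar`, and which directories to scan depends on your install.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumes "<artifact>-<version>.jar" naming; other files are ignored.
JAR_RE = re.compile(r"^(?P<name>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def duplicate_jars(filenames: Iterable[str]) -> Dict[str, List[str]]:
    """Group jar files by artifact name and keep only artifacts seen more than once."""
    groups = defaultdict(list)
    for fn in filenames:
        m = JAR_RE.match(fn)
        if m:
            groups[m.group("name")].append(fn)
    return {name: sorted(fns) for name, fns in groups.items() if len(fns) > 1}
```

Usage: `duplicate_jars(os.listdir(lib_dir))` for each directory on the classpath, then compare results across the nodes where the error occurs.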

Re: StatelessScriptUpdateProcessorFactory causing OOM errors?

2020-02-06 Thread Erick Erickson
How many fields do you wind up having? It looks on a quick glance like it depends on the values of fields. While I’ve seen Solr/Lucene handle indexes with over 1M different fields, it’s unsatisfactory. What I’m wondering is if you are adding a zillion different fields to your docs as time passes a
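One way to answer Erick's question is to watch whether the number of distinct field names keeps climbing as batches are indexed. A rough sketch, assuming you can sample the documents (as dicts) coming out of the script processor; a steadily rising count would point to the unbounded-field pattern he describes.

```python
from typing import Iterable, List


def cumulative_field_counts(batches: Iterable[Iterable[dict]]) -> List[int]:
    """After each batch, report how many distinct field names have been seen so far."""
    seen = set()
    counts = []
    for batch in batches:
        for doc in batch:
            seen.update(doc.keys())
        counts.append(len(seen))
    return counts
```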

JSON from Term Vectors Component

2020-02-06 Thread Doug Turnbull
Hi all, I was curious whether anyone had tips on parsing the JSON response of the term vectors component, or any way to force it to be more standard JSON. It appears to be very heavily nested and idiosyncratic JSON, such as below. Notice the lists within lists within lists, where the keys are adj

Re: JSON from Term Vectors Component

2020-02-06 Thread Munendra S N
> > Notice the lists, within lists, within lists. Where the keys are adjacent > items in the list. Is there a reason this isn't a JSON dictionary? > I think this is because of NamedList. Have you tried using json.nl=map as a query parameter for this case? Regards, Munendra S N On Thu, Feb 6, 20

Re: JSON from Term Vectors Component

2020-02-06 Thread Doug Turnbull
Thanks for the tip. The issue is that json.nl=map produces non-standard JSON with duplicate keys. Solr generates the following, which JSONLint rejects because of the repeated keys: { "positions": { "position": 155, "position": 844, "position": 1726 } } On Thu, Feb 6, 2020 at 11:36 AM Munendra S N wrote: > > > > N

Re: JSON from Term Vectors Component

2020-02-06 Thread Walter Underwood
Repeated keys are quite legal in JSON, but many libraries don’t support that. It does look like that data layout could be redesigned to be more portable. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Feb 6, 2020, at 8:38 AM, Doug Turnbull > wrote

Re: JSON from Term Vectors Component

2020-02-06 Thread Doug Turnbull
Well that is interesting, I did not know that! Thanks Walter... https://stackoverflow.com/questions/21832701/does-json-syntax-allow-duplicate-keys-in-an-object I gave it a go in Python (what I'm using) to see what would happen, indeed it gives some odd behavior In [4]: jsonStr = ' {"test": 1, "t
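For reference, a minimal reproduction of that behavior using only the standard library and the payload from earlier in the thread: Python's `json` module accepts duplicate keys without error and keeps only the last value.

```python
import json

payload = '{"positions": {"position": 155, "position": 844, "position": 1726}}'
parsed = json.loads(payload)

# No error is raised; earlier "position" values are silently overwritten.
print(parsed)  # {'positions': {'position': 1726}}
```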

Re: JSON from Term Vectors Component

2020-02-06 Thread Walter Underwood
It is one of those things that happens when you don’t have a working group beat on a spec for six months. With an IETF process, I bet JSON would disallow duplicate keys and have comments. It might even have a datetime data type or at least recommend ISO8601 in a string. I was on the Atom workin

Re: Checking in on Solr Progress

2020-02-06 Thread dj-manning
Erick Erickson wrote > When you say “look”, where are you looking from? Http requests? SolrJ? The > admin UI? I'm open to looking from anywhere: an HTTP request, the admin UI, or following a log if possible. My objective with this question is to follow/watch Solr's recovery interactively, as a human

Re: JSON from Term Vectors Component

2020-02-06 Thread Edward Ribeiro
Python's json library converts text such as '{"id": 1, "id": 2}' into a dict, which doesn't allow duplicate keys. The solution in this case is to inject your own parsing logic, as explained here: https://stackoverflow.com/questions/29321677/python-json-parser-allow-duplicate-keys One possible solution (bel
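A sketch of that approach using `object_pairs_hook`, a documented parameter of `json.loads`. The merge policy here, gathering repeated keys into a list, is one choice among several.

```python
import json
from typing import Any, Dict, List, Tuple


def collect_duplicates(pairs: List[Tuple[str, Any]]) -> Dict[str, Any]:
    """object_pairs_hook: keep single keys as-is, gather repeated keys into a list."""
    out: Dict[str, Any] = {}
    repeated = set()
    for key, value in pairs:
        if key not in out:
            out[key] = value
        elif key in repeated:
            out[key].append(value)
        else:
            out[key] = [out[key], value]
            repeated.add(key)
    return out


payload = '{"positions": {"position": 155, "position": 844, "position": 1726}}'
print(json.loads(payload, object_pairs_hook=collect_duplicates))
# {'positions': {'position': [155, 844, 1726]}}
```

One caveat: a key that appears only once stays a scalar, so a term with a single position comes back as a number rather than a one-element list, and consumers have to handle both shapes.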

Re: Checking in on Solr Progress

2020-02-06 Thread Erick Erickson
There’s actually a crying need for this, but nothing exists yet; basically you have to look at the log files and try to figure it out. Actually I think this would be a great thing to work on, but it’d be pretty much all new. If you’d like, you can create a Solr Improvement Proposa

Re: JSON from Term Vectors Component

2020-02-06 Thread Doug Turnbull
FWIW, I ended up writing some code that makes a best effort to turn the named list into a dict representation; if it can't, it keeps it as a Python tuple. def every_other_zipped(lst): return zip(lst[0::2],lst[1::2]) def dictify(nl_tups): """ Return dict if all keys unique, otherwise
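Since the snippet above is cut off, here is a self-contained sketch of the same idea; it is not Doug's actual code. It assumes Solr's default flat NamedList layout (`[key1, val1, key2, val2, ...]`) and recurses into nested values. A genuine list of strings with even length would be misread as a NamedList, which is the inherent ambiguity of the flat format.

```python
from typing import Any, List


def every_other_zipped(lst: List[Any]):
    # Pair a flat NamedList: [k1, v1, k2, v2] -> (k1, v1), (k2, v2)
    return zip(lst[0::2], lst[1::2])


def dictify(value: Any) -> Any:
    """Turn flat NamedLists into dicts when keys are unique strings;
    otherwise keep them as a tuple of (key, value) pairs."""
    if isinstance(value, list):
        looks_named = (len(value) > 0 and len(value) % 2 == 0
                       and all(isinstance(k, str) for k in value[0::2]))
        if not looks_named:
            return [dictify(v) for v in value]
        pairs = [(k, dictify(v)) for k, v in every_other_zipped(value)]
        keys = [k for k, _ in pairs]
        if len(set(keys)) == len(keys):
            return dict(pairs)
        return tuple(pairs)  # duplicate keys: keep every pair, in order
    return value
```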

Re: DataImportHandler SolrEntityProcessor configuration for local copy

2020-02-06 Thread Mikhail Khludnev
Karl, what would you do if that own implementation stalls in GC, or smashes Solr over? On Thu, Feb 6, 2020 at 1:04 PM Karl Stoney wrote: > Spoke too soon, looks like it memory leaks. After about 1.3m the old gc > times went through the root and solr was almost unresponsive, had to > abort. We'

Re: Solr 7.7 heap space is getting full

2020-02-06 Thread Rajdeep Sahoo
If we reduce the number of threads, will that help? Is there any other way to debug this? On Mon, 3 Feb, 2020, 2:52 AM Walter Underwood, wrote: > The only time I’ve ever had an OOM is when Solr gets a huge load > spike and fires up 2000 threads. Then it runs out of space for stacks. > >

Re: How can shards distributed evenly among nodes

2020-02-06 Thread Radar Lei
This is weird: when we create an index, Solr makes sure the shards of the index are distributed evenly across all existing nodes. But after using AutoScale's 'UTILIZENODE', Solr tries to put all the shards of an index on one or a few nodes. Is this intentional or a bug? For example, we ha
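To quantify the imbalance before and after UTILIZENODE, counting replicas per node from a CLUSTERSTATUS response may be enough. A sketch, assuming the `cluster → collections → shards → replicas` layout with a `node_name` on each replica; live nodes holding nothing are reported with 0.

```python
from collections import Counter
from typing import Dict


def replicas_per_node(status: dict, collection: str) -> Dict[str, int]:
    """Count how many replicas of one collection sit on each node."""
    cluster = status["cluster"]
    counts: Counter = Counter()
    for shard in cluster["collections"][collection]["shards"].values():
        for replica in shard["replicas"].values():
            counts[replica["node_name"]] += 1
    # Include live nodes with no replicas, so an uneven spread is obvious.
    for node in cluster.get("live_nodes", []):
        counts.setdefault(node, 0)
    return dict(counts)
```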