Re: Data ingestion with predefined buckets

2020-04-22 Thread Anthony Baker
Steve, Have you looked at grouping your putAll() requests into groups that align to Geode’s buckets? In your application code, you can determine the hash for each data item and self-partition the entries. This allows you to send the requests on separate threads in parallel while optimizing ne
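Anthony's grouping idea can be sketched as follows. This is a hedged, self-contained illustration: it assumes Geode's default routing of |key.hashCode() mod totalNumBuckets| (that computation lives in Geode's internal PartitionedRegionHelper; verify against your version), and plain java.util maps stand in for the region and its batches.

```java
import java.util.HashMap;
import java.util.Map;

public class BucketGrouping {

    // Assumption: mirrors Geode's default routing, where the internal
    // PartitionedRegionHelper computes |hashCode mod totalNumBuckets|.
    static int bucketOf(Object key, int totalNumBuckets) {
        return Math.abs(key.hashCode() % totalNumBuckets);
    }

    // Split one large batch into per-bucket batches; each sub-map can then be
    // sent with its own putAll() on its own thread, as Anthony suggests.
    static Map<Integer, Map<String, String>> groupByBucket(
            Map<String, String> entries, int totalNumBuckets) {
        Map<Integer, Map<String, String>> groups = new HashMap<>();
        for (Map.Entry<String, String> e : entries.entrySet()) {
            groups.computeIfAbsent(bucketOf(e.getKey(), totalNumBuckets),
                    b -> new HashMap<>()).put(e.getKey(), e.getValue());
        }
        return groups;
    }
}
```

Each resulting sub-map could then be submitted in parallel, e.g. `groups.values().forEach(g -> executor.submit(() -> region.putAll(g)))`, where `executor` and `region` are placeholders for your thread pool and partitioned region.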

Re: Data ingestion with predefined buckets

2020-04-16 Thread Anilkumar Gingade
>> PutAllPRMessage.* These are internal APIs/message protocols used to handle PartitionedRegion messages. The messages are sent from the originator node to peer nodes to operate on a given partitioned region; they are not intended as application APIs. We could consider, looking at the code, which determines bu

Re: Data ingestion with predefined buckets

2020-04-15 Thread steve mathew
Anil, yes, it's a kind of custom hash (which involves calculating a hash over all fields of a row). I have to stick to the predefined mechanism based on which the source files are generated. It would be a great help if someone could guide me to any available *server-side internal API that provides bucket level data

Re: Data ingestion with predefined buckets

2020-04-15 Thread Anilkumar Gingade
About the API: I would not recommend using bucketId in an API, as it is internal, and there are other internal/external APIs that rely on bucket id calculations which could be compromised here. Instead of adding new APIs, probably looking at minimizing/reducing the time spent may be a good start. Bucket

Re: Data ingestion with predefined buckets

2020-04-15 Thread steve mathew
Thanks Dan, Anil and Udo for your inputs. Extremely sorry for the late reply, as I took a bit of time to explore and understand Geode internals. It seems the BucketRegion/Bucket terminology is not exposed to the user, but I am still trying to achieve something that is uncommon and for which a client API is not expos

Re: Data ingestion with predefined buckets

2020-04-11 Thread Udo Kohlmeyer
Hi there Steve, Firstly, you are correct, the pattern you are describing is not recommended and possibly not even correctly supported. I've seen many implementations of Geode systems and none of them ever needed to do what you are intending to do. Seems like you are willing to go through A LOT of

Re: Data ingestion with predefined buckets

2020-04-10 Thread Anilkumar Gingade
Did you look into "StringPrefixPartitionResolver", which doesn't need a custom implementation? https://geode.apache.org/docs/guide/111/developing/partitioned_regions/standard_custom_partitioning.html You can try a key like "key|file1". -Anil. On Fri, Apr 10, 2020 at 4:02 PM Dan Smith wrote: > H

Re: Data ingestion with predefined buckets

2020-04-10 Thread Dan Smith
Hi Steve, Well, you can technically use more than just the key in your partition resolver. You can also use a callback argument, something like the code below. This would put all of your data into bucket 0. The issue is that all operations will have to pass the callback argument, so if you need t
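Dan's actual snippet did not survive the digest truncation; the following is a reconstruction sketch of the technique he describes (routing on the callback argument), not his code. To keep it compilable without Geode on the classpath, the two interfaces below are simplified stand-ins for org.apache.geode.cache.PartitionResolver and EntryOperation, which carry generics and more methods in the real API.

```java
// Simplified stand-ins for Geode's interfaces (assumption: real ones differ).
interface EntryOperation { Object getKey(); Object getCallbackArgument(); }
interface PartitionResolver { Object getRoutingObject(EntryOperation op); String getName(); }

public class CallbackArgResolver implements PartitionResolver {
    // Route on the callback argument instead of the key: every operation that
    // passes the same argument (e.g. Integer 0) yields the same routing
    // object, so all such entries hash to the same bucket.
    @Override
    public Object getRoutingObject(EntryOperation op) {
        return op.getCallbackArgument();
    }

    @Override
    public String getName() {
        return "CallbackArgResolver";
    }
}
```

With the real API, the argument travels per operation, e.g. `region.put(key, value, 0)` or `region.putAll(map, 0)`; the caveat Dan raises is that every operation then has to supply that same argument.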

Re: Data ingestion with predefined buckets

2020-04-10 Thread steve mathew
Thanks Dan for your quick response. Though this may not be a recommended pattern, here I am targeting a bucket-specific putAll and want to exclude hashing, as it turns out to be an overhead in my scenario. Is this achievable...? How should I define a PartitionResolver that works generically and returns

Re: Data ingestion with predefined buckets

2020-04-10 Thread Dan Smith
Hi Steve, The bucket that data goes into is generally determined by the key. So for example if your data in File-0 is all for customer X, you can include Customer X in your region key and implement a PartitionResolver that extracts the customer from your region key and returns it. Geode will then
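Dan's customer-key idea can be sketched as follows. This is a hedged, self-contained illustration of just the key-parsing step: the "<customer>|<id>" key layout is an assumption for the example, and in a real deployment this logic would live inside getRoutingObject() of an org.apache.geode.cache.PartitionResolver implementation registered on the region's partition attributes.

```java
public class CustomerKeyRouting {
    // Assumed key layout: "<customer>|<actual id>", e.g. "custX|order42".
    // All keys sharing a customer prefix yield the same routing object, so
    // Geode hashes the whole group to the same bucket.
    static String routingObjectFor(String key) {
        int sep = key.indexOf('|');
        if (sep < 0) {
            throw new IllegalArgumentException("key lacks a '|' prefix: " + key);
        }
        return key.substring(0, sep);
    }
}
```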

Re: Data ingestion with predefined buckets

2020-04-10 Thread Anilkumar Gingade
Yes, you can use a partition resolver to achieve this. You can also look into "StringPrefixPartitionResolver", which doesn't need a custom implementation. https://geode.apache.org/docs/guide/111/developing/partitioned_regions/standard_custom_partitioning.html -Anil On Fri, Apr 10, 2020 at 11:08 AM st
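For reference, the built-in resolver Anil mentions needs no custom code, only declarative wiring. A sketch of what the cache.xml might look like, under the assumption of a Geode 1.x declarative cache (the region name "trades" is a placeholder; the class name is org.apache.geode.cache.util.StringPrefixPartitionResolver per the linked docs, which should be checked for the exact element layout in your version):

```xml
<region name="trades" refid="PARTITION">
  <region-attributes>
    <partition-attributes>
      <partition-resolver>
        <class-name>org.apache.geode.cache.util.StringPrefixPartitionResolver</class-name>
      </partition-resolver>
    </partition-attributes>
  </region-attributes>
</region>
```

The resolver routes on the string before the "|" delimiter, so keys such as "file1|row17" and "file1|row18" would be co-located in the same bucket.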