We thought very carefully when introducing type erasure, including about the concerns you raised, and nevertheless arrived at the current design, which is intended to balance the need for typing against the need for dynamism. Type erasure brings pluggable attributes (you don't have to enumerate them beforehand) and, as a result, the pluggable operator and pass optimization system. The attributes are a map<str, any> in which an any can hold a vector<T>: the typed vector<T> is there when you need it, and you only have to fetch it from the map once. Note that we did not go as far as vector<any>, so most of the processing can stay typed.
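To make the "fetch once, then stay typed" point concrete, here is a minimal C++17 sketch. It is only an illustration under assumptions: std::any stands in for dmlc::any, and Graph, Shape, ShapeVector, GetAttr, TotalElements and DescribeShapes are hypothetical names, not the actual NNVM/MXNet declarations.

```cpp
#include <any>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for illustration; not the real NNVM/MXNet types.
using Shape = std::vector<std::int64_t>;
using ShapeVector = std::vector<Shape>;

struct Graph {
  // Type-erased attribute map: a pass can attach a new attribute
  // without the Graph declaration enumerating it beforehand.
  std::unordered_map<std::string, std::any> attrs;
};

// Fetch an attribute once and get back a fully typed reference.
// Throws std::out_of_range if the key is missing and
// std::bad_any_cast if the stored type is not T.
template <typename T>
const T& GetAttr(const Graph& g, const std::string& key) {
  return std::any_cast<const T&>(g.attrs.at(key));
}

// After the single fetch, all processing is on typed data (vector<T>,
// not vector<any>).
std::int64_t TotalElements(const Graph& g) {
  const ShapeVector& shapes = GetAttr<ShapeVector>(g, "shape");
  std::int64_t total = 0;
  for (const Shape& s : shapes) {
    std::int64_t n = 1;
    for (std::int64_t d : s) {
      assert(d >= 0);  // this sketch assumes fully known dimensions
      n *= d;
    }
    total += n;
  }
  return total;
}

// Small auxiliary helper that renders the shapes as text, so they can
// be logged or printed when the any itself is opaque in a debugger.
std::string DescribeShapes(const Graph& g) {
  std::ostringstream os;
  for (const Shape& s : GetAttr<ShapeVector>(g, "shape")) {
    os << "(";
    for (std::size_t i = 0; i < s.size(); ++i) {
      os << (i ? "," : "") << s[i];
    }
    os << ") ";
  }
  return os.str();
}
```

With something like `g.attrs["shape"] = ShapeVector{{2, 3}, {4}};`, a pass would call `TotalElements(g)` or log `DescribeShapes(g)` without the Graph struct ever declaring a shape field. The cost is that a wrong key or type only shows up at runtime, as an exception.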
The advantage of map<str, any> is that it enables unified processing of the attributes. You can find the same design in cases like DataFrame. To inspect a dynamic type, you can usually write an auxiliary function, and many of these are still one-liners, or just use ```DLOG(INFO) << PrintInfo(graph, fields)```. On the other hand, having a separate attribute field means we need separate logic for serialization and manipulation of these data structures. The general benefit of type erasure is the dynamism it gives and the possibility of backend registration.

While I totally understand where you are coming from, we do need to look again at both sides of the spectrum. Data structure design, like any infra design, is about trade-offs.

To sum up, I certainly agree that some level of strong typing can be helpful, and we did do that in our codebase. I do think it is wrong to dismiss dynamic types as simply bad and "not for production". Dynamic types, when used properly, can simplify the code assumptions, make the system more extensible/pluggable, and make it interoperable with frontends. All these features are important and necessary for a successful deep learning system. Admittedly they might make life a bit harder (e.g. not being able to auto-complete using your IDE), but again this is a tradeoff we need to make.

I would recommend taking another look at DLPack https://github.com/dmlc/dlpack/blob/master/include/dlpack/dlpack.h as a good example of how such a balance can be achieved.

Tianqi

On Tue, Jun 11, 2019 at 1:26 PM Pedro Larroy <[email protected]> wrote:
> Another data point.
> While working with a contributor, he/she is asking to get access to the graph and values of the NDArray (me too) to be able to reason more effectively about enhancements to the operators:
>
> https://github.com/apache/incubator-mxnet/pull/15120
>
> I think gathering in the wiki the design constraints with respect to different activities, the design proposals using the graph, and pain points such as the one we are discussing would be a constructive way to move forward, unless you think everything is as good as it gets right now, which is what I understand from your responses.
>
> Re the vector of shapes, I know it is one line of code to get it, but you can't get the values with a debugger from other points of the code, and that compounds with the many other dynamic attributes that you can also fetch. It's a bit like dying from a thousand cuts. In the MXNet codebase we are paying a price for no benefit in flexibility, as we always use those attributes, hence my point that they should be typed in the graph. Please try to understand my point and help provide solutions, or at least a reason why it can't be improved, instead of saying there's no problem. There are still three problems, in order of importance: debuggability, clarity of definition of data structures, and navigability with an IDE, which breaks with an untyped field keyed with a string.
>
> Pedro.
>
> On Tue, Jun 11, 2019 at 11:07 AM Pedro Larroy <[email protected]> wrote:
> >
> > To put a recent specific example and focus the discussion (there are as many as there are attributes), the shapes in the graph are a vector of Shape set as an attribute using dmlc::any, so this makes it very difficult to debug the shapes when you have a graph object. I would have it as a typed attribute of the graph, as we always need the vector of shapes and operate on it while doing shape inference.
> >
> > On Tue, Jun 11, 2019 at 10:18 AM Pedro Larroy <[email protected]> wrote:
> > >
> > > Thanks for the good discussion.
> > >
> > > I actually wasn't referring particularly to our conversations in github with respect to the refactors, but it's nice of you to bring them up. And it's ok to disagree on small things, hopefully we can align on the big ones.
> > >
> > > I understand that for TVM you might have different constraints on how dynamic you want to be for mutating the graph and doing quick experimentation and research, but please try to understand my perspective, coming from a software engineering background and helping maintain MXNet for thousands of users and teams using it in production. Let's also consider how many issues we have open and our bandwidth to deal with additional complexity.
> > >
> > > To your pros and cons I would like to add and emphasize that currently the heavy use of dynamic attributes in the graph using dmlc::any has three very negative consequences, at least for MXNet:
> > >
> > > 1 - Makes the data structures using dmlc::any almost impossible to debug, as they are just binary.
> > > 2 - Makes the code more difficult to understand because there's no declaration in a data structure of the data fields it uses and its responsibilities. We are basically shoving all kinds of stuff in using dmlc::any.
> > > 3 - You get no help from the IDE to navigate and refactor as another consequence.
> > >
> > > I would really like you to give me solutions to these problems, or at least acknowledge them and tell me why we have to pay those tradeoffs, instead of just dismissing them as engineering taste.
> > >
> > > The more I work with MXNet, the more I wish debugging, reading and refactoring the code were easier, and that those fields were declared and typed in their corresponding data structures. For MXNet I don't think this would affect anything with regard to the python bindings, since they go through the typed C API anyway.
> > >
> > > Maybe we can get some inspiration from LLVM, as they have bindings for many languages to work with the AST and have very clean APIs for the compilation steps. It's also OK to have an initial dynamic codebase for research and experimentation and then "cure" it into a solid maintainable one with more types and more robust...
> > >
> > > Pedro.
> > >
> > > On Fri, May 31, 2019 at 9:31 AM Tianqi Chen <[email protected]> wrote:
> > > >
> > > > Good infrastructure design goes a long way and has a profound impact on the project itself. That is why we always want to rethink whether the interface can be done better, and think about the next possible infrastructure to make things better. Refactoring is certainly part of it.
> > > >
> > > > There are usually two types of refactoring we refer to:
> > > > 1) The major design change, in terms of class relations, data structures (e.g. numpy support, adding compilation to new hardware)
> > > > 2) The specific choice of API, programming style (more types or type-erased program)
> > > >
> > > > (1) affects the long term support of the project, introduces new features if necessary and needs a lot of thought. I believe the general IR, compilation and numpy support belong to that category.
> > > >
> > > > I would particularly like to talk about (2).
> > > > Because there is no unified correct answer in software engineering, different developers may prefer different views on a certain problem.
> > > > Some of them have to do with the taste of developers.
> > > > The change could favor a certain aspect of the project, but not necessarily another part.
> > > > Refactoring with respect to these sometimes does require a more thoughtful conversation and a reasonable compromise.
> > > >
> > > > For example, we had a recent discussion about whether to introduce more typing into the code base, to the extent that the base data structure could be templatized.
> > > > - The Pros of this approach:
> > > >   - It introduces more typing and compile-time error messages (instead of runtime checking), which could help developers find problems earlier.
> > > > - The Cons of the approach:
> > > >   - Having a template in the base data structure causes ABI problems (which code is generated by DLL A vs DLL B) and will have potential future issues.
> > > >   - Templates sometimes confuse some developers.
> > > >   - For serialization, it is hard to anticipate all kinds of classes and it is easier to have one class (any) that handles polymorphism.
> > > >   - Because most frontends (python) are dynamic, it is easier to interface them with a type-erased API.
> > > >
> > > > As we can see, there are pros and cons of bringing in more typing with the change, and there is no unified answer.
> > > > One good example of a nice infrastructure design trade-off is DLPack https://github.com/dmlc/dlpack/blob/master/include/dlpack/dlpack.h
> > > > This is a base data structure adopted by MXNet, Pytorch, Chainer, and many other frameworks unanimously.
> > > > It is a type-erased data structure that erases the data type and memory allocator from the data structure, and is designed to exchange tensors (coming from different memory allocators) across DLL boundaries.
> > > > As you can see, this is a good example of a type-erased data structure.
> > > >
> > > > When we have this kind of question, it is important to have a good conversation. Sometimes we have to make tradeoffs rather than bend everyone else to our will. This is what open source is about.
> > > > I would also like to give some examples of conversations and how design decisions are resolved. They come from the TVM community's recent discussion about VM design.
> > > > I directly paste the github issue conversation here for the sake of clarity (note that all the conversations are also mirrored to dev@tvm).
> > > > The background is that the community wants to bring in a virtual machine that can execute dynamic operations more effectively.
> > > >
> > > > - The initial proposal, made by one of the committers, gave a detailed design based on a Stack VM https://github.com/dmlc/tvm/issues/2810
> > > >   - As you can see, there is quite some discussion about whether we want to use a different design, in this case a register-based version.
> > > >   - The conversation evolves, and while the community members disagree on some cases, they also agree with each other on the particular tradeoffs.
> > > > - After some discussion, the committers bring a tradeoff design that tries to consolidate the needs of both sides, and this is the final solution being adopted https://github.com/dmlc/tvm/issues/2915
> > > > I would like to particularly highlight the fact that: 1) there are disagreements in the development process. 2) developers work together to understand each other's needs and then reach consensus on a perhaps better design.
> > > >
> > > > There are two other particular conversations between Pedro and myself, which took place during his contributions.
> > > > - https://github.com/dmlc/tvm/pull/3037 In this case, I raised a concern about API consistency, and Pedro brought up a reason why he thinks it is a better idea. I agreed and we merged the PR.
> > > > - https://github.com/dmlc/tvm/pull/3108 In this other case, there are technical reasons for going either way for the case of MXNet. We listed pros/cons of both sides and had a constructive conversation.
> > > > Eventually, I decided not to merge the PR after weighing all the cases.
> > > >
> > > > I believe both are useful conversations, and while Pedro and I disagree sometimes, we do agree on many other cases. The most crucial part is having a constructive conversation.
> > > > To summarize, I do think refactoring and making things better is certainly important to make the project better. And I do believe it is crucial to think about all the technical consequences and make good tradeoff decisions.
> > > > Sometimes the decision may not make every developer happy, but a good technical compromise could move the project forward and help the community in general.
> > > >
> > > > Tianqi
> > > >
> > > > On Fri, May 31, 2019 at 6:26 AM Isabel Drost-Fromm <[email protected]> wrote:
> > > >>
> > > >> On May 31, 2019 at 14:13:30 CEST, Pedro Larroy <[email protected]> wrote:
> > > >> > I think Martin does a very good job explaining why refactoring, reducing developer frustration and internal improvement is a crucial productivity multiplier which includes lower cost to ship features, fewer bugs and less time spent debugging.
> > > >>
> > > >> There's one aspect that's special for open source projects: if a project wants to survive long term, it should make it easy for people to get started working on the project. In my experience, refactoring and cleanup play an important role in that. So thanks also for making recruiting of new contributors better.
> > > >>
> > > >> Isabel
> > > >> --
> > > >> This message was sent with K-9 from a mobile device with swipe to type enabled. I'm sorry for any embarrassing typos that slipped through.
