I've thought about this some more, and I'm changing my stance with respect to
ProtoBuf. While adding a Python class schema is a less invasive change than
introducing ProtoBuf and allows us to stick to the current log format exactly,
protos have the added benefit of being language-neutral. Going forward,
sticking to "industry standard" practices (as @mdw-octoml indicated) will
likely also make schema changes clearer and enforce, to some extent, more
backwards compatibility than we've seen so far.
To that end, here is a restatement of the proposed schema as a .proto file.
Comments note where it differs from the current format or still needs work.
Note that this will certainly require bumping the schema format from 0.2 to
0.3, and implementation details may change slightly. If people agree to this
change, I would also send a corresponding PR to tophub.
```protobuf
syntax = "proto3";
package autotvm.log;
import "google/protobuf/any.proto";
message Target {
  // For now this is the string representation of a target, e.g.
  // "llvm -mcpu=broadwell".
  // This should be replaced once the RFC "TVM Target specification" is
  // finalized.
  string target_string = 1;
}

message AutoTVMLog {
  Target target = 1;
  Workload workload = 2;
  Config config = 3;
  Result result = 4;
  string version = 5;
  string tvm_version = 6;
}

message Workload {
  string task_name = 1;
  repeated Argument args = 2;
  // kwargs is no longer included as it is unused
}

message Argument {
  oneof arg {
    Tensor tensor = 1;
    // Possible tuple values are not well specified and may require more
    // sorting out:
    // https://github.com/apache/incubator-tvm/blob/master/python/tvm/autotvm/task/task.py#L43-L63
    Tuple tuple = 2;
    string value = 3;
  }
}

message Tensor {
  string name = 1;
  repeated uint32 shape = 2;
  string dtype = 3;
}

message Tuple {
  repeated google.protobuf.Any values = 1;
}

message Config {
  string code_hash = 1;
  repeated Entity entities = 2;
  uint32 index = 3;
}

message Entity {
  // Entities were previously output as
  // `[["tile_ow", "sp", [-1, 1]], <other_entities>]`.
  // The proposed encoding clarifies the entity type in the schema itself
  // instead of as a string.
  string knob_name = 1;
  oneof entity {
    SplitEntity split = 2;
    ReorderEntity reorder = 3;
    AnnotateEntity annotate = 4;
    OtherOptionEntity other_option = 5;
  }
}

message SplitEntity {
  repeated int32 size = 1;
}

message ReorderEntity {
  repeated uint32 order = 1;
}

message AnnotateEntity {
  repeated string annotations = 1;
}

message OtherOptionEntity {
  google.protobuf.Any value = 1;
}

message Result {
  repeated float costs = 1;
  int32 error_no = 2;
  float all_cost = 3;
  float timestamp = 4;
}
```
As an example, the JSON encoding will look like:
```
{
  "target": {
    "target_string": "llvm -mcpu=broadwell"
  },
  "workload": {
    "task_name": "conv2d_x86_64",
    "args": [
      {"tensor": {"name": "tensor_name", "shape": [1, 2, 3], "dtype": "float32"}}
    ]
  },
  "config": {
    "code_hash": "codehashtest",
    "entities": [
      {"knob_name": "tile_ic", "split": {"size": [4, 32]}}
    ],
    "index": 1
  },
  "version": "0.3",
  "tvm_version": "todo get tvm version"
}
```
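To make the entity change concrete, here is a minimal Python sketch of mapping a 0.2-style entity triple onto the proposed `oneof` layout. `convert_entity` is a hypothetical helper, not existing code, and the type-code table (`sp`, `re`, `an`, `ot`) is my assumption about how the current short codes would line up with the new fields.

```python
import json

# Assumed mapping: 0.2 type code -> (oneof field name, payload field name).
_ENTITY_FIELDS = {
    "sp": ("split", "size"),
    "re": ("reorder", "order"),
    "an": ("annotate", "annotations"),
    # other_option holds a google.protobuf.Any in the schema; a real
    # conversion would need to wrap the value. This sketch passes it through.
    "ot": ("other_option", "value"),
}


def convert_entity(legacy):
    """Convert a [knob_name, type_code, value] triple to the proposed layout.

    Hypothetical helper for illustration only.
    """
    knob_name, type_code, value = legacy
    if type_code not in _ENTITY_FIELDS:
        raise ValueError(f"unknown entity type code: {type_code!r}")
    field, payload_key = _ENTITY_FIELDS[type_code]
    return {"knob_name": knob_name, field: {payload_key: value}}


legacy_entities = [["tile_ic", "sp", [4, 32]], ["tile_ow", "sp", [-1, 1]]]
converted = [convert_entity(e) for e in legacy_entities]
print(json.dumps(converted[0]))
# prints: {"knob_name": "tile_ic", "split": {"size": [4, 32]}}
```

The point of the table is that the entity type moves out of a positional string and into the field name, so a reader of the log no longer needs to know what `"sp"` means.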
To avoid breaking workflows that assume readable log output by default, I
suggest we simply add "protobuf" as an encode/decode/file logging option in
https://github.com/apache/incubator-tvm/blob/master/python/tvm/autotvm/record.py.
The default serialization format will still be "json", but all serialization
schemes will be backed by the proto-generated schema. @haichen @jroesch
@tqchen, what do you think?
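For concreteness, the option could look roughly like the sketch below. Everything here is an assumption for discussion: `encode_record` and `decode_record` are hypothetical names rather than the current record.py signatures, and `autotvm_log_pb2` stands for the module that protoc would generate from the .proto above, which does not exist yet. For brevity the "json" branch skips the schema check; in the actual proposal it would also go through the generated message.

```python
import json


def encode_record(record, protocol="json"):
    """Serialize a schema-shaped dict. "json" stays the readable default."""
    if protocol == "json":
        return json.dumps(record)
    if protocol == "protobuf":
        # Imported lazily so the default path has no protobuf dependency.
        from google.protobuf import json_format
        import autotvm_log_pb2  # assumed protoc output; not yet available

        msg = json_format.ParseDict(record, autotvm_log_pb2.AutoTVMLog())
        return msg.SerializeToString()
    raise ValueError(f"unsupported protocol: {protocol!r}")


def decode_record(data, protocol="json"):
    """Inverse of encode_record for the same protocol."""
    if protocol == "json":
        return json.loads(data)
    if protocol == "protobuf":
        from google.protobuf import json_format
        import autotvm_log_pb2  # assumed protoc output; not yet available

        msg = autotvm_log_pb2.AutoTVMLog.FromString(data)
        # Keep snake_case keys (e.g. target_string) as in the JSON example.
        return json_format.MessageToDict(msg, preserving_proto_field_name=True)
    raise ValueError(f"unsupported protocol: {protocol!r}")


record = {"target": {"target_string": "llvm -mcpu=broadwell"}, "version": "0.3"}
line = encode_record(record)  # default: human-readable JSON
assert decode_record(line) == record
```

Existing workflows that read the log as text would keep working unchanged, since only callers that explicitly pass "protobuf" get the binary encoding.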