[Lldb-commits] [PATCH] D56322: [Reproducers] SBReproducer framework

Pavel Labath via Phabricator via lldb-commits Fri, 01 Feb 2019 04:29:24 -0800

labath added a comment.

I have a lot of comments. The two major ones are:


- i think the way you link the tests is in the UB territory. I explain this in 
detail in one of the inline comments.
- I believe that your unit tests (not just in this patch) focus too much on 
testing the behavior of a single method, even when that method does not have an 
easily observable results. For this you then often need to expose internals of 
the tested class to be able to test some effect of that method on the internal 
state. This not all bad (i've done it myself sometimes), and  it's definitely 
better that not writing any unit tests. However, it's generally better to test 
the public interface of a method/class/entity, and in this case, I believe it 
should be possible. I'm imagining some tests like having a dummy class with a 
bunch of methods which are annotated with the SB_RECORD macros. Then, in the 
test you call the methods of this class with some arguments, have the repro 
framework serialize the calls to a (in-memory?) stream, and verify the contents 
of that stream. This will test a lot more code  -- the (De)Serialize functions 
are just fancy ways of invoking memcpy, but if you take all of that together 
with the SB_RECORD, SB_REGISTER macros, it becomes some pretty deep magic that 
is interesting to test exhaustively (it's fine, but not really interesting to 
test that Deserialize<T*> and Deserialize<T&> do the right thing for each of 
the possible fundamental types.) And if you use the deserializer interface to 
read out the recorded traces (instead of comparing raw bytes), then you can 
avoid depending on the endianness of the values.

Conversely, you can write a trace file with the serializer interface, and then 
tell the replayer to invoke the mock class which will verify that it was called 
with the right arguments.



================
Comment at: source/API/SBReproducer.cpp:1
+//===-- SBReproducer.cpp ----------------------------------------*- C++ 
-*-===//
+//
----------------
I found it weird to have one cpp file implementing two headers 
(`SBReproducer.h` and `SBReproducerPrivate.h`). Can you split it into two 
files? (This will come out naturally, if we split this up into modules as I 
mention in one of the other comments.)


================
Comment at: source/API/SBReproducer.cpp:69
+  repro::Loader *loader = repro::Reproducer::Instance().GetLoader();
+  if (!loader) {
+    return false;
----------------
drop `{}` (that's what you seem to do elsewhere in this file).


================
Comment at: source/API/SBReproducerPrivate.h:44
+    void *object = GetObjectForIndexImpl(idx);
+    return static_cast<typename std::remove_const<T>::type *>(object);
+  }
----------------
Why do you need to `remove_const` here? If `T` is `const`, then const will be 
added back by the return statement anyway. If `T` is not `const`, then 
`remove_const` is a noop.


================
Comment at: source/API/SBReproducerPrivate.h:65-69
+    auto it = m_mapping.find(idx);
+    if (it == m_mapping.end()) {
+      return nullptr;
+    }
+    return m_mapping[idx];
----------------
replace by `m_mapping.lookup(idx)`, at which point you can consider dropping 
this function entirely.


================
Comment at: source/API/SBReproducerPrivate.h:82-93
+/// Base class for tag dispatch used in the SBDeserializer. Different tags are
+/// instantiated with different values.
+template <unsigned> struct SBTag {};
+
+/// We need to differentiate between pointers to fundamental and
+/// non-fundamental types. See the corresponding SBDeserializer::Read method
+/// for the reason why.
----------------
Just curious: What's the advantage of this over just declaring a bunch of Tag 
classes directly (struct PointerTag{}; struct ReferenceTag{}; ...)?


================
Comment at: source/API/SBReproducerPrivate.h:121
+public:
+  SBDeserializer(llvm::StringRef buffer = {}) : m_buffer(buffer), m_offset(0) 
{}
+
----------------
Instead of the m_offset variable, what do you think about just `drop_front`ing 
the appropriate amount of bytes when you are done with the? It doesn't look 
like you'll ever need to go back to an earlier point in the stream...


================
Comment at: source/API/SBReproducerPrivate.h:155-161
+  // FIXME: We have references to this instance stored in replayer instance. We
+  // should find a better way to swap out the buffer after this instance has
+  // been created, but his will have to do for now.
+  void LoadBuffer(llvm::StringRef buffer) {
+    m_buffer = buffer;
+    m_offset = 0;
+  }
----------------
It sounds to me like this could be solved by passing the deserializer to the 
`operator()` of the SBReplayers instead of having it a member variable set by 
the constructor. This design also makes more sense to me, as theoretically 
there is no reason why all replay calls would have to come from the same 
deserializer object. You may even want to have a separate deserializer for each 
thread when you get around to replaying multithreaded recordings.

After this, it may even be possible to make the deserializer object be a local 
variable in the `SBRegistry::Replay` function.


================
Comment at: source/API/SBReproducerPrivate.h:167
+private:
+  template <typename T> T Read(ValueTag) {
+    T t;
----------------
I guess if we don't consider the recording to be "input" then we don't have to 
handle the case of not having enough bytes in the file extremely gracefully 
here, but it would still be nice to at least assert that we are not running off 
the end of the buffer here.


================
Comment at: source/API/SBReproducerPrivate.h:169
+    T t;
+    std::memcpy((char *)&t, &m_buffer.data()[m_offset], sizeof(t));
+    m_offset += sizeof(t);
----------------
reinterpret_cast


================
Comment at: source/API/SBReproducerPrivate.h:182
+    typedef typename std::remove_reference<T>::type UnderlyingT;
+    // If this is a reference to a fundamental type we just read its value.
+    return *m_index_to_object.template GetObjectForIndex<UnderlyingT>(
----------------
Is this correct? I would expect the fundamental references to not go through 
this overload at all...


================
Comment at: source/API/SBReproducerPrivate.h:303
+        DeserializationHelper<Args...>::template deserialized<Result>::doit(
+            m_deserializer, f));
+  }
----------------
Shouldn't this call `g` then?


================
Comment at: source/API/SBReproducerPrivate.h:307-309
+  Result (*f)(Args...);
+  /// A custom function.
+  Result (*g)(Args...);
----------------
If you gave these more meaningful names, then the comments would be unneeded.


================
Comment at: source/API/SBReproducerPrivate.h:370
+  /// Register the given replayer for a function (and the ID mapping).
+  void DoRegister(uintptr_t RunID, SBReplayer *replayer, unsigned id) {
+    m_sbreplayers[RunID] = std::make_pair(replayer, id);
----------------
It looks like this `id` argument isn't really needed here, as the DoRegister 
function can just synthesize the id on its own. In fact, the `m_id` is probably 
not needed either, as you can just use `m_sbreplayers.size()+1` or something.


================
Comment at: source/API/SBReproducerPrivate.h:371
+  void DoRegister(uintptr_t RunID, SBReplayer *replayer, unsigned id) {
+    m_sbreplayers[RunID] = std::make_pair(replayer, id);
+    m_ids[id] = replayer;
----------------
`assert(m_sbreplayers.find(RunID) == end())` to make sure our RunID magic 
didn't do something funny?


================
Comment at: source/API/SBReproducerPrivate.h:419-448
+/// Maps an object to an index for serialization. Indices are unique and
+/// incremented for every new object.
+///
+/// Indices start at 1 in order to differentiate with an invalid index (0) in
+/// the serialized buffer.
+class SBObjectToIndex {
+public:
----------------
What's the threading scenario in which you expect this to be called? If 
multiple threads can call `GetIndexForObject` simultaneously (even if the 
objects differ), then I think you need to guard `m_mapping` with the mutex too, 
not just the integer variable.

BTW: The `m_index` can also just be `m_mapping.size()+1`


================
Comment at: source/API/SBReproducerPrivate.h:453
+public:
+  SBSerializer(llvm::raw_ostream &stream = llvm::outs()) : m_stream(stream) {}
+
----------------
Is this default value used for anything? It don't see it being used, and it 
doesn't seem particularly useful anyway, as you'd just get a lot of binary junk 
on stdout.


================
Comment at: source/API/SBReproducerPrivate.h:482
+    if (std::is_fundamental<T>::value) {
+      Serialize(t);
+    } else {
----------------
Will this ever not be an infinite recursion? I am guessing this works because 
all fundamental types are picked up by the specific overloads below before this 
function is even invoked. 

Maybe if you just inline the `m_stream.write(reinterpret_cast<const char 
*>(&t), sizeof(Type))` thingy here, you can avoid having to define a special 
Serialize function for all fundamental types.


================
Comment at: source/API/SBReproducerPrivate.h:484
+    } else {
+      int idx = m_tracker.GetIndexForObject(&t);
+      Serialize(idx);
----------------
It looks like `GetIndexForObject` returns `unsigned`


================
Comment at: source/API/SBReproducerPrivate.h:562-563
+        m_local_boundary(false), m_result_recorded(false) {
+    if (!g_global_boundary) {
+      g_global_boundary = true;
+      m_local_boundary = true;
----------------
If you wanted to make the updates to g_global_boundary thread-safe, this is not 
the way to achieve that. This is still going to be racy, so you should at least 
use the atomic compare-and-exchange operations. However, I don't believe this 
will still be right thing to do in the multithreaded case, so it may be best do 
just drop the atomicity until you implement proper threading support.


================
Comment at: source/API/SBReproducerPrivate.h:573-574
+
+  void SetSerializer(SBSerializer &serializer) { m_serializer = &serializer; }
+
+  /// Records a single function call.
----------------
Move this to the constructor, and use `llvm::Optional` in the `SB_RECORD` 
macros?

This will also avoid having this class muck around with `g_global_boundary` 
when we are not recording.


================
Comment at: source/API/SBReproducerPrivate.h:578
+  void Record(Result (*f)(FArgs...), const RArgs &... args) {
+    if (!ShouldCapture()) {
+      return;
----------------
Inconsistent braces around a single statement.


================
Comment at: source/API/SBReproducerPrivate.h:584-585
+
+    if (Log *log = GetLogIfAllCategoriesSet(LIBLLDB_LOG_API))
+      log->Printf("Recording #%u '%s'", id, m_pretty_func.data());
+
----------------
if you used the `LLDB_LOG` macro, you wouldn't have to rely on m_pretty_func 
being implicitly null terminated.


================
Comment at: source/API/SBReproducerPrivate.h:618-626
+  void RecordOmittedResult() {
+    if (m_result_recorded)
+      return;
+    if (!ShouldCapture())
+      return;
+
+    m_serializer->SerializeAll(0);
----------------
I'm wondering if we shouldn't embed some additional safeties here to make sure 
the result is always recorded when it should be. It seems to be this class 
should know whether it should expect an explicit `RecordResult` call (based on 
whether we called the `void` version of `Record` or not). So, we could have 
this class assert if we reached the destructor without recording anything, 
instead of silently recording zero. WDYT?


================
Comment at: source/API/SBReproducerPrivate.h:659
+  if (auto *g = lldb_private::repro::Reproducer::Instance().GetGenerator()) {  
\
+    lldb_private::repro::SBRecorder sb_recorder(__PRETTY_FUNCTION__);          
\
+    sb_recorder.SetSerializer(                                                 
\
----------------
`__PRETTY_FUNCTION__` is not portable. I think you want LLVM_PRETTY_FUNCTION 
here.


================
Comment at: tools/driver/Driver.cpp:913-917
+  SBReproducer reproducer;
+  if (reproducer.Replay()) {
+    return 0;
+  }
+
----------------
The driver should know whether it is replaying things or not, right? Can we 
have it call `Replay` only if replay mode has actually been requested?


================
Comment at: unittests/API/CMakeLists.txt:4-12
+  LINK_LIBS
+    liblldb
+    lldbCore
+    lldbHost
+  LINK_COMPONENTS
+    Support
+  )
----------------
Hmm.. I'm not sure this is going to work everywhere, but it's certainly going 
to be weird. `liblldb` is a shared library which deliberately firewalls all 
`lldb_private` symbols. So any reproducer symbols you defined there will not be 
accessible to your unit tests. I guess the reason `include_directories` trick 
kind of works is that most of the reproducer frameworks is implemented in the 
header, and including that header from the test will cause another copy of all 
of these symbols to be defined. However, I am still surprised that it does 
work, as you still have some chunks of this implemented in the cpp file.  I 
guess you were just lucky to not need those symbols in the tests that you have 
written. Nonetheless, this is going into very murky waters, and I think the 
fact that you had to link in  lldbHost and lldbCore in addition to liblldb 
(even though they are be included in there) demonstrates it.

I think the best way to address this is to move the core of the repro engine 
into some other library, which can be safely used from unit tests. After all, 
the core is not really tied to the SB API, and you could easily use it to 
record any other interface, if you needed to do that for whatever reason. I 
think this should go into the Utility module, as it doesn't have any other 
dependencies. I see that you're including FileSystem.h, but it looks like this 
is only used in the `SetInputFileHandleRedirect` function, which is not 
actually used anywhere! Even when it is used, I think it would make sense for 
it to live elsewhere, as this is not a part of the replay core, but rather how 
a particular api wants to replay a particular function signature.


================
Comment at: unittests/API/SBReproducerTest.cpp:37-41
+      0x01, 0x00, 0x61, 0xcd, 0xcc, 0x8c, 0x3f, 0x02, 0x00, 0x00, 0x00,
+      0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x04, 0x00, 0x00,
+      0x00, 0x00, 0x00, 0x00, 0x00, 0x05, 0x00, 0x62, 0x06, 0x00, 0x00,
+      0x00, 0x07, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x08, 0x00,
+      0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x09, 0x00};
----------------
All of these buffers hard-code endiannes of integers and the representation of 
floats.


================
Comment at: unittests/API/SBReproducerTest.cpp:83-85
+    SBDeserializer deserializer("a");
+    EXPECT_TRUE(deserializer.HasData());
+    EXPECT_FALSE(deserializer.HasData(1));
----------------
I find this behaviour confusing. Intuitively, I'd expect `HasData(n)` if buffer 
has at least `n` bytes of data left. So in this case I'd expect `true` in both 
calls. (In fact I'd probably expect `HasData()` to be a synonym for 
`HasData(1)`). It looks like you're only calling `HasData` in one place, and 
you're hardcoding `1` there, so maybe you just want an `EOF()` function?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D56322/new/

https://reviews.llvm.org/D56322



_______________________________________________
lldb-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits

[Lldb-commits] [PATCH] D56322: [Reproducers] SBReproducer framework

Reply via email to