On 20/12/2018 01:46, Chris Johns wrote:
I have read the ticket and not the patch.

On 20/12/2018 08:00, Joel Sherrill wrote:
How does this relate.to <http://relate.to> the existing infrastructure for the
capture engine? It seems duplicative.
Yes it is confusing. With this change we would have 3 possible trace systems,
the capture engine, rtems-tld and this. I would like just one way to trace a
live system.

This patch just adds per-processor ring buffers to record events. The capture engine does the same and in addition has support for triggers and filters. The rtems-tld generates trace events, it could use the stuff from this patch to record them. This was actually the original use case for this work which I started some months ago for the tracing GSoC project.


I am OK with the capture engine being replaced with something new and better
that provides all the functionality of the capture engine, ie triggers, floors
etc. The ticket does not state if this is what will happen. It would be good if
this can be clarified. Filtering at runtime is an important performance
consideration.

Event recording is a separate problem from filtering and triggering events. This patch set only addresses the recording. It tries to do this as efficient as possible to be able to record high frequency events. It intentionally does not deal with filtering and triggering. There is a user extension to generate thread events, but this is just a special user of this event recoding.


It is also not clear to me why the capture engine cannot be improved to have the
capture method presented here integrated. Sure, currently it does not have the
low level performance because of the locks used but it is not clear to me why
this code cannot be changed?

The locks could be removed, the real problem are the record items.

/**
 * @brief The 32-bit format record item.
 */
typedef struct {
  uint32_t event;
  uint32_t data;
} rtems_record_item_32;

/**
 * @brief A capture timestamp.
 *
 * This is a nanosecond capture timestamp
 */
typedef uint64_t rtems_capture_time;

/*
 * @brief Capture record.
 *
 * This is a record that is written into
 * the buffer. The events includes the priority of the task
 * at the time of the context switch.
 */
typedef struct rtems_capture_record
{
  size_t             size;
  uint32_t           events;
  rtems_id           task_id;
  rtems_capture_time time;
} RTEMS_PACKED rtems_capture_record;

So, this is 8 bytes vs. 20 bytes (this is 250% more and a big deal for high frequency events). This capture engine ring buffer is more complicated due to the support for variable length records. I uses memcpy() to copy the data into the record item. Please compare this with the rtems_record_produce() assembly code in the ticket.

Getting the timestamps in the capture engine is expensive. In this patch the CPU counter is used in combination with periodic uptime timestamps for synchronization. A 22-bit field is used for the timestamp. This is enough to a support 4GHz CPU counter with a 1000Hz clock tick, the T4240 has a CPU counter frequency of 1.5GHz.


Can we please establish the requirements and how this sits with what we have
sorted out before reviewing patches and details?

My use case for this is a nasty bug. We have about 50 nodes running an RTEMS application connected via Ethernet. The nodes operate 24/7. Approximately every 60 days one of the nodes crashes and this leads to a 20 seconds service interruption. Nobody was harmed so far, but it is still an issue which needs to be fixed. I need a fast (faster than the capture engine) event recording mechanism for post-mortem analysis.

The basic requirement is recording of high frequency events on high end embedded SMP machines, e.g. 1.5GHz T4240 with 24 processors and 10Gbit/s Ethernet.


An important part of any trace work is host integration and the tools to manage
this. Getting data off via TCP, UDP, file etc is only part of the problem so it
would good to know what the integration path on the host is.

The patch set includes a TCP client and server, see test case record01. Attached is a simple command line tool which I used on Linux to get the event records (32-bit big endian RTEMS machine, 64-bit little endian development machine). Some files of this patch are intended to be compiled on the development machine.


Sorry but I have no time to review this and consider it until next year.

No problem, take your time. I work on this since April this year from time to time, so it can wait a couple of more weeks.

--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax     : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP     : Public key available on request.

Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG.

/*
 * SPDX-License-Identifier: BSD-2-Clause
 *
 * Copyright (C) 2018 embedded brains GmbH
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
 * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 * POSSIBILITY OF SUCH DAMAGE.
 */

#include <rtems/recorddata.h>
#include <rtems/recordclient.h>

#include <sys/socket.h>

#include <assert.h>
#include <getopt.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <inttypes.h>

#include <netinet/in.h>
#include <arpa/inet.h>

static const struct option longopts[] = {
  { "help", 0, NULL, 'h' },
  { "host", 1, NULL, 'H' },
  { "port", 1, NULL, 'p' },
  { NULL, 0, NULL, 0 }
};

static const char *host = "127.0.0.1";

static uint16_t port = 1234;

static void usage( char **argv )
{
  printf(
    "%s --host=HOST --port=PORT\n"
    "\n"
    "Mandatory arguments to long options are mandatory for short options too.\n"
    "  -h, --help                 print this help text\n"
    "  -H, --host=HOST            the host IPv4 address of the record server\n"
    "  -p, --port=PORT            the TCP port of the record server\n",
    argv[ 0 ]
  );
}

static int connect_client( void )
{
  struct sockaddr_in in_addr;
  int fd;
  int rv;

  fd = socket( PF_INET, SOCK_STREAM, 0 );
  assert( fd >= 0 );

  memset( &in_addr, 0, sizeof( in_addr ) );
  in_addr.sin_family = AF_INET;
  in_addr.sin_port = htons( port );
  in_addr.sin_addr.s_addr = inet_addr( host );
  rv = connect( fd, (struct sockaddr *) &in_addr, sizeof( in_addr ) );
  assert( rv == 0 );

  return fd;
}

static rtems_record_client_status handler(
  uint32_t            seconds,
  uint32_t            nanoseconds,
  uint32_t            cpu,
  rtems_record_event  event,
  uint64_t            data,
  void               *arg
)
{
  const char *event_text;

  (void) arg;

  if (
    event == RTEMS_RECORD_UPTIME_LOW
      || event == RTEMS_RECORD_UPTIME_HIGH
  ) {
    return RTEMS_RECORD_CLIENT_SUCCESS;
  }

  if ( seconds != 0 && nanoseconds != 0 ) {
    printf( "%" PRIu32 ".%09" PRIu32 ":", seconds, nanoseconds );
  } else {
    printf( "*:" );
  }

  event_text = rtems_record_event_text( event );
  if ( event_text != NULL ) {
    printf( "%" PRIu32 ":%s:%" PRIx64 "\n", cpu, event_text, data );
  } else {
    printf( "%" PRIu32 ":%i:%" PRIx64 "\n", cpu, event, data );
  }

  return RTEMS_RECORD_CLIENT_SUCCESS;
}

int main( int argc, char **argv )
{
  rtems_record_client_context ctx;
  int fd;
  int rv;
  int opt;
  int longindex;

  while (
    (opt = getopt_long(argc, argv, "hH:p:", &longopts[0], &longindex)) != -1
  ) {
    switch (opt) {
      case 'h':
        usage(argv);
        exit(EXIT_SUCCESS);
        break;
      case 'H':
        host = optarg;
        break;
      case 'p':
        port = (uint16_t) strtoul(optarg, NULL, 10);
        break;
      default:
        exit(EXIT_FAILURE);
        break;
    }
  }

  fd = connect_client();
  rtems_record_client_init( &ctx, handler, NULL );

  while ( true ) {
    int buf[ 1024 ];
    ssize_t n;

    n = recv( fd, buf, sizeof( buf ), 0 );
    if ( n >= 0 ) {
      rtems_record_client_run( &ctx, buf, (size_t) n );
    } else {
      break;
    }
  }

  rv = close( fd );
  assert( rv == 0 );

  return 0;
}
_______________________________________________
devel mailing list
devel@rtems.org
http://lists.rtems.org/mailman/listinfo/devel

Reply via email to