On Wed, May 27, 2020 at 8:29 PM Gedare Bloom <[email protected]> wrote:
> On Tue, May 26, 2020 at 6:12 PM Utkarsh Rai <[email protected]> wrote:
> >
> > On Mon, May 25, 2020 at 9:32 PM Gedare Bloom <[email protected]> wrote:
> >>
> >> On Mon, May 25, 2020 at 5:39 AM Utkarsh Rai <[email protected]> wrote:
> >> >
> >> > On Fri, May 22, 2020 at 10:59 AM Gedare Bloom <[email protected]> wrote:
> >> >>
> >> >> > This means that our low-level design for providing thread stack protection may look something like this:
> >> >> >
> >> >> > 1. For MPU-based processors, the number of protected stacks will depend on the number of protection domains, i.e. for MPUs with 8 protection domains we can have 7 protected stacks (1 region will be assigned to global data). For MMU-based systems we will have a section (a page of size 1MB) for global data, and the task address space will be divided into smaller pages; page sizes will be decided by keeping in mind the number of TLB entries, in the manner I have described above in the thread.
> >> >> >
> >> >> There is value in defining a few of the global regions. I'll assume R/W/X permissions. Then code (.text) should be R/X. Read-only data sections should be grouped together and made R. Data sections should be RW. And then stacks should be added at the end. The linker scripts should be used to group the related sections together. I think some ARM BSPs do some of this already. That seems like a minimally useful configuration for most users that would care; they also want protection of code from accidental overwrite, and probably data too, and non-executable data in general. You may also have to consider a few more permission complications (shared/cacheable) depending on the hardware.
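To make sure we mean the same thing by the global region grouping (R/X code, R rodata, RW data), here is a rough C sketch of how such a region table could look. All of the names (`mem_region`, `PERM_*`) and the addresses are hypothetical; in a real BSP the bounds would come from linker symbols emitted by the linker script, as ARMV7_CP15_START_DEFAULT_SECTIONS does:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical permission bits; real encodings are architecture-specific. */
#define PERM_R 0x1u
#define PERM_W 0x2u
#define PERM_X 0x4u

/* One protection region: a generic stand-in for an MPU region
 * descriptor or an MMU section entry. */
typedef struct {
  uintptr_t begin;
  size_t    size;
  uint32_t  perms;
} mem_region;

/* Global regions shared by all threads; addresses are made up. */
static const mem_region global_regions[] = {
  { 0x00100000u, 0x40000, PERM_R | PERM_X }, /* .text: code, no write   */
  { 0x00140000u, 0x10000, PERM_R          }, /* .rodata: read-only data */
  { 0x00150000u, 0x20000, PERM_R | PERM_W }, /* .data/.bss: RW, no exec */
};

/* Sanity check: no region is both writable and executable (W^X). */
static int regions_are_w_xor_x(const mem_region *r, size_t n)
{
  for (size_t i = 0; i < n; ++i) {
    if ((r[i].perms & PERM_W) && (r[i].perms & PERM_X))
      return 0;
  }
  return 1;
}
```

Per-thread stack regions would then be appended after these fixed entries at context-switch time.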
> >> >
> >> > The low-level MMU implementation for ARMv7 BSPs has an 'ARMV7_CP15_START_DEFAULT_SECTIONS' which lists out various regions with appropriate permissions, which are then grouped by a linker script. This should be the standard way of handling the placement of statically allocated regions.
> >> >
> >> >> > 2. The protection, size, page table, and sharing attributes of each created thread will be tracked.
> >> >> >
> >> >> I'd rather we not call this a page table. MPU-based systems don't have a notion of a page table. But maybe it is OK as long as we understand that you mean the data structure responsible for mapping out the address space. I'm not sure what you mean by size, unless you refer to that thread's stack.
> >> >>
> >> >> > 3. At every context switch, these attributes will be updated; the static-global regions will be assigned a global ASID and will not change during the switch, only the protected regions will be updated.
> >> >> >
> >> >> Yes, assuming the hardware supports ASIDs and a global attribute.
> >> >>
> >> >> I don't know if you will be able to pin the global entries in hardware. You'll want to keep an eye out for that. If not, you might need to do something in software to ensure they don't get evicted (e.g., touch them all before finishing a context switch, assuming LRU replacement).
> >> >>
> >> >> > 4. Whenever we share stacks, the page table entries of the shared stack, with the access bits as specified by the mmap/shm high-level APIs, will be installed into the current thread. This is different from simply providing the page table base address of the shared thread-stack (what if the user wants to make the shared stack only readable from another thread while the 'original' thread is r/w enabled?). We will also have to update the TLB by installing the shared regions while the global regions remain untouched.
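The way I picture points 2-4 at the context switch is sketched below: the global regions stay untouched, and only the heir thread's tracked regions (its own stack plus any installed shared stacks) are reprogrammed. This is only a shape sketch; `thread_prot_ctx`, `hw_install_region`, and the permission values are all hypothetical stand-ins for the real MPU/MMU programming:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_THREAD_REGIONS 4

typedef struct {
  uintptr_t begin;
  size_t    size;
  uint32_t  perms; /* hypothetical R/W/X bits */
} stack_region;

/* Hypothetical per-thread protection context (point 2): the thread's
 * own stack plus any stacks shared with it, with their access bits. */
typedef struct {
  stack_region regions[MAX_THREAD_REGIONS];
  size_t       count;
} thread_prot_ctx;

/* Stand-in for the hardware: programming one MPU region slot or
 * installing one MMU/TLB entry. */
static stack_region installed[MAX_THREAD_REGIONS];
static size_t installed_count;

static void hw_install_region(const stack_region *r, size_t slot)
{
  installed[slot] = *r;
}

/* Context-switch extension (point 3): the static-global regions keep
 * their global ASID and are left alone; only the heir's protected
 * regions are (re)installed. */
static void prot_context_switch(const thread_prot_ctx *heir)
{
  for (size_t i = 0; i < heir->count; ++i)
    hw_install_region(&heir->regions[i], i);
  installed_count = heir->count;
}
```

A shared stack installed read-only (point 4) would just be an extra entry in the sharing thread's `regions[]` with write permission cleared, leaving the owner's entry unchanged.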
> >> >> >
> >> >> Correct. I think we need to make a design decision whether a stack can exceed one page. It will simplify things if we can assume that, but it may limit applications unnecessarily. Have to think on that.
> >> >
> >> > If we go with the above assumption, we will need to increase the size of the page, i.e. pages of 16KiB or 64KiB. Most applications won't require stacks of this size, and it will result in wasted memory for each thread. I think it would be better if we have multiple pages, as most applications will have stacks that fit in a single 4KiB page anyway.
> >> >
> >> I mis-typed. I meant I think we can assume stacks fit in one page. It would be impossible to deal with otherwise.
> >>
> >> >> The "page table base address" points to the entire structure that maps out a thread's address space, so you'd have to walk it to find the entry/entries for its stack. So, definitely not something you'd want to do.
> >> >>
> >> >> The shm/mmap should convey the privileges to the requesting thread asking to share. This will result in adding the shared entry/entries to that thread's address space, with the appropriately set permissions. So, if the entry is created with read-only permission, then that is how the thread will be sharing. The original thread's entry should not be modified by the addition of an entry in another thread for the same memory region.
> >> >>
> >> >> I lean toward thinking it is better to always pay for the TLB miss at the context switch, which might mean synthesizing accesses to the entries that might have been evicted in case hardware restricts the ability of software to install/manipulate TLB entries directly. That is something worth looking at more, though. There is definitely a tradeoff between predictable costs and throughput performance. It might be worth implementing both approaches.
> >> >>
> >> >> Gedare
> >> >
> >> > We also need to consider the cases where stack sharing would be necessary:
> >> >
> >> > - We can have explicit cases where an application gets the stack attributes of a thread by pthread_attr_getstack() and then accesses the stack from another thread.
> >> >
> >> > - An implicit case would be when a thread places the address of an object from its stack onto a message queue and we have other threads accessing it; in general, all blocking reads (sockets, files, etc.) will share stacks.
> >> >
> >> > This will be documented so that the user first shares the required stacks and then performs the above operations.
> >> >
> >> Yes. It may also be worth thinking about whether we can/should "relocate" stacks when they get shared and spare TLB entries are low. This would be a dynamic way to consolidate regions, while a static way would rely on some configuration method to declare ahead of time which stacks may be shared, or require the stack allocator (hook) to manage that kind of complexity.
> >
> > Sorry, but I am not sure I clearly understand what you are trying to suggest. Does relocating stacks mean moving them to the same virtual address as the thread-stack they are being shared with, but with a different ASID?
>
> No. We don't want to break the 1:1 pa:va mappings. That is another design constraint, I suppose.
>
> If a user wants to share several sets of task stacks mutually with each other, using the same permissions (e.g., RW), then it would be efficient to pack the sharing tasks together in the same page/segment and use 1 TLB entry for them. This is a thought for an optimization down the road, maybe.
>
> Gedare

Got it.
Now that we are clear on most of the aspects of the low-level design (handling context switches through interrupts remains), I suppose we can decide how the high-level user-configuration design/implementation should look.

- My idea has been to configure the stack protection mechanism in the application based on the current scheme for configuring a system on RTEMS.

- We can have a 'CONFIGURE_MPU_STACK_PROT' or a 'CONFIGURE_MMU_STACK_PROT' and a 'CONFIGURE_PROT_NUMBER_STACK' based on the CPU. The number of protected stacks for an MPU-based CPU would be the common minimum across architectures (most architectures provide at least 8 protection domains).

- Since the task stacks are allocated from the RTEMS workspace, depending on whether we have an MPU or an MMU we can set the workspace size for thread stack allocation.

In parallel, I have started my implementation for isolating thread stacks; as a first step I will be isolating two blocks of memory with appropriate access permissions. Then I will extend this to thread stacks.
_______________________________________________
devel mailing list
[email protected]
http://lists.rtems.org/mailman/listinfo/devel
