Each instance of software process mapping state holds:

	Reference count
	Access control lock
	Statistics info
	pmap ASID
	global ASID generation comparator
	array of pages of R4K PTEs

The control structure looks like the following:

typedef struct pmap {
	int			pm_count;	/* pmap reference count */
	simple_lock_data_t	pm_lock;	/* lock on pmap */
	struct pmap_statistics	pm_stats;	/* pmap statistics */
	int			pm_tlbpid;	/* address space tag */
	u_int			pm_tlbgen;	/* TLB PID generation number */
	struct segtab		*pm_segtab;	/* pointers to pages of PTEs */
} *pmap_t;

The PTE array is sized only to be able to map user space; it looks like this:

#define PMAP_SEGTABSIZE	512

typedef union pt_entry {
	unsigned int	pt_entry;	/* for copying, etc. */
	struct pte	pt_pte;		/* for getting to bits by name */
} pt_entry_t;	/* Mach page table entry */

struct segtab {
	union pt_entry	*seg_tab[PMAP_SEGTABSIZE];
};

All user processes have pm_segtab point to a block of memory for this array.  The special kernel pmap has a NULL pm_segtab; this is how you can tell whether you are adding/removing mappings to the kernel or not.

At boot time INIT gets pmap ASID 1 and the current global tlbpid_gen value for its generation comparator.  All other new processes get a pmap ASID and comparator of 0, which will always force the allocation of a new ASID when the process is first scheduled.

To find a PTE within the R4K PTE array given a user virtual address and a software pmap, extract like this:

	User VADDR
	-------------------------------------------------
	| 0 |  array elem  |  PTE within PAGE  |    0    |
	-------------------------------------------------
	 31   30        22   21             12   11    0

For example:

pt_entry_t *vaddr_to_pte(pmap_t pmap, unsigned long vaddr)
{
	int entry;
	pt_entry_t *pte;

	entry = (vaddr >> 22) & 0x1ff;
	pte = pmap->pm_segtab->seg_tab[entry];
	return pte + ((vaddr >> 12) & 0x3ff);
}

To destroy a process mapping:

1) Decrement the pmap reference count.
2) If the reference count is now zero and the pmap has a pm_segtab:
   a) check each seg_tab array entry
   b) if non-NULL, flush the page from the cache, free the page, and set
      seg_tab[xxx] to NULL
   c) free the pm_segtab array and set pm_segtab to NULL

ASID allocation

This happens only at switch() time: pmap_alloc_tlbpid() is called and is passed a ptr to the proc being switched to.  The global tlbpid_gen counter is compared against the pm_tlbgen in the pmap for this process.  When the generation changes and the pmap gen does not match, a new ASID needs to be allocated to the process.  The idea is that when the entire TLB is flushed, you go to another generation.  So you see things go like this:

	Boot time:	tlbpid_gen = 1
			tlbpid_cnt = 2

	INIT task:	pm_tlbpid = 1
			pm_tlbgen = tlbpid_gen

When INIT hits the cpu for the first time, its pm_tlbgen will match tlbpid_gen and therefore its pm_tlbpid of 1 will be used as the ASID when control is passed back to switch().  Let's say another task is forked off by init.

	New task:	pm_tlbpid = 0
			pm_tlbgen = 0

When this task hits the cpu for the first time, since a tlbgen of zero will never match tlbpid_gen, it is allocated a new ASID, which is the current value of tlbpid_cnt.  tlbpid_cnt is then incremented.  When tlbpid_cnt grows larger than the number of ASIDs supported by the MMU, the entire TLB is flushed, this task instead gets a tlbpid of 1, and tlbpid_gen is incremented, causing all other tasks to require a new ASID the next time they are switch()'d to.  The idea is that reallocating an ASID for a task would be too expensive if it required searching for the previous ASID in the current set of TLB entries.  It is cheaper just to flush the entire TLB and require everyone to get a new ASID when this overflow happens.  But in between overflows, and thus while tlbpid_gen stays the same, a process retains its ASID across every invocation of switch() for which it is scheduled.
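Putting that together, here is a compact sketch of what such an allocator can look like.  It is only an illustration of the scheme just described, not the verbatim pmap_alloc_tlbpid(); MAXASID and tlb_flush_all() in particular are assumed names.

#define MAXASID 255			/* assumed: R4K ASIDs are 8 bits */

extern void tlb_flush_all(void);	/* assumed: drops every TLB entry */

unsigned int tlbpid_gen = 1;		/* current ASID generation */
int tlbpid_cnt = 2;			/* next ASID to hand out */

int alloc_tlbpid(struct pmap *pmap)	/* called from switch() */
{
	if (pmap->pm_tlbgen != tlbpid_gen) {
		/* Stale generation (or never assigned): hand out a
		 * fresh ASID. */
		if (tlbpid_cnt > MAXASID) {
			/* Out of ASIDs: flush the whole TLB, start a new
			 * generation, and restart numbering at 1.  Every
			 * other pmap now mismatches tlbpid_gen and gets a
			 * new ASID on its next switch(). */
			tlb_flush_all();
			tlbpid_gen++;
			tlbpid_cnt = 1;
		}
		pmap->pm_tlbpid = tlbpid_cnt++;
		pmap->pm_tlbgen = tlbpid_gen;
	}
	return pmap->pm_tlbpid;		/* becomes the EntryHi ASID */
}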
Adding a new entry into a process's pmap is pretty straightforward:

a) Given a pmap, a virtual address, a physical page, a set of protections, and a wired true/false value, we decide what to do.

b) If this is real physical RAM (as opposed to device memory; IS_VM_PHYSADDR(pa) tells the case) we do the following.  Set up the pte page permissions:

   a) if the protections passed do not indicate write protection, make it a read-only pte, which in MIPS terms is (PG_RDONLY | PG_VALID | PG_CACHED).  PG_RDONLY is a software bit which is masked out at tlb refill time (discussed later).

   b) if the protections do indicate the mapping to be entered is indeed writable, we set up the pte based upon whether this is going into the kernel map or a user map.  For the kernel map we just allow the page to be written to from the get go, and clear the PG_CLEAN flag for the page_struct this physical page is represented by, end of story.  For a user, we only allow writes from the start if the page_struct is already not clean, else we don't set the MIPS pte dirty bit.  The page is marked valid and cacheable no matter what.  Enter the new mapping into the pv_list (discussed later).

c) If this is a mapped device then use PG_IOPAGE permissions; if not writable, clear the dirty MIPS pte bit; and in all cases clear the global bit (which is set in the PG_IOPAGE expansion).

d) If this is an executable page, push the page out of the instruction cache.  MIPS is funny in that all cache operations perform an address translation, so you have to be careful.  OpenBSD uses the KSEG0 address (which does not go through the TLB to be translated) for these ICACHE flushes.  The indexed primary icache flush is used to remove the lines from the cache.

e) If this is a kernel mapping (pmap->pm_segtab is NULL), get the pte_t ptr from kernel_regmap storage, OR in the physical page number and (PG_ROPAGE | PG_G) (XXX why all the time?), set the wired bit and increment the wired count if the wired boolean arg was true, increment the resident count if the pte was previously invalid, call tlbupdate to get rid of any previous mapping, and set pte->pt_entry to this new pte value.

f) For a user mapping we need to check first whether the PTE array points to a page yet.  If not, we need to get a zero page.  Calculate the offset into the appropriate page based upon the virtual address, OR the physical page number into the new pte value, increment the wired and resident counts if necessary, and set pte->pt_entry to this new pte value.  Finally, if the process (potentially) has a valid ASID (and therefore entries in the TLB right now, ie. pmap->pm_tlbgen == tlbpid_gen) then remove any matching entries in the TLB for this process's virtual-page/ASID pair.  (See the sketch just after this list.)
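As a rough companion to steps e) and f), here is a sketch of the user-side installation using the structures shown earlier.  pmap_alloc_zero_page(), pfn_from_pa(), tlb_flush_page(), PG_WIRED and the prot_bits argument are illustrative/assumed names, not necessarily what the real code uses.

extern pt_entry_t *pmap_alloc_zero_page(void);		/* assumed helper */
extern unsigned int pfn_from_pa(unsigned long pa);	/* assumed helper */
extern void tlb_flush_page(unsigned long va, int asid);	/* assumed helper */
extern unsigned int tlbpid_gen;				/* global generation */

void user_pte_enter(struct pmap *pmap, unsigned long va, unsigned long pa,
		    unsigned int prot_bits, int wired)
{
	int seg = (va >> 22) & (PMAP_SEGTABSIZE - 1);
	pt_entry_t *pte;
	unsigned int npte;

	/* First touch of this 4MB segment: get a zeroed PTE page. */
	if (pmap->pm_segtab->seg_tab[seg] == NULL)
		pmap->pm_segtab->seg_tab[seg] = pmap_alloc_zero_page();

	pte = pmap->pm_segtab->seg_tab[seg] + ((va >> 12) & 0x3ff);

	/* New pte: physical frame number plus whatever valid/cache/dirty
	 * bits were chosen in steps a/b above. */
	npte = pfn_from_pa(pa) | prot_bits;
	if (wired) {
		npte |= PG_WIRED;	/* assumed software wired bit */
		pmap->pm_stats.wired_count++;
	}
	if (!(pte->pt_entry & PG_VALID))
		pmap->pm_stats.resident_count++;

	pte->pt_entry = npte;

	/* Only bother the hardware if this pmap can have live TLB
	 * entries, i.e. its ASID generation is current. */
	if (pmap->pm_tlbgen == tlbpid_gen)
		tlb_flush_page(va, pmap->pm_tlbpid);
}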
The kernel keeps a pv_list; it has one entry for each managed physical page in the system.  Off each entry is a linked list, one element for each virtual page to which the entry's physical page is mapped; the list head counts as the first entry.  This list is used to detect cache aliasing problems with virtual caches.  When the kernel adds a new element to a physical page's pv_list entry, it checks whether this new virtual mapping could cause a cache alias; if so, it marks all of the virtual pages in the list uncacheable.

The reason this is done is simple: suppose a process has physical page X mapped at two virtual addresses within its address space, called Y and Z.  Y and Z could potentially be in the cache at the same time due to the way their two addresses index entries in the virtual cache.  The process could bring both pages into the cache, write to mapping Y, then read the same datum from mapping Z and it would see the old data.

Also, when a mapping is removed (discussed in a bit) the kernel rechecks the pv_list to see if the physical page's mappings were marked uncacheable, and if so, it runs through the list (now without the removed mapping) to see if the alias is no longer present.  If no alias exists any more, all the virtual pages in the pv_list are mapped cacheable again.  The pv_list is also checked when the kernel changes permissions on an extent of user virtual address space within a pmap.

Mappings are removed from a process's pmap in the following manner: the kernel is told the pmap, beginning virtual address, and ending virtual address in which to perform the de-map operation.

First the kernel case.  For each pte within the given range, if the pte for that page is not marked valid we skip it.  If it is valid, first we decrement the resident count unconditionally, and decrement the wired count if the entry was marked with the wired attribute.  Next the pv_list is consulted as discussed above, and if the mapping was the last pv_list element for the associated physical page, the cache is flushed of the data.  Finally, the pte is marked invalid, retaining the global bit, and the tlb is flushed of the entry if still present within the mmu hardware.

On the user end of things, do the same as the kernel case, except that the mmu TLB hardware is only flushed of each entry if the pmap in question (potentially) holds a valid ASID, by way of pmap->pm_tlbgen being equal to tlbpid_gen.

Changes occur to a range of virtual addresses within a process's pmap in two slightly different ways.  In one way, the protection of a single page is lowered from what it is currently.  The second way moves the protection of a virtual address region in an arbitrary direction, either more strict or less strict.

In the first case, the kernel is given a physical address and a new protection value.  If the protection is full blast read/write/execute, or just read+write, nothing is done because the existing protections will always be equal to this new protection (this procedure is only invoked to lower permissions).  If a read-only type protection is being requested, the pv_list for this physical address is walked and each virtual/pmap mapping is set to the requested protection.  Finally, if the read attribute is not set in the new protection, all virtual mappings in the physical page's pv_list are removed one at a time via the method described above.  This first case, when just changing protections and not removing them, calls the next procedure to do the actual work on each mapping.

Next, we are given a pmap, a virtual range extent, and the new protections to apply to that particular range.  Since this can be called externally and not just by the per-page protection lowering method just described, we handle the null protection request by removing the mappings completely from the pmap.  For the kernel pmap we cycle through the virtual addresses and change the software copy of the valid ptes to have the new protection, then update the mmu TLB hardware.  For the user, we act similarly except that the TLB hardware update is only performed if the pm_tlbgen for the pmap matches the global tlbpid_gen comparator.
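A condensed sketch of that range walk for a user pmap, reusing vaddr_to_pte() from earlier; prot_bits stands for the already-computed MIPS permission bits, PG_FRAME for the frame-number mask, and tlb_update() for whatever routine refreshes or drops the matching hardware entry (all illustrative names).

extern void tlb_update(unsigned long va, int asid, unsigned int pte);	/* assumed */
extern unsigned int tlbpid_gen;

void user_protect_range(struct pmap *pmap, unsigned long sva,
			unsigned long eva, unsigned int prot_bits)
{
	unsigned long va;
	pt_entry_t *pte;

	for (va = sva; va < eva; va += 4096) {
		/* Skip 4MB segments that have no PTE page at all (a
		 * real version would jump a whole segment at a time). */
		if (pmap->pm_segtab->seg_tab[(va >> 22) & 0x1ff] == NULL)
			continue;
		pte = vaddr_to_pte(pmap, va);
		if (!(pte->pt_entry & PG_VALID))
			continue;

		/* Rewrite the software copy: keep the frame number,
		 * swap in the new permission bits. */
		pte->pt_entry = (pte->pt_entry & PG_FRAME) | prot_bits;

		/* Touch the hardware only if this pmap can have live
		 * TLB entries (its ASID generation is current). */
		if (pmap->pm_tlbgen == tlbpid_gen)
			tlb_update(va, pmap->pm_tlbpid, pte->pt_entry);
	}
}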
The kernel can be asked to change the cacheable attribute for an arbitrarily mapped physical page.  This is implemented just like the page protection code just described: the pv_list is walked down and the cacheable "protection" bit(s) are modified as asked.  This is mainly used by the pv_list alias detection code to fix mappings which will end up causing aliases, or which are detected to no longer cause an alias due to one of the virtual mappings being removed.

The physical address backing a given pmap/virtual-address pair can be asked for as well.  The method which performs this retains the non-page-offset bits in the return value if a virtual to physical translation can be found, else NULL is returned.

Finally, two methods are provided to control the copying and zero-clearing of pages which will be (or already are) mapped within someone's per-process pmap.  These can be used when it is necessary to create a temporary mapping for the operation, or to do special things to keep caches consistent, for example.
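For illustration, a minimal lookup of that kind built on vaddr_to_pte(); PG_FRAME and pfn_to_pa() are stand-ins for the real frame-number mask and its conversion back to a physical address, and the kernel pmap (NULL pm_segtab) case is not handled.

extern unsigned long pfn_to_pa(unsigned int pfn);	/* assumed helper */

unsigned long va_to_pa(struct pmap *pmap, unsigned long va)
{
	pt_entry_t *pte;

	if (pmap->pm_segtab->seg_tab[(va >> 22) & 0x1ff] == NULL)
		return 0;			/* no translation */
	pte = vaddr_to_pte(pmap, va);
	if (!(pte->pt_entry & PG_VALID))
		return 0;			/* not mapped */

	/* Recover the frame and tack the page offset back on. */
	return pfn_to_pa(pte->pt_entry & PG_FRAME) | (va & 0xfff);
}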
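As for the zero-clearing of a physical page just mentioned, one way it can be done on MIPS is through KSEG0, which avoids setting up a temporary TLB mapping entirely; PHYS_TO_KSEG0() and cache_wbinv_range() are illustrative names, and the cache handling is only a sketch of the consistency concern described above.

extern void cache_wbinv_range(unsigned long va, unsigned long len);	/* assumed */

#define PHYS_TO_KSEG0(pa)	((pa) | 0x80000000UL)	/* cached, unmapped window */

void zero_phys_page(unsigned long pa)
{
	unsigned long *p = (unsigned long *)PHYS_TO_KSEG0(pa);
	unsigned int i;

	for (i = 0; i < 4096 / sizeof(*p); i++)
		p[i] = 0;

	/* If the page is (or will be) mapped at a user virtual address
	 * whose cache index differs from the KSEG0 index, write the
	 * lines back so the user mapping does not see stale data in a
	 * virtually indexed cache. */
	cache_wbinv_range((unsigned long)p, 4096);
}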