Each instance of software process mapping state holds:

	Reference count
	Access control lock
	Statistics info
	pmap ASID
	Global ASID generation comparator
	Array of pages of R4K PTEs

The control structure looks like the following:
	typedef struct pmap {
		int			pm_count;	/* pmap reference count */
		simple_lock_data_t	pm_lock;	/* lock on pmap */
		struct pmap_statistics	pm_stats;	/* pmap statistics */
		int			pm_tlbpid;	/* address space tag */
		u_int			pm_tlbgen;	/* TLB PID generation number */
		struct segtab		*pm_segtab;	/* pointers to pages of PTEs */
	} *pmap_t;
The PTE array is sized only to be able to map userspace; it looks
like this:
	#define PMAP_SEGTABSIZE	512

	typedef union pt_entry {
		unsigned int	pt_entry;	/* for copying, etc. */
		struct pte	pt_pte;		/* for getting to bits by name */
	} pt_entry_t;				/* Mach page table entry */

	struct segtab {
		union pt_entry	*seg_tab[PMAP_SEGTABSIZE];
	};
All user processes have pm_segtab point to a block of memory for this
array. The special kernel pmap has a NULL pm_segtab; this is how you
can tell whether you are adding/removing mappings for the kernel or
for a user process.

At boot time INIT gets pmap ASID 1 and the current global tlbpid_gen
value for its generation comparator. All other new processes get a
pmap ASID and comparator of 0, which always forces the allocation of
a new ASID when the process is first scheduled.
To find a PTE within the R4K PTE array given a user virtual address
and a software pmap, extract like this:

	            31 30         22 21             12 11    0
	           -------------------------------------------
	User VADDR | 0 | array elem | PTE within page |   0   |
	           -------------------------------------------
For example:

	pt_entry_t *
	vaddr_to_pte(pmap_t pmap, unsigned long vaddr)
	{
		int entry;
		pt_entry_t *pte;

		entry = (vaddr >> 22) & 0x1ff;
		pte = pmap->pm_segtab->seg_tab[entry];
		return pte + ((vaddr >> 12) & 0x3ff);
	}
To destroy a process mapping:

	1) Decrement the pmap reference count.
	2) If the reference count is now zero and the pmap has a
	   pm_segtab:
	   a) check each seg_tab array entry
	   b) if non-NULL, flush the page from the cache,
	      free the page, and set seg_tab[xxx] to NULL
	   c) free the pm_segtab array and set pm_segtab to NULL
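The teardown steps above can be sketched in C. Structure and field
names follow the definitions shown earlier, but this is only a
sketch: cache_flush_page() and free_pte_page() are hypothetical
stand-ins for the real cache and VM routines.

```c
#include <stdlib.h>

#define PMAP_SEGTABSIZE 512

typedef union pt_entry { unsigned int pt_entry; } pt_entry_t;
struct segtab { pt_entry_t *seg_tab[PMAP_SEGTABSIZE]; };

typedef struct pmap {
	int pm_count;			/* pmap reference count */
	struct segtab *pm_segtab;	/* NULL for the kernel pmap */
} *pmap_t;

/* Hypothetical helpers standing in for the real cache/VM routines. */
static void cache_flush_page(pt_entry_t *p) { (void)p; }
static void free_pte_page(pt_entry_t *p) { free(p); }

/* Drop a reference; on the last one, free every PTE page and then
 * the segment table itself. */
void
pmap_destroy(pmap_t pmap)
{
	if (--pmap->pm_count > 0)
		return;
	if (pmap->pm_segtab != NULL) {	/* kernel pmap has none */
		for (int i = 0; i < PMAP_SEGTABSIZE; i++) {
			if (pmap->pm_segtab->seg_tab[i] != NULL) {
				cache_flush_page(pmap->pm_segtab->seg_tab[i]);
				free_pte_page(pmap->pm_segtab->seg_tab[i]);
				pmap->pm_segtab->seg_tab[i] = NULL;
			}
		}
		free(pmap->pm_segtab);
		pmap->pm_segtab = NULL;
	}
}
```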
ASID allocation

This happens only at switch() time: pmap_alloc_tlbpid() is called
and is passed a pointer to the proc being switched to.

The global tlbpid_gen counter is compared against the pm_tlbgen in
the pmap for this process. When the pmap's generation does not match
the global one, a new ASID needs to be allocated for the process.
The idea is that when the entire TLB is flushed, you go to another
generation. So you see things go like this:

	Boot time:	tlbpid_gen = 1
			tlbpid_cnt = 2

	INIT task:	pm_tlbpid = 1
			pm_tlbgen = tlbpid_gen
When INIT hits the cpu for the first time, its pm_tlbgen will match
tlbpid_gen and therefore its pm_tlbpid of 1 will be used as the ASID
when control is passed back to switch().

Let's say another task is forked off by init:

	New task:	pm_tlbpid = 0
			pm_tlbgen = 0
When this task hits the cpu for the first time, since a tlbgen of
zero will never match tlbpid_gen, it is allocated a new ASID, namely
the current value of tlbpid_cnt, and tlbpid_cnt is then incremented.
When tlbpid_cnt grows larger than the number of ASIDs supported by
the MMU, the entire TLB is flushed, this task instead gets a tlbpid
of 1, and tlbpid_gen is incremented, causing all other tasks to
require a new ASID the next time they are switch()'d to.

The idea is that reallocating an ASID for a task would be too
expensive if it required searching for the previous ASID in the
current set of TLB entries. It is cheaper just to flush the entire
TLB and require everyone to get a new ASID when this overflow
happens. But in between overflows, and thus while tlbpid_gen stays
the same, a process retains its ASID across every invocation of
switch() for which it is scheduled.
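Under those rules, pmap_alloc_tlbpid() reduces to a short routine.
The following is a sketch rather than the OpenBSD source: NUM_ASIDS
is an assumed MMU limit, tlb_flush_all() is a stand-in for the full
TLB flush, and only the fields used here are shown in the pmap.

```c
#define NUM_ASIDS 48	/* assumed; the real count is MMU-specific */

static unsigned int tlbpid_gen = 1;	/* global generation number */
static unsigned int tlbpid_cnt = 2;	/* next ASID to hand out */

struct pmap {
	int pm_tlbpid;		/* address space tag */
	unsigned int pm_tlbgen;	/* TLB PID generation number */
};

/* Stand-in for flushing the entire TLB. */
static void tlb_flush_all(void) { }

/* Called at switch() time: reuse the ASID while the generation
 * matches, otherwise allocate a fresh one, flushing the whole TLB
 * and starting a new generation on overflow. */
int
pmap_alloc_tlbpid(struct pmap *pmap)
{
	if (pmap->pm_tlbgen != tlbpid_gen) {
		if (tlbpid_cnt >= NUM_ASIDS) {
			/* Out of ASIDs: flush everything, new generation. */
			tlb_flush_all();
			tlbpid_gen++;
			tlbpid_cnt = 1;
		}
		pmap->pm_tlbpid = tlbpid_cnt++;
		pmap->pm_tlbgen = tlbpid_gen;
	}
	return pmap->pm_tlbpid;
}
```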
Adding a new entry to a process's pmap is pretty straightforward:

	a) Given a pmap, a virtual address, a physical page, a set of
	   protections, and a wired true/false value, we decide what
	   to do.

	b) If this is real physical RAM (as opposed to device memory;
	   IS_VM_PHYSADDR(pa) distinguishes the two) we do the
	   following.
	   Set up the pte page permissions:

	   a) If the protections passed do not indicate write access,
	      make it a read-only pte, which in MIPS terms is
	      (PG_RDONLY | PG_VALID | PG_CACHED).
	      PG_RDONLY is a software bit which is masked out at TLB
	      refill time (discussed later).

	   b) If the protections do indicate that the mapping to be
	      entered is writable, we set up the pte based upon
	      whether it is going into the kernel map or a user map.

	      For the kernel map we just allow the page to be written
	      from the get-go, and clear the PG_CLEAN flag of the
	      page_struct this physical page is represented by; end
	      of story.

	      For a user, we only allow writes from the start if the
	      page_struct is already not clean; otherwise we don't
	      set the MIPS pte dirty bit.

	   The page is marked valid and cachable no matter what.
	   Enter the new mapping into the pv_list (discussed later).
	c) If this is a mapped device, use PG_IOPAGE permissions; if
	   not writable, clear the MIPS pte dirty bit; and in all
	   cases clear the global bit (which is set in the PG_IOPAGE
	   expansion).

	d) If this is an executable page, push the page out of the
	   instruction cache.

	   MIPS is funny in that all cache operations perform an
	   address translation, so you have to be careful. OpenBSD
	   uses the KSEG0 address (which does not go through the TLB
	   to be translated) for these ICACHE flushes. The indexed
	   primary icache flush is used to remove the lines from the
	   cache.
	e) If this is a kernel mapping (pmap->pm_segtab is NULL): get
	   the pt_entry_t pointer from kernel_regmap storage, OR in
	   the physical page number and (PG_ROPAGE | PG_G) (XXX why
	   all the time?), set the wired bit and increment the wired
	   count if the wired boolean arg was true, increment the
	   resident count if the pte was previously invalid, call
	   tlbupdate to get rid of any previous mapping, and set
	   pte->pt_entry to this new pte value.
	f) For a user mapping, we first need to check whether the PTE
	   array points to a page yet; if not, we need to get a
	   zeroed page. Calculate the offset into the appropriate
	   page from the virtual address, OR the page number into the
	   new pte value, increment the wired and resident counts if
	   necessary, and set pte->pt_entry to this new pte value.
	   Finally, if the process (potentially) has a valid ASID
	   (and therefore entries in the TLB right now, i.e.
	   pmap->pm_tlbgen == tlbpid_gen), then remove any matching
	   entries in the TLB for this process's virtual-page/ASID
	   pair.
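The user path in step f can be sketched as follows. This is a
simplification under assumed names: PG_V stands in for the valid
bit, tlb_flush_addr() for the per-entry shootdown, and calloc() for
the kernel's zeroed-page allocator.

```c
#include <stdlib.h>

#define PMAP_SEGTABSIZE	512
#define NPTEPG		1024	/* PTEs per page of PTEs */
#define PG_V		0x2	/* assumed valid-bit value */

typedef unsigned int pt_entry_t;
struct segtab { pt_entry_t *seg_tab[PMAP_SEGTABSIZE]; };
struct pmap {
	struct segtab *pm_segtab;
	int pm_tlbpid;
	unsigned int pm_tlbgen;
	int pm_resident;	/* resident-page count */
};

static unsigned int tlbpid_gen = 1;

/* Stand-in for the per-entry TLB shootdown. */
static int flushes;
static void tlb_flush_addr(unsigned long va, int asid)
{
	(void)va; (void)asid;
	flushes++;
}

void
pmap_enter_user(struct pmap *pmap, unsigned long va, pt_entry_t npte)
{
	int seg = (va >> 22) & 0x1ff;
	pt_entry_t *ptep;

	/* First touch of this segment: allocate a zeroed PTE page. */
	if (pmap->pm_segtab->seg_tab[seg] == NULL)
		pmap->pm_segtab->seg_tab[seg] =
		    calloc(NPTEPG, sizeof(pt_entry_t));

	ptep = pmap->pm_segtab->seg_tab[seg] + ((va >> 12) & 0x3ff);
	if (!(*ptep & PG_V))
		pmap->pm_resident++;
	*ptep = npte;

	/* Only shoot down the TLB if this pmap can have live entries. */
	if (pmap->pm_tlbgen == tlbpid_gen)
		tlb_flush_addr(va, pmap->pm_tlbpid);
}
```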
The kernel keeps a pv_list with one entry for each managed physical
page in the system. Off each entry hangs a linked list with one
element for each virtual page to which that entry's physical page is
mapped; the list head counts as the first element. This list is used
to detect cache aliasing problems with virtual caches.
When the kernel adds a new element to a physical page's pv_list
entry, it checks whether this new virtual mapping could cause a
cache alias; if so, it marks all of the virtual pages in the list
uncacheable. The reason this is done is simple:
	Suppose a process has physical page X mapped at two virtual
	addresses within its address space, called Y and Z. Y and Z
	could potentially be in the cache at the same time due to the
	way their two addresses index entries in the virtual cache.
	The process could bring both pages into the cache, write to
	mapping Y, then read the same datum from mapping Z, and it
	would see the old data.
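The alias condition itself is a bit test: two virtual mappings of
one physical page can land in different cache lines only if they
differ in the cache-index bits above the page offset. A sketch,
assuming a 16KB direct-mapped virtually indexed cache; the real
index mask depends on the CPU model:

```c
#define PAGE_SHIFT		12	/* 4KB pages */
#define CACHE_INDEX_MASK	0x3fff	/* assumed 16KB virtual cache */

/* Return nonzero when mapping the same physical page at va1 and va2
 * could leave two copies live in the virtually indexed cache. */
int
causes_alias(unsigned long va1, unsigned long va2)
{
	/* Only the index bits above the page offset matter: the
	 * low PAGE_SHIFT bits are identical for both mappings. */
	unsigned long idx_bits =
	    CACHE_INDEX_MASK & ~((1UL << PAGE_SHIFT) - 1);

	return ((va1 ^ va2) & idx_bits) != 0;
}
```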
Also, when a mapping is removed (discussed in a bit), the kernel
rechecks the pv_list to see whether the physical page's mappings
were marked uncachable; if so, it runs through the list, minus the
mapping now being removed, to see whether the alias is still
present. If no alias exists any more, all the virtual pages in the
pv_list are mapped cachable again.
The pv_list is also checked when the kernel changes permissions
on an extent of user virtual address space within a pmap.
Mappings are removed from a process's pmap in the following manner:
the kernel is told the pmap, the beginning virtual address, and the
ending virtual address in which to perform the de-map operation.
First, the kernel case.
For each pte within the given range, if the pte for that page is not
marked valid we skip it. If it is valid, first we decrement the
resident count unconditionally, and decrement the wired count if the
entry was marked with the wired attribute. Next the pv_list is
consulted as discussed above, and if the mapping was the last
pv_list element for the associated physical page, then the cache is
flushed of the data. Finally, the pte is marked invalid, retaining
the global bit, and the entry is flushed from the TLB if still
present within the MMU hardware.
On the user end of things, we do the same as in the kernel case,
except that the MMU TLB hardware is only flushed of each entry if
the pmap in question (potentially) holds a valid ASID, by way of
pmap->pm_tlbgen being equal to tlbpid_gen.
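The per-range loop common to both cases can be sketched as below.
The pv_list maintenance, the cache flush, and the TLB shootdown are
elided, and the bit definitions are hypothetical (PG_WIRED in
particular stands in for whatever software attribute marks wired
entries):

```c
typedef unsigned int pt_entry_t;

#define PG_V		0x2		/* assumed valid bit */
#define PG_G		0x1		/* assumed global bit */
#define PG_WIRED	0x80000000u	/* assumed software wired bit */

struct pmap_counts { int resident; int wired; };

/* Walk npages worth of PTEs: skip invalid entries, fix the counters,
 * and invalidate each valid entry while preserving the global bit. */
void
pmap_remove_range(pt_entry_t *pte, int npages, struct pmap_counts *c)
{
	for (int i = 0; i < npages; i++, pte++) {
		if (!(*pte & PG_V))
			continue;
		c->resident--;
		if (*pte & PG_WIRED)
			c->wired--;
		/* pv_list update, cache flush, TLB shootdown elided */
		*pte &= PG_G;	/* mark invalid, retain the global bit */
	}
}
```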
Protection changes occur to a range of virtual addresses within a
process's pmap in two slightly different ways. In one way, the
protection of a single page is lowered from what it is currently.
The second way moves the protection of a virtual address region in
an arbitrary direction, either more strict or less strict.
In the first case, the kernel is given a physical address and a new
protection value. If the protection is full-blast
read/write/execute, or just read+write, nothing is done, because the
existing protections will always be equal to this new protection
(this procedure is only invoked to lower permissions). If a
read-only type of protection is being requested, the pv_list for
this physical address is walked and each virtual/pmap mapping is set
to the requested protection. Finally, if the read attribute is not
set in the new protection, all virtual mappings in the physical
page's pv_list are removed one at a time via the method described
above. This first case, when just changing protections rather than
removing them, calls the next procedure to do the actual work on
each mapping.
Next, we are given a pmap, a virtual range extent, and the new
protections to apply to that particular range. Since this can be
called externally, and not just by the per-page protection-lowering
method just described, we handle a null protection request by
removing the mappings completely from the pmap. For the kernel pmap
we cycle through the virtual addresses and change the software copy
of each valid pte to carry the new protection, then update the MMU
TLB hardware. For a user pmap we act similarly, except that the TLB
hardware update is only performed if the pm_tlbgen for the pmap
matches the global tlbpid_gen comparator.
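The inner loop of that range operation might look like this sketch.
The PG_RDONLY value and the exact split between software and
hardware bits are assumptions, and the per-entry TLB update is left
out:

```c
typedef unsigned int pt_entry_t;

#define PG_V		0x2	/* assumed valid bit */
#define PG_RDONLY	0x4	/* assumed software read-only bit */

/* Rewrite the software permission bits on every valid PTE in the
 * range; a real implementation would then update the TLB
 * (unconditionally for the kernel pmap, and for a user pmap only
 * when pm_tlbgen matches tlbpid_gen). */
void
pmap_protect_range(pt_entry_t *pte, int npages, int readonly)
{
	for (int i = 0; i < npages; i++, pte++) {
		if (!(*pte & PG_V))
			continue;
		if (readonly)
			*pte |= PG_RDONLY;
		else
			*pte &= ~(pt_entry_t)PG_RDONLY;
	}
}
```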
The kernel can be asked to change the cachable attribute for an
arbitrarily mapped physical page. This is implemented just like the
page protection code just described: the pv_list is walked and the
cachable "protection" bit(s) are modified as asked. This is mainly
used by the pv_list alias detection code to fix mappings which will
end up causing aliases, or which are detected to no longer cause an
alias because one of the virtual mappings was removed.
The physical address backing a given pmap/virtual-address pair can
also be queried. There is a method which performs this lookup,
retaining the in-page offset bits in the return value if a virtual
to physical translation can be found; otherwise NULL is returned.
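That lookup is just the two-level walk from earlier plus a validity
check. A sketch, reusing the segtab layout shown above; the
PFN-in-bits-31..12 PTE layout here is an assumption for
illustration, not the real R4K encoding:

```c
#include <stddef.h>

#define PG_V 0x2	/* assumed valid bit */

typedef unsigned int pt_entry_t;
struct segtab { pt_entry_t *seg_tab[512]; };

/* Walk the two-level table; on a valid translation, return the
 * physical page combined with the in-page offset bits of the
 * virtual address, else 0 (NULL). */
unsigned long
pmap_extract(struct segtab *st, unsigned long va)
{
	pt_entry_t *page = st->seg_tab[(va >> 22) & 0x1ff];
	pt_entry_t pte;

	if (page == NULL)
		return 0;		/* no PTE page: no translation */
	pte = page[(va >> 12) & 0x3ff];
	if (!(pte & PG_V))
		return 0;		/* mapping not valid */
	/* Assumed layout: PFN in bits 31..12 of the PTE word. */
	return (pte & ~0xfffUL) | (va & 0xfff);
}
```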
Finally, two methods are provided to control the copying and
zero-clearing of pages which will be (or already are) mapped within
some process's pmap. These can be used when it is necessary to
create a temporary mapping for the operation, or to do special
things to keep caches consistent, for example.