Each instance of software process mapping state holds:

	Reference count
	Access control lock
	Statistics info
	pmap ASID
	Global ASID generation comparator
	Array of pages of R4K PTEs

The control structure looks like the following:
	typedef struct pmap {
		int			pm_count;	/* pmap reference count */
		simple_lock_data_t	pm_lock;	/* lock on pmap */
		struct pmap_statistics	pm_stats;	/* pmap statistics */
		int			pm_tlbpid;	/* address space tag */
		u_int			pm_tlbgen;	/* TLB PID generation number */
		struct segtab		*pm_segtab;	/* pointers to pages of PTEs */
	} *pmap_t;
The PTE array is sized only to be able to map userspace; it looks
like this:
	#define PMAP_SEGTABSIZE	512

	typedef union pt_entry {
		unsigned int	pt_entry;	/* for copying, etc. */
		struct pte	pt_pte;		/* for getting to bits by name */
	} pt_entry_t;				/* Mach page table entry */

	struct segtab {
		union pt_entry	*seg_tab[PMAP_SEGTABSIZE];
	};
All user processes have pm_segtab point to a block of memory for this
array. The special kernel pmap has a NULL pm_segtab; this is how you
can tell whether you are adding/removing mappings for the kernel or
for a user process.

At boot time INIT gets pmap ASID 1 and the current global tlbpid_gen
value for its generation comparator. All other new processes get a
pmap ASID and comparator of 0, which always forces the allocation of
a new ASID when the process is first scheduled.
To find a PTE within the R4K PTE array given a user virtual address
and a software pmap, extract like this:

	            31 30         22 21             12 11    0
	           -------------------------------------------
	User VADDR | 0 | array elem | PTE within page |   0   |
	           -------------------------------------------
For example:

	pt_entry_t *
	vaddr_to_pte(pmap_t pmap, unsigned long vaddr)
	{
		int entry;
		pt_entry_t *pte;

		entry = (vaddr >> 22) & 0x1ff;
		pte = pmap->pm_segtab->seg_tab[entry];
		return pte + ((vaddr >> 12) & 0x3ff);
	}
To destroy a process mapping:

	1) Decrement the pmap reference count.
	2) If the reference count is now zero and the pmap has a
	   pm_segtab:
	   a) check each seg_tab array entry
	   b) if non-NULL, flush the page from the cache,
	      free the page, and set seg_tab[xxx] to NULL
	   c) free the pm_segtab array and set pm_segtab to NULL
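The teardown steps above can be sketched in C. Structure and field
names follow the definitions shown earlier, but this is only a
sketch: cache_flush_page() and free_pte_page() are hypothetical
stand-ins for the real cache and VM routines.

```c
#include <stdlib.h>

#define PMAP_SEGTABSIZE 512

typedef union pt_entry { unsigned int pt_entry; } pt_entry_t;
struct segtab { pt_entry_t *seg_tab[PMAP_SEGTABSIZE]; };

typedef struct pmap {
	int pm_count;			/* pmap reference count */
	struct segtab *pm_segtab;	/* NULL for the kernel pmap */
} *pmap_t;

/* Hypothetical helpers standing in for the real cache/VM routines. */
static void cache_flush_page(pt_entry_t *p) { (void)p; }
static void free_pte_page(pt_entry_t *p) { free(p); }

/* Drop a reference; on the last one, free every PTE page and then
 * the segment table itself. */
void
pmap_destroy(pmap_t pmap)
{
	if (--pmap->pm_count > 0)
		return;
	if (pmap->pm_segtab != NULL) {	/* kernel pmap has none */
		for (int i = 0; i < PMAP_SEGTABSIZE; i++) {
			if (pmap->pm_segtab->seg_tab[i] != NULL) {
				cache_flush_page(pmap->pm_segtab->seg_tab[i]);
				free_pte_page(pmap->pm_segtab->seg_tab[i]);
				pmap->pm_segtab->seg_tab[i] = NULL;
			}
		}
		free(pmap->pm_segtab);
		pmap->pm_segtab = NULL;
	}
}
```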
ASID allocation

This happens only at switch() time: pmap_alloc_tlbpid() is called
and is passed a pointer to the proc being switched to.

The global tlbpid_gen counter is compared against the pm_tlbgen in
the pmap for this process. When the pmap's generation does not match
the global one, a new ASID needs to be allocated for the process.
The idea is that when the entire TLB is flushed, you go to another
generation. So you see things go like this:

	Boot time:	tlbpid_gen = 1
			tlbpid_cnt = 2

	INIT task:	pm_tlbpid = 1
			pm_tlbgen = tlbpid_gen
When INIT hits the cpu for the first time, its pm_tlbgen will match
tlbpid_gen and therefore its pm_tlbpid of 1 will be used as the ASID
when control is passed back to switch().

Let's say another task is forked off by init:

	New task:	pm_tlbpid = 0
			pm_tlbgen = 0
When this task hits the cpu for the first time, since a tlbgen of
zero will never match tlbpid_gen, it is allocated a new ASID, namely
the current value of tlbpid_cnt, and tlbpid_cnt is then incremented.
When tlbpid_cnt grows larger than the number of ASIDs supported by
the MMU, the entire TLB is flushed, this task instead gets a tlbpid
of 1, and tlbpid_gen is incremented, causing all other tasks to
require a new ASID the next time they are switch()'d to.

The idea is that reallocating an ASID for a task would be too
expensive if it required searching for the previous ASID in the
current set of TLB entries. It is cheaper just to flush the entire
TLB and require everyone to get a new ASID when this overflow
happens. But in between overflows, and thus while tlbpid_gen stays
the same, a process retains its ASID across every invocation of
switch() for which it is scheduled.
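Under those rules, pmap_alloc_tlbpid() reduces to a short routine.
The following is a sketch rather than the OpenBSD source: NUM_ASIDS
is an assumed MMU limit, tlb_flush_all() is a stand-in for the full
TLB flush, and only the fields used here are shown in the pmap.

```c
#define NUM_ASIDS 48	/* assumed; the real count is MMU-specific */

static unsigned int tlbpid_gen = 1;	/* global generation number */
static unsigned int tlbpid_cnt = 2;	/* next ASID to hand out */

struct pmap {
	int pm_tlbpid;		/* address space tag */
	unsigned int pm_tlbgen;	/* TLB PID generation number */
};

/* Stand-in for flushing the entire TLB. */
static void tlb_flush_all(void) { }

/* Called at switch() time: reuse the ASID while the generation
 * matches, otherwise allocate a fresh one, flushing the whole TLB
 * and starting a new generation on overflow. */
int
pmap_alloc_tlbpid(struct pmap *pmap)
{
	if (pmap->pm_tlbgen != tlbpid_gen) {
		if (tlbpid_cnt >= NUM_ASIDS) {
			/* Out of ASIDs: flush everything, new generation. */
			tlb_flush_all();
			tlbpid_gen++;
			tlbpid_cnt = 1;
		}
		pmap->pm_tlbpid = tlbpid_cnt++;
		pmap->pm_tlbgen = tlbpid_gen;
	}
	return pmap->pm_tlbpid;
}
```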
Adding a new entry to a process's pmap is pretty straightforward:

	a) Given a pmap, a virtual address, a physical page, a set of
	   protections, and a wired true/false value, we decide what
	   to do.

	b) If this is real physical RAM (as opposed to device memory;
	   IS_VM_PHYSADDR(pa) distinguishes the two) we do the
	   following.
	   Set up the pte page permissions:

	   a) If the protections passed do not indicate write access,
	      make it a read-only pte, which in MIPS terms is
	      (PG_RDONLY | PG_VALID | PG_CACHED).
	      PG_RDONLY is a software bit which is masked out at TLB
	      refill time (discussed later).

	   b) If the protections do indicate that the mapping to be
	      entered is writable, we set up the pte based upon
	      whether it is going into the kernel map or a user map.

	      For the kernel map we just allow the page to be written
	      from the get-go, and clear the PG_CLEAN flag of the
	      page_struct this physical page is represented by; end
	      of story.

	      For a user, we only allow writes from the start if the
	      page_struct is already not clean; otherwise we don't
	      set the MIPS pte dirty bit.

	   The page is marked valid and cachable no matter what.
	   Enter the new mapping into the pv_list (discussed later).
	c) If this is a mapped device, use PG_IOPAGE permissions; if
	   not writable, clear the MIPS pte dirty bit; and in all
	   cases clear the global bit (which is set in the PG_IOPAGE
	   expansion).

	d) If this is an executable page, push the page out of the
	   instruction cache.

	   MIPS is funny in that all cache operations perform an
	   address translation, so you have to be careful. OpenBSD
	   uses the KSEG0 address (which does not go through the TLB
	   to be translated) for these ICACHE flushes. The indexed
	   primary icache flush is used to remove the lines from the
	   cache.
	e) If this is a kernel mapping (pmap->pm_segtab is NULL): get
	   the pt_entry_t pointer from kernel_regmap storage, OR in
	   the physical page number and (PG_ROPAGE | PG_G) (XXX why
	   all the time?), set the wired bit and increment the wired
	   count if the wired boolean arg was true, increment the
	   resident count if the pte was previously invalid, call
	   tlbupdate to get rid of any previous mapping, and set
	   pte->pt_entry to this new pte value.
	f) For a user mapping, we first need to check whether the PTE
	   array points to a page yet; if not, we need to get a
	   zeroed page. Calculate the offset into the appropriate
	   page from the virtual address, OR the page number into the
	   new pte value, increment the wired and resident counts if
	   necessary, and set pte->pt_entry to this new pte value.
	   Finally, if the process (potentially) has a valid ASID
	   (and therefore entries in the TLB right now, i.e.
	   pmap->pm_tlbgen == tlbpid_gen), then remove any matching
	   entries in the TLB for this process's virtual-page/ASID
	   pair.
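The user path in step f can be sketched as follows. This is a
simplification under assumed names: PG_V stands in for the valid
bit, tlb_flush_addr() for the per-entry shootdown, and calloc() for
the kernel's zeroed-page allocator.

```c
#include <stdlib.h>

#define PMAP_SEGTABSIZE	512
#define NPTEPG		1024	/* PTEs per page of PTEs */
#define PG_V		0x2	/* assumed valid-bit value */

typedef unsigned int pt_entry_t;
struct segtab { pt_entry_t *seg_tab[PMAP_SEGTABSIZE]; };
struct pmap {
	struct segtab *pm_segtab;
	int pm_tlbpid;
	unsigned int pm_tlbgen;
	int pm_resident;	/* resident-page count */
};

static unsigned int tlbpid_gen = 1;

/* Stand-in for the per-entry TLB shootdown. */
static int flushes;
static void tlb_flush_addr(unsigned long va, int asid)
{
	(void)va; (void)asid;
	flushes++;
}

void
pmap_enter_user(struct pmap *pmap, unsigned long va, pt_entry_t npte)
{
	int seg = (va >> 22) & 0x1ff;
	pt_entry_t *ptep;

	/* First touch of this segment: allocate a zeroed PTE page. */
	if (pmap->pm_segtab->seg_tab[seg] == NULL)
		pmap->pm_segtab->seg_tab[seg] =
		    calloc(NPTEPG, sizeof(pt_entry_t));

	ptep = pmap->pm_segtab->seg_tab[seg] + ((va >> 12) & 0x3ff);
	if (!(*ptep & PG_V))
		pmap->pm_resident++;
	*ptep = npte;

	/* Only shoot down the TLB if this pmap can have live entries. */
	if (pmap->pm_tlbgen == tlbpid_gen)
		tlb_flush_addr(va, pmap->pm_tlbpid);
}
```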
The kernel keeps a pv_list with one entry for each managed physical
page in the system. Off each entry hangs a linked list with one
element for each virtual page to which that entry's physical page is
mapped; the list head counts as the first element. This list is used
to detect cache aliasing problems with virtual caches.
When the kernel adds a new element to a physical page's pv_list
entry, it checks whether this new virtual mapping could cause a
cache alias; if so, it marks all of the virtual pages in the list
uncacheable. The reason this is done is simple:
	Suppose a process has physical page X mapped at two virtual
	addresses within its address space, called Y and Z. Y and Z
	could potentially be in the cache at the same time due to the
	way their two addresses index entries in the virtual cache.
	The process could bring both pages into the cache, write to
	mapping Y, then read the same datum from mapping Z, and it
	would see the old data.
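The alias condition itself is a bit test: two virtual mappings of
one physical page can land in different cache lines only if they
differ in the cache-index bits above the page offset. A sketch,
assuming a 16KB direct-mapped virtually indexed cache; the real
index mask depends on the CPU model:

```c
#define PAGE_SHIFT		12	/* 4KB pages */
#define CACHE_INDEX_MASK	0x3fff	/* assumed 16KB virtual cache */

/* Return nonzero when mapping the same physical page at va1 and va2
 * could leave two copies live in the virtually indexed cache. */
int
causes_alias(unsigned long va1, unsigned long va2)
{
	/* Only the index bits above the page offset matter: the
	 * low PAGE_SHIFT bits are identical for both mappings. */
	unsigned long idx_bits =
	    CACHE_INDEX_MASK & ~((1UL << PAGE_SHIFT) - 1);

	return ((va1 ^ va2) & idx_bits) != 0;
}
```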
Also, when a mapping is removed (discussed in a bit), the kernel
rechecks the pv_list to see whether the physical page's mappings
were marked uncachable; if so, it runs through the list, minus the
mapping now being removed, to see whether the alias is still
present. If no alias exists any more, all the virtual pages in the
pv_list are mapped cachable again.
The pv_list is also checked when the kernel changes permissions
on an extent of user virtual address space within a pmap.
Mappings are removed from a process's pmap in the following manner:
the kernel is told the pmap, the beginning virtual address, and the
ending virtual address in which to perform the de-map operation.
First, the kernel case.
For each pte within the given range, if the pte for that page is not
marked valid we skip it. If it is valid, first we decrement the
resident count unconditionally, and decrement the wired count if the
entry was marked with the wired attribute. Next the pv_list is
consulted as discussed above, and if the mapping was the last
pv_list element for the associated physical page, then the cache is
flushed of the data. Finally, the pte is marked invalid, retaining
the global bit, and the entry is flushed from the TLB if still
present within the MMU hardware.
On the user end of things, we do the same as in the kernel case,
except that the MMU TLB hardware is only flushed of each entry if
the pmap in question (potentially) holds a valid ASID, by way of
pmap->pm_tlbgen being equal to tlbpid_gen.
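The per-range loop common to both cases can be sketched as below.
The pv_list maintenance, the cache flush, and the TLB shootdown are
elided, and the bit definitions are hypothetical (PG_WIRED in
particular stands in for whatever software attribute marks wired
entries):

```c
typedef unsigned int pt_entry_t;

#define PG_V		0x2		/* assumed valid bit */
#define PG_G		0x1		/* assumed global bit */
#define PG_WIRED	0x80000000u	/* assumed software wired bit */

struct pmap_counts { int resident; int wired; };

/* Walk npages worth of PTEs: skip invalid entries, fix the counters,
 * and invalidate each valid entry while preserving the global bit. */
void
pmap_remove_range(pt_entry_t *pte, int npages, struct pmap_counts *c)
{
	for (int i = 0; i < npages; i++, pte++) {
		if (!(*pte & PG_V))
			continue;
		c->resident--;
		if (*pte & PG_WIRED)
			c->wired--;
		/* pv_list update, cache flush, TLB shootdown elided */
		*pte &= PG_G;	/* mark invalid, retain the global bit */
	}
}
```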
Protection changes occur to a range of virtual addresses within a
process's pmap in two slightly different ways. In one way, the
protection of a single page is lowered from what it is currently.
The second way moves the protection of a virtual address region in
an arbitrary direction, either more strict or less strict.
In the first case, the kernel is given a physical address and a new
protection value. If the protection is full-blast
read/write/execute, or just read+write, nothing is done, because the
existing protections will always be equal to this new protection
(this procedure is only invoked to lower permissions). If a
read-only type of protection is being requested, the pv_list for
this physical address is walked and each virtual/pmap mapping is set
to the requested protection. Finally, if the read attribute is not
set in the new protection, all virtual mappings in the physical
page's pv_list are removed one at a time via the method described
above. This first case, when just changing protections rather than
removing them, calls the next procedure to do the actual work on
each mapping.
Next, we are given a pmap, a virtual range extent, and the new
protections to apply to that particular range. Since this can be
called externally, and not just by the per-page protection-lowering
method just described, we handle a null protection request by
removing the mappings completely from the pmap. For the kernel pmap
we cycle through the virtual addresses and change the software copy
of each valid pte to carry the new protection, then update the MMU
TLB hardware. For a user pmap we act similarly, except that the TLB
hardware update is only performed if the pm_tlbgen for the pmap
matches the global tlbpid_gen comparator.
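The inner loop of that range operation might look like this sketch.
The PG_RDONLY value and the exact split between software and
hardware bits are assumptions, and the per-entry TLB update is left
out:

```c
typedef unsigned int pt_entry_t;

#define PG_V		0x2	/* assumed valid bit */
#define PG_RDONLY	0x4	/* assumed software read-only bit */

/* Rewrite the software permission bits on every valid PTE in the
 * range; a real implementation would then update the TLB
 * (unconditionally for the kernel pmap, and for a user pmap only
 * when pm_tlbgen matches tlbpid_gen). */
void
pmap_protect_range(pt_entry_t *pte, int npages, int readonly)
{
	for (int i = 0; i < npages; i++, pte++) {
		if (!(*pte & PG_V))
			continue;
		if (readonly)
			*pte |= PG_RDONLY;
		else
			*pte &= ~(pt_entry_t)PG_RDONLY;
	}
}
```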
The kernel can be asked to change the cachable attribute for an
arbitrarily mapped physical page. This is implemented just like the
page protection code just described: the pv_list is walked and the
cachable "protection" bit(s) are modified as asked. This is mainly
used by the pv_list alias detection code to fix mappings which will
end up causing aliases, or which are detected to no longer cause an
alias because one of the virtual mappings was removed.
The physical address backing a given pmap/virtual-address pair can
also be queried. There is a method which performs this lookup,
retaining the in-page offset bits in the return value if a virtual
to physical translation can be found; otherwise NULL is returned.
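That lookup is just the two-level walk from earlier plus a validity
check. A sketch, reusing the segtab layout shown above; the
PFN-in-bits-31..12 PTE layout here is an assumption for
illustration, not the real R4K encoding:

```c
#include <stddef.h>

#define PG_V 0x2	/* assumed valid bit */

typedef unsigned int pt_entry_t;
struct segtab { pt_entry_t *seg_tab[512]; };

/* Walk the two-level table; on a valid translation, return the
 * physical page combined with the in-page offset bits of the
 * virtual address, else 0 (NULL). */
unsigned long
pmap_extract(struct segtab *st, unsigned long va)
{
	pt_entry_t *page = st->seg_tab[(va >> 22) & 0x1ff];
	pt_entry_t pte;

	if (page == NULL)
		return 0;		/* no PTE page: no translation */
	pte = page[(va >> 12) & 0x3ff];
	if (!(pte & PG_V))
		return 0;		/* mapping not valid */
	/* Assumed layout: PFN in bits 31..12 of the PTE word. */
	return (pte & ~0xfffUL) | (va & 0xfff);
}
```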
Finally, two methods are provided to control the copying and
zero-clearing of pages which will be (or already are) mapped within
some process's pmap. These can be used when it is necessary to
create a temporary mapping for the operation, or to do special
things to keep caches consistent, for example.