fs/nfs/README




    This is an NFS client for Linux that supports async RPC calls for
    read-ahead (and hopefully soon, write-back) on regular files. 

    The implementation uses a straightforward nfsiod scheme.  After
    trying out a number of different approaches, I finally came back to
    this one, because everything else either didn't work or gave me
    headaches. It's not flashy, but it works without hacking into any
    other regions of the kernel.


    HOW TO USE

    This stuff compiles as a loadable module (I developed it on 1.3.77).
    Simply type mkmodule, and insmod nfs.o. This will start four nfsiod's
    at the same time (which will show up under the pseudonym of insmod in
    ps-style listings).

    Alternatively, you can put it right into the kernel: remove everything
    from fs/nfs, move the Makefile and all *.c files into fs/nfs, and
    copy all *.h files to include/linux.

    After mounting, you should be able to watch (with tcpdump) several
    RPC READ calls being placed simultaneously.


    HOW IT WORKS

    When a process reads from a file on an NFS volume, the following
    happens:

     *	nfs_file_read sets file->f_reada if more than 1K is
    	read at once. It then calls generic_file_read.

     *	generic_file_read requests one or more pages via
    	nfs_readpage.

     *	nfs_readpage allocates a request slot with an nfsiod
    	daemon, fills in the READ request, sends out the
    	RPC call, kicks the daemon, and returns.
    	If there's no free nfsiod, nfs_readpage places the
    	call directly, waiting for the reply (sync readpage).

     *	nfsiod calls nfs_rpc_doio to collect the reply. If the
    	call was successful, it sets page->uptodate and
    	wakes up all processes waiting on page->wait.
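
    The hand-off between nfs_readpage and nfsiod described above can be
    sketched in user-space C. This is only an illustration under stated
    assumptions, not the kernel code: struct page_sim, struct iod_slot,
    readpage_sim, iod_complete_sim, sync_read_sim and DONE_SYNC are all
    hypothetical names, and the RPC traffic itself is left out.

```c
#include <assert.h>
#include <stddef.h>

#define NR_IOD_SLOTS 4          /* the README starts four nfsiod's */
#define DONE_SYNC    (-1)       /* hypothetical marker: read was done synchronously */

struct page_sim {               /* stand-in for the kernel's struct page */
    int uptodate;
};

struct iod_slot {               /* one outstanding async READ */
    int busy;
    struct page_sim *page;
};

static struct iod_slot iod_slots[NR_IOD_SLOTS];

/* Stand-in for a blocking RPC READ that fills the page before returning. */
static void sync_read_sim(struct page_sim *page)
{
    page->uptodate = 1;
}

/*
 * Models the decision made in nfs_readpage: take a free slot and return
 * at once, or fall back to a synchronous read when every slot is busy.
 * Returns the slot index for an async request, DONE_SYNC otherwise.
 */
int readpage_sim(struct page_sim *page)
{
    for (int i = 0; i < NR_IOD_SLOTS; i++) {
        if (!iod_slots[i].busy) {
            iod_slots[i].busy = 1;
            iod_slots[i].page = page;
            return i;           /* request queued; page is not uptodate yet */
        }
    }
    sync_read_sim(page);
    return DONE_SYNC;
}

/* Models the nfsiod side once the reply for a slot has been collected. */
void iod_complete_sim(int slot, int rpc_ok)
{
    struct iod_slot *s = &iod_slots[slot];

    assert(slot >= 0 && slot < NR_IOD_SLOTS && s->busy);
    if (rpc_ok)
        s->page->uptodate = 1;  /* waiters on page->wait would be woken here */
    s->page = NULL;
    s->busy = 0;                /* slot is free for the next readpage */
}
```

    With four slots, the first four calls to readpage_sim return slot
    numbers and leave their pages not yet uptodate; a fifth call finds
    no free slot and completes synchronously.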

    This is the rough outline only. There are a few things to note:

     *	Async RPC will not be tried when server->rsize < PAGE_SIZE.

     *	When an error occurs, nfsiod has no way of returning
    	the error code to the user process. Therefore, it flags
    	page->error and wakes up all processes waiting on that
    	page (they usually do so from within generic_readpage).

    	generic_readpage finds that the page is still not
    	uptodate, and calls nfs_readpage again. This time around,
    	nfs_readpage notices that page->error is set and
    	unconditionally does a synchronous RPC call.

    	This area needs a lot of improvement, since read errors
    	are not that uncommon (e.g. we have to retransmit calls
    	if the fsuid is different from the ruid in order to
    	cope with root squashing and stuff like this).

	Retransmits with fsuid/ruid change should be handled by
	nfsiod, but this doesn't come easily (a more general nfs_call
	routine that does all this may be useful...)

     *	To save some time on readaheads, we save one data copy
    	by frobbing the page into the iovec passed to the
	RPC code so that the networking layer copies the
    	data into the page directly.

    	This needs to be adjustable (different authentication
    	flavors; AUTH_NULL versus AUTH_SHORT verifiers).

     *	Currently, a fixed number of nfsiod's is spawned from
    	within init_nfs_fs. This is problematic when running
    	as a loadable module, because this will keep insmod's
    	memory allocated. As a side-effect, you will see the
    	nfsiod processes listed as several insmod's when doing
    	a `ps.'

     * 	This NFS client implements server congestion control via
	Van Jacobson slow start as implemented in 4.4BSD. I haven't
	checked how well this behaves, but since Rick Macklem did
	it this way, it should be okay :-)
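
    The data-copy saving mentioned in the notes above can be illustrated
    in user space with readv(2): the reply header goes into a small
    scratch buffer and the file data lands directly in the page buffer.
    This is a rough analogy only. The kernel builds its iovec inside the
    RPC code rather than calling readv, and read_reply_into_page,
    SIM_PAGE_SIZE and SIM_MAX_HDR are hypothetical names. The header
    length is a parameter because, as noted above, it depends on the
    authentication flavor in use.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define SIM_PAGE_SIZE 4096      /* assumed page size for this sketch */
#define SIM_MAX_HDR   64        /* assumed upper bound on the reply header */

/*
 * Read one simulated reply from fd: hdr_len bytes of header followed by
 * up to SIM_PAGE_SIZE bytes of file data. The second iovec entry points
 * at the page itself, so the data needs no extra copy afterwards.
 * hdr must have room for hdr_len bytes, page for SIM_PAGE_SIZE bytes.
 * Returns the number of data bytes placed in page, or -1 on error.
 */
ssize_t read_reply_into_page(int fd, size_t hdr_len, char *hdr, char *page)
{
    struct iovec iov[2];
    ssize_t n;

    if (hdr_len > SIM_MAX_HDR)
        return -1;

    iov[0].iov_base = hdr;      /* header: scratch buffer */
    iov[0].iov_len  = hdr_len;
    iov[1].iov_base = page;     /* payload: straight into the page */
    iov[1].iov_len  = SIM_PAGE_SIZE;

    n = readv(fd, iov, 2);
    if (n < 0 || (size_t)n < hdr_len)
        return -1;              /* read error or truncated header */
    return n - (ssize_t)hdr_len;
}
```

    If hdr_len does not match the real header size, the payload ends up
    misaligned in the page, which is why a fixed offset does not survive
    a change of verifier size.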


    WISH LIST

    After giving this thing some testing, I'd like to add some more
    features:

     *	Some sort of async write handling. True write-back doesn't
	work with the current kernel (I think), because invalidate_pages
	kills all pages, regardless of whether they're dirty or not.
	Besides, this may require special bdflush treatment because
	write caching on clients is really hairy.

	Alternatively, a write-through scheme might be useful where
	the client enqueues the request, but leaves collecting the
	results to nfsiod. Again, we need a way to pass RPC errors
	back to the application.

     *	Support for different authentication flavors.

     *	/proc/net/nfsclnt (for nfsstat, etc.).

March 29, 1996
Olaf Kirch <okir@monad.swb.de>