path: root/tc
Commit message (Author, Age, Files, Lines)
* tc: code cleanup (Stephen Hemminger, 2016-03-21, 70 files, -864/+966)
|     Use checkpatch to fix whitespace and other style issues.
* tc: q_{codel,fq_codel}: add missing space in help text (Luca Lemmo, 2016-03-21, 2 files, -2/+2)
|     Signed-off-by: Luca Lemmo <luca@linux.com>
* tc: f_u32: trivial coding style cleanups (Luca Lemmo, 2016-03-21, 1 file, -4/+4)
|     Signed-off-by: Luca Lemmo <luca@linux.com>
* tc: f_u32: add missing spaces around operators (Luca Lemmo, 2016-03-21, 1 file, -6/+6)
|     Signed-off-by: Luca Lemmo <luca@linux.com>
* tc: pedit: Fix retain value for ihl adjustments (Phil Sutter, 2016-03-06, 1 file, -1/+1)
|     Since the IP Header Length field is just half a byte, adjust retain to only match these bits so the Version field is not overwritten by accident.
|
|     The whole concept is actually broken due to dependency on endianness which pedit ignores.
|
|     Signed-off-by: Phil Sutter <phil@nwl.cc>
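To illustrate why the retain value matters here, the following is a minimal Python sketch, not the m_pedit.c code. It assumes that retain selects which bits of the target byte may change; the helper name `apply_retain`, the sample byte, and the 0xFF comparison value are made up for illustration.

```python
def apply_retain(old: int, val: int, retain: int) -> int:
    """Replace only the bits of byte `old` selected by `retain` with those of `val`."""
    return (old & ~retain & 0xFF) | (val & retain)


first_byte = 0x45  # first IPv4 header byte: Version 4 (high nibble), IHL 5 (low nibble)

# A retain of 0x0F touches only the IHL nibble, so Version stays 4.
print(hex(apply_retain(first_byte, 0x06, 0x0F)))

# A wider retain (0xFF, hypothetical) rewrites the whole byte and clobbers Version.
print(hex(apply_retain(first_byte, 0x06, 0xFF)))
```

The sketch works on a single byte and so sidesteps the endianness problem the message mentions.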
* tc: pedit: Fix parse_cmd() (Phil Sutter, 2016-03-06, 1 file, -16/+7)
|     This was horribly broken:
|
|     * pack_key8() and pack_key16() ...
|       * failed to invert the retain value when applying it to the mask,
|       * did not sanitize val by ANDing it with retain,
|       * and ignored the mask which is necessary for 'invert' command.
|     * pack_key16() did not convert mask to network byte order.
|     * Changing the retain value for 'invert' or 'retain' operation seems just plain wrong.
|     * While here, also got rid of unnecessary offset sanitization in pack_key32().
|     * Simplify code a bit by always assigning the local mask variable to tkey->mask before calling any of the pack_key*() variants.
|
|     Signed-off-by: Phil Sutter <phil@nwl.cc>
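The three points about pack_key16() (inverted retain in the mask, val ANDed with retain, network byte order) can be sketched in Python as below. This is an illustration under assumptions, not a port of m_pedit.c: it assumes a key is a 32-bit word pair (val, mask) and that the key is applied as `new = (old & mask) ^ val`. The function names merely mirror the ones in the message.

```python
import struct


def pack_key16(val: int, retain: int, off: int):
    """Return (val, mask) as 4-byte network-order words for a 16-bit key at byte offset `off`."""
    if off % 2:
        raise ValueError("16-bit keys must sit on an even offset")
    # In network (big-endian) order the lower offset holds the higher bits.
    shift = 16 if off % 4 == 0 else 0
    val32 = (val & retain & 0xFFFF) << shift                # sanitize val with retain
    mask32 = ~((retain & 0xFFFF) << shift) & 0xFFFFFFFF     # inverted retain, other bits kept
    return struct.pack("!I", val32), struct.pack("!I", mask32)


def apply_key(word: bytes, val: bytes, mask: bytes) -> bytes:
    """Assumed application rule: new = (old & mask) ^ val, byte by byte."""
    return bytes((o & m) ^ v for o, m, v in zip(word, mask, val))


packet_word = bytes.fromhex("11223344")
val, mask = pack_key16(0xABCD, 0x00FF, off=2)
print(apply_key(packet_word, val, mask).hex())
```

With retain 0x00FF only the last byte of the word changes; the bits of val outside retain are dropped before they can leak into the packet.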
* tc: pedit: Fix layered op parsing (Phil Sutter, 2016-03-06, 1 file, -0/+1)
|     After lookup of the layered op submodule, pedit would pass argv and argc including the layered op identifier at first position which confused the submodule parser. Fix this by calling NEXT_ARG() before calling the parse_peopt() callback.
|
|     Signed-off-by: Phil Sutter <phil@nwl.cc>
* tc: pedit: document branch control in help output (Phil Sutter, 2016-03-04, 1 file, -1/+2)
|     This seems to have been a hidden feature, though it's very useful and necessary at least when combining multiple pedit actions.
|
|     Signed-off-by: Phil Sutter <phil@nwl.cc>
* htb: rename b4 buffer to b3 to make its name more consistent (Dmitrii Shcherbakov, 2016-02-17, 1 file, -2/+2)
|     b3 buffer has been deleted previously so b2 is followed by b4 which is not consistent.
|
|     Signed-off-by: Dmitrii Shcherbakov <fw.dmitrii@yandex.com>
|     Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
|     Acked-by: Phil Sutter <phil@nwl.cc>
* htb: remove printing of a deprecated overhead value (Dmitrii Shcherbakov, 2016-02-17, 1 file, -7/+4)
|     Remove printing according to the previously used encoding of mpu and overhead values within the tc_ratespec's mpu field. This encoding is no longer being used as a separate 'overhead' field in the ratespec structure has been introduced.
|
|     Signed-off-by: Dmitrii Shcherbakov <fw.dmitrii@yandex.com>
|     Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
|     Acked-by: Phil Sutter <phil@nwl.cc>
* tc, bpf: use bind/type macros from gelf (Daniel Borkmann, 2016-02-07, 1 file, -5/+2)
|     Don't reimplement them and rather use the macros from the gelf header, that is, GELF_ST_BIND()/GELF_ST_TYPE().
|
|     Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* tc, bpf: give some more hints wrt false relos (Daniel Borkmann, 2016-02-07, 1 file, -1/+9)
|     Provide some more hints to the user/developer when relos have been found that don't point to ld64 imm instruction. Ran couple of times into relos generated by clang [1], where the compiler tried to uninline inlined functions with eBPF and emitted BPF_JMP | BPF_CALL opcodes. If this seems the case, give a hint that the user should do a work-around to use always_inline annotation.
|
|     [1] https://llvm.org/bugs/show_bug.cgi?id=26243#c3
|
|     Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* tc, bpf: improve verifier logging (Daniel Borkmann, 2016-02-07, 2 files, -47/+110)
|     With a bit larger, branchy eBPF programs f.e. already ~BPF_MAXINSNS/7 in size, it happens rather quickly that bpf(2) rejects also valid programs when only the verifier log buffer size we have in tc is too small.
|
|     Change that, so by default we don't do any logging, and only in error case we retry with logging enabled. If we should fail providing a reasonable dump of the verifier analysis, retry few times with a larger log buffer so that we can at least give the user a chance to debug the program.
|
|     Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|     Acked-by: John Fastabend <john.r.fastabend@intel.com>
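The retry strategy described in this message can be sketched as follows. This is a hedged Python illustration of the control flow only: the `load` callable, the initial buffer size, the growth factor and the attempt limit are all assumptions, since the message does not state the values tc uses.

```python
def load_with_retry(load, initial_log_size=65536, max_attempts=3, growth=2):
    """Try loading without a verifier log first; on failure retry with a growing log buffer.

    `load(log_size)` is a hypothetical callable returning (ok, log_truncated);
    a log_size of 0 means logging is disabled.
    Returns (ok, log_size_of_last_attempt).
    """
    ok, _ = load(0)                      # default path: no logging at all
    if ok:
        return True, 0
    size = initial_log_size
    for attempt in range(max_attempts):  # error path: logging enabled
        ok, truncated = load(size)
        last = attempt == max_attempts - 1
        if ok or not truncated or last:
            return ok, size              # loaded, full log captured, or out of attempts
        size *= growth                   # log did not fit: try a larger buffer
    return False, 0                      # only reached when max_attempts < 1


def rejected_program(log_size, needed=200_000):
    """Fake loader: always rejected, and its verifier log needs `needed` bytes."""
    return False, 0 < log_size < needed


print(load_with_retry(rejected_program))
```

A valid program never pays for the log buffer in this scheme, which is what keeps a too-small buffer from causing a spurious rejection.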
* tc: fix compilation with old gcc (< 4.6) (bis) (Nicolas Dichtel, 2016-02-05, 1 file, -25/+33)
|     Commit 8f80d450c3cb ("tc: fix compilation with old gcc (< 4.6)") was reverted to ease the merge of the net-next branch. Here is the new version.
|
|     Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
|     Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* tc, bpf: make sure relo is in relation with map section (Daniel Borkmann, 2016-02-02, 1 file, -0/+6)
|     Add a test that symbol from relocation entry is actually related to map section and bail out with an error message if it's not the case; in relation to [1].
|
|     [1] https://llvm.org/bugs/show_bug.cgi?id=26243
|
|     Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|     Acked-by: Alexei Starovoitov <ast@kernel.org>
* Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2 (Stephen Hemminger, 2016-02-02, 29 files, -741/+1657)
|\
| * tc, bpf: more header checks on loading elf (Daniel Borkmann, 2016-01-18, 1 file, -0/+43)
| |     eBPF llvm backend can support different BPF formats, make sure the object we're trying to load matches with regards to endianness and while at it, also check for other attributes related to BPF ELFs.
| |
| |       # llc --version
| |       LLVM (http://llvm.org/):
| |         LLVM version 3.8.0svn
| |         Optimized build.
| |         Built Jan 9 2016 (02:08:10).
| |         Default target: x86_64-unknown-linux-gnu
| |         Host CPU: ivybridge
| |
| |         Registered Targets:
| |           bpf - BPF (host endian)
| |           bpfeb - BPF (big endian)
| |           bpfel - BPF (little endian)
| |           [...]
| |
| |     Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| |     Acked-by: Alexei Starovoitov <ast@kernel.org>
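As an illustration of the kind of header check described above, here is a small Python sketch that inspects the first 20 bytes of an ELF file. Which attributes the tc loader actually verifies is not listed in the message, so the selection below (64-bit class, host endianness, relocatable type, machine EM_BPF or EM_NONE) is an assumption, and `check_bpf_elf_header` is a hypothetical name.

```python
import struct
import sys

ET_REL = 1                        # relocatable object file
EM_NONE = 0                       # accepted here on the assumption that some compilers set no machine
EM_BPF = 247                      # ELF machine number for BPF
ELFCLASS64 = 2
ELFDATA2LSB, ELFDATA2MSB = 1, 2   # little / big endian data encoding


def check_bpf_elf_header(header: bytes, host_byteorder: str = sys.byteorder) -> list:
    """Return a list of problems found in the leading bytes of an ELF object."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return ["not an ELF file"]
    problems = []
    ei_class, ei_data = header[4], header[5]
    if ei_class != ELFCLASS64:
        problems.append("not a 64-bit ELF")
    if ei_data not in (ELFDATA2LSB, ELFDATA2MSB):
        return problems + ["unknown data encoding"]
    expected = ELFDATA2LSB if host_byteorder == "little" else ELFDATA2MSB
    if ei_data != expected:
        problems.append("endianness does not match host")
    # e_type and e_machine follow the 16-byte e_ident, in the file's own byte order.
    fmt = "<HH" if ei_data == ELFDATA2LSB else ">HH"
    e_type, e_machine = struct.unpack_from(fmt, header, 16)
    if e_type != ET_REL:
        problems.append("not a relocatable object")
    if e_machine not in (EM_NONE, EM_BPF):
        problems.append("unexpected machine type")
    return problems


# A hand-built little-endian BPF header, checked against both host byte orders.
sample = b"\x7fELF" + bytes([ELFCLASS64, ELFDATA2LSB, 1, 0]) + bytes(8) + struct.pack("<HH", ET_REL, EM_BPF)
print(check_bpf_elf_header(sample, "little"))
print(check_bpf_elf_header(sample, "big"))
```

An object built with `bpfeb` on a little-endian host would be flagged by the endianness test before any section is parsed.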
| * tc, bpf: check section names and type everywhere (Daniel Borkmann, 2016-01-18, 1 file, -6/+15)
| |     When extracting sections, we better check for name and type. Noticed that some llvm versions emit .strtab and .shstrtab (e.g. saw it on pre 3.7), while more recent ones only seem to emit .strtab. Thus, make sure we get the right sections.
| |
| |     Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| |     Acked-by: Alexei Starovoitov <ast@kernel.org>
| * tc, clsact: add clsact frontend (Daniel Borkmann, 2016-01-18, 4 files, -14/+88)
| |     Add the tc part for the kernel commit 1f211a1b929c ("net, sched: add clsact qdisc"). Quoting example usage from that commit description:
| |
| |     Example, adding qdisc:
| |
| |       # tc qdisc add dev foo clsact
| |       # tc qdisc show dev foo
| |       qdisc mq 0: root
| |       qdisc pfifo_fast 0: parent :1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
| |       qdisc pfifo_fast 0: parent :2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
| |       qdisc pfifo_fast 0: parent :3 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
| |       qdisc pfifo_fast 0: parent :4 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
| |       qdisc clsact ffff: parent ffff:fff1
| |
| |     Adding filters (deleting, etc works analogous by specifying ingress/egress):
| |
| |       # tc filter add dev foo ingress bpf da obj bar.o sec ingress
| |       # tc filter add dev foo egress bpf da obj bar.o sec egress
| |       # tc filter show dev foo ingress
| |       filter protocol all pref 49152 bpf
| |       filter protocol all pref 49152 bpf handle 0x1 bar.o:[ingress] direct-action
| |       # tc filter show dev foo egress
| |       filter protocol all pref 49152 bpf
| |       filter protocol all pref 49152 bpf handle 0x1 bar.o:[egress] direct-action
| |
| |     The ingress parent alias can also be used with ingress qdisc.
| |
| |     Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * tc, ingress: clean up ingress handling a bit (Daniel Borkmann, 2016-01-18, 2 files, -23/+8)
| |     Clean it up a bit, we can also get rid of some ugly ifdefs as in our case TC_H_INGRESS is always defined.
| |
| |     Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * Merge branch 'net-next' (Stephen Hemminger, 2016-01-18, 6 files, -605/+1433)
| |\
| | * monitor: fix file handle leak (Stephen Hemminger, 2015-12-30, 1 file, -3/+7)
| | |     In some cases passing file to monitor left file open.
| | |
| | |     Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
| | * bpf: minor fix in api and bpf_dump_error() usage (Daniel Borkmann, 2015-12-17, 1 file, -1/+1)
| | |     Fix a whitespace in bpf_dump_error() usage, and also a missing closing bracket in ntohl() macro for eBPF programs.
| | |
| | |     Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| | * {f,m}_bpf: allow updates on program arrays (Daniel Borkmann, 2015-11-29, 3 files, -149/+306)
| | |     Since we have all infrastructure in place now, allow atomic live updates on program arrays. This can be very useful e.g. in case programs that are being tail-called need to be replaced, f.e. when classifier functionality needs to be changed, new protocols added/removed during runtime, etc.
| | |
| | |     Thus, provide a way for in-place code updates, minimal example: Given is an object file cls.o that contains the entry point in section 'classifier', has a globally pinned program array 'jmp' with 2 slots and id of 0, and two tail called programs under section '0/0' (prog array key 0) and '0/1' (prog array key 1), the section encoding for the loader is <id/key>. Adding the filter loads everything into cls_bpf:
| | |
| | |       tc filter add dev foo parent ffff: bpf da obj cls.o
| | |
| | |     Now, the program under section '0/1' needs to be replaced with an updated version that resides in the same section (also full path to tc's subfolder of the mount point can be passed, e.g. /sys/fs/bpf/tc/globals/jmp):
| | |
| | |       tc exec bpf graft m:globals/jmp obj cls.o sec 0/1
| | |
| | |     In case the program resides under a different section 'foo', it can also be injected into the program array like:
| | |
| | |       tc exec bpf graft m:globals/jmp key 1 obj cls.o sec foo
| | |
| | |     If the new tail called classifier program is already available as a pinned object somewhere (here: /sys/fs/bpf/tc/progs/parser), it can be injected into the prog array like:
| | |
| | |       tc exec bpf graft m:globals/jmp key 1 fd m:progs/parser
| | |
| | |     In the kernel, the program on key 1 is being atomically replaced and the old one's refcount dropped.
| | |
| | |     Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| | |     Acked-by: Alexei Starovoitov <ast@kernel.org>
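The `<id/key>` section encoding mentioned in this message can be illustrated with a short Python sketch. It only shows how a name such as '0/1' splits into a program array id and a key, and how a name such as 'foo' does not; the function name is hypothetical and the real loader's parsing rules may differ.

```python
def parse_tail_call_section(name: str):
    """Split a section name of the form 'id/key' into two integers, or return None."""
    parts = name.split("/")
    if len(parts) != 2:
        return None
    if not all(p.isascii() and p.isdigit() for p in parts):
        return None
    return int(parts[0]), int(parts[1])


for section in ("0/0", "0/1", "classifier", "foo"):
    print(section, "->", parse_tail_call_section(section))
```

This matches the examples in the message: a program in section '0/1' carries its own slot, while one in section 'foo' needs an explicit `key 1` on the command line.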
| | * {f, m}_bpf: allow for user-defined object pinnings (Daniel Borkmann, 2015-11-29, 1 file, -20/+192)
| | |     The recently introduced object pinning can be further extended in order to allow sharing maps beyond tc namespace. F.e. maps that are being pinned from tracing side, can be accessed through this facility as well.
| | |
| | |     Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| | |     Acked-by: Alexei Starovoitov <ast@kernel.org>
| | * {f, m}_bpf: check map attributes when fetching as pinned (Daniel Borkmann, 2015-11-29, 1 file, -0/+53)
| | |     Make use of the new show_fdinfo() facility and verify that when a pinned map is being fetched that its basic attributes are the same as the map we declared from the ELF file. I.e. when placed into the globalns, collisions could occur. In such a case warn the user and bail out.
| | |
| | |     Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| | |     Acked-by: Alexei Starovoitov <ast@kernel.org>
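The attribute comparison described here might look roughly like the Python sketch below. It parses fdinfo-style text and compares it with the attributes declared in the ELF object. The field names `map_type`, `key_size`, `value_size` and `max_entries` are assumed to be what the kernel prints for a map file descriptor, and both function names are hypothetical.

```python
def parse_fdinfo(text: str) -> dict:
    """Parse 'name:<whitespace>value' lines into a dict of ints, skipping non-numeric values."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isascii() and value.isdigit():
            fields[name.strip()] = int(value)
    return fields


def map_mismatches(fdinfo_text: str, declared: dict,
                   keys=("map_type", "key_size", "value_size", "max_entries")) -> list:
    """Return the attribute names where the pinned map differs from the declared one."""
    pinned = parse_fdinfo(fdinfo_text)
    return [k for k in keys if pinned.get(k) != declared.get(k)]


# Made-up fdinfo content for a pinned map.
pinned_text = "pos:\t0\nflags:\t02\nmap_type:\t1\nkey_size:\t4\nvalue_size:\t8\nmax_entries:\t64\n"
declared = {"map_type": 1, "key_size": 4, "value_size": 4, "max_entries": 64}
print(map_mismatches(pinned_text, declared))
```

A non-empty result corresponds to the collision case in the message, where the loader warns and bails out rather than reusing a map of a different shape.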
| | * {f,m}_bpf: make tail calls working (Daniel Borkmann, 2015-11-29, 1 file, -8/+19)
| | |     Now that we have the possibility of sharing maps, it's time we get the ELF loader fully working with regards to tail calls. Since program array maps are pinned, we can keep them finally alive. I've noticed two bugs that are being fixed in bpf_fill_prog_arrays() with this patch. Example code comes as follow-up.
| | |
| | |     Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| | |     Acked-by: Alexei Starovoitov <ast@kernel.org>
| | * {f,m}_bpf: allow for sharing maps (Daniel Borkmann, 2015-11-23, 5 files, -604/+1035)
| | |     This larger work addresses one of the bigger remaining issues on tc's eBPF frontend, that is, to allow for persistent file descriptors. Whenever tc parses the ELF object, extracts and loads maps into the kernel, these file descriptors will be out of reach after the tc instance exits.
| | |
| | |     Meaning, for simple (unnested) programs which contain one or multiple maps, the kernel holds a reference, and they will live on inside the kernel until the program holding them is unloaded, but they will be out of reach for user space, even worse with (also multiple nested) tail calls.
| | |
| | |     For this issue, we introduced the concept of an agent that can receive the set of file descriptors from the tc instance creating them, in order to be able to further inspect/update map data for a specific use case. However, while that is more tied towards specific applications, it still doesn't easily allow for sharing maps across multiple tc instances and would require a daemon to be running in the background. F.e. when a map should be shared by two eBPF programs, one attached to ingress, one to egress, this currently doesn't work with the tc frontend.
| | |
| | |     This work solves exactly that, i.e. if requested, maps can now be _arbitrarily_ shared between object files (PIN_GLOBAL_NS) or within a single object (but various program sections, PIN_OBJECT_NS) without "losing" the file descriptor set. To make that happen, we use eBPF object pinning introduced in kernel commit b2197755b263 ("bpf: add support for persistent maps/progs") for exactly this purpose.
| | |
| | |     The shipped examples/bpf/bpf_shared.c code from this patch can be easily applied, for instance, as:
| | |
| | |     - classifier-classifier shared:
| | |
| | |       tc filter add dev foo parent 1: bpf obj shared.o sec egress
| | |       tc filter add dev foo parent ffff: bpf obj shared.o sec ingress
| | |
| | |     - classifier-action shared (here: late binding to a dummy classifier):
| | |
| | |       tc actions add action bpf obj shared.o sec egress pass index 42
| | |       tc filter add dev foo parent ffff: bpf obj shared.o sec ingress
| | |       tc filter add dev foo parent 1: bpf bytecode '1,6 0 0 4294967295,' \
| | |          action bpf index 42
| | |
| | |     The toy example increments a shared counter on egress and dumps its value on ingress (if no sharing (PIN_NONE) would have been chosen, map value is 0, of course, due to the two map instances being created):
| | |
| | |       [...]
| | |       <idle>-0 [002] ..s. 38264.788234: : map val: 4
| | |       <idle>-0 [002] ..s. 38264.788919: : map val: 4
| | |       <idle>-0 [002] ..s. 38264.789599: : map val: 5
| | |       [...]
| | |
| | |     ... thus if both sections reference the pinned map(s) in question, tc will take care of fetching the appropriate file descriptor.
| | |
| | |     The patch has been tested extensively on both, classifier and action sides.
| | |
| | |     Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
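To make the three pinning scopes concrete, the following Python sketch shows how a map name could resolve to a path in the BPF filesystem. The `globals` directory under /sys/fs/bpf/tc appears elsewhere in this log; the per-object directory layout and the `object_id` parameter are assumptions made purely for illustration, as are the `Pin` enum and the function name.

```python
from enum import Enum


class Pin(Enum):
    NONE = "PIN_NONE"            # no sharing: every load creates its own map
    OBJECT_NS = "PIN_OBJECT_NS"  # shared between sections of one object file
    GLOBAL_NS = "PIN_GLOBAL_NS"  # shared between object files


def pin_path(map_name: str, scope: Pin, object_id: str, root: str = "/sys/fs/bpf/tc"):
    """Return the path a map would be pinned at, or None when it is not pinned."""
    if scope is Pin.NONE:
        return None
    if scope is Pin.GLOBAL_NS:
        return f"{root}/globals/{map_name}"
    # Hypothetical layout: one directory per object, keyed by some object identifier.
    return f"{root}/{object_id}/{map_name}"


for scope in Pin:
    print(scope.value, "->", pin_path("counter", scope, object_id="shared-o"))
```

In this model the ingress and egress sections of shared.o resolve to the same path and therefore the same map, which is why the counter in the trace output above keeps increasing instead of staying at 0.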
| * | Revert "tc: fix compilation with old gcc (< 4.6)" (Stephen Hemminger, 2016-01-18, 1 file, -27/+21)
| | |     This reverts commit 8f80d450c3cb0996d839996807b77ca28bd4da09.
| * | tc: flower no need to specify the ethertype (Jamal Hadi Salim, 2016-01-11, 1 file, -38/+17)
| | |     since all tc classifiers are required to specify ethertype as part of grammar
| | |
| | |     By not allowing eth_type to be specified we remove contradiction for example when a user specifies:
| | |
| | |       tc filter add ... priority xxx protocol ip flower eth_type ipv6
| | |
| | |     This patch removes that contradiction
| | |
| | |     Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
| * | tc: fix compilation with old gcc (< 4.6) (Julien Floret, 2016-01-11, 1 file, -21/+27)
| | |     gcc < 4.6 does not handle C11 syntax for the static initialization of anonymous struct/union, hence the following error:
| | |
| | |       tc_bpf.c:260: error: unknown field map_type specified in initializer
| | |
| | |     Signed-off-by: Julien Floret <julien.floret@6wind.com>
| | |     Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
| | |     Acked-by: Daniel Borkmann <daniel@iogearbox.net>
| * | tc: m_connmark: Fix help text (Phil Sutter, 2016-01-07, 1 file, -1/+1)
| |/
| |     When specifying a conntrack zone, the 'zone' keyword has to be used before the actual zone index.
| |
| |     Signed-off-by: Phil Sutter <phil@nwl.cc>
| * Merge branch 'master' into net-next (Stephen Hemminger, 2015-10-23, 18 files, -49/+31)
| |\
| | * tc: remove extra whitespace (Stephen Hemminger, 2015-10-23, 14 files, -16/+0)
| | |     No blank lines at EOF, or trailing whitespace.
| | * tc: u32 filter coding style cleanup (Phil Sutter, 2015-10-23, 1 file, -29/+27)
| | |     Add missing spaces around operators to increase readability. Aside from that, make "preference" match a real synonym for "tos" and "dsfield" as its effect was identical to them.
| | |
| | |     Signed-off-by: Phil Sutter <phil@nwl.cc>
| | * tc: improve filter help texts a bit (Phil Sutter, 2015-10-23, 3 files, -4/+4)
| | |     This fixes a few syntax errors and changes route filter help text to use classid instead of flowid to be consistent with other filters' help texts.
| | |
| | |     Signed-off-by: Phil Sutter <phil@nwl.cc>
| * | m_bpf: don't require default opcode on ebpf actions (Daniel Borkmann, 2015-10-12, 1 file, -24/+23)
| | |     After the patch, the most minimal command to load an eBPF action for late binding with auto index selection through tc is:
| | |
| | |       tc actions add action bpf obj prog.o
| | |
| | |     We already set TC_ACT_PIPE in tc as default opcode, so if nothing further has been specified, just use it. Also, allow "ok" next to "pass" for matching cmdline on TC_ACT_OK.
| | |
| | |     Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * | f_bpf: allow for optional classid and add flags (Daniel Borkmann, 2015-10-12, 1 file, -13/+30)
| |/
| |     When having optional classid, most minimal command can be sth like:
| |
| |       tc filter add dev foo parent X: bpf obj prog.o
| |
| |     Therefore, adapt the code so that a next argument will not be enforced as the case currently. Also, minor cleanup on the classid, where we should rather have used addattr32(), and add flags for exec configuration, for example (using short notation):
| |
| |       tc filter add dev foo parent X: bpf da obj prog.o
| |
| |     Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| |     Acked-by: Alexei Starovoitov <ast@plumgrid.com>
* / qfq: fix parse_opt dead code (Stephen Hemminger, 2015-10-27, 1 file, -9/+4)
|/
|     Fix Coverity warning from dead code.
* fq: fix whitespace (Stephen Hemminger, 2015-09-25, 1 file, -2/+2)
|
* tc: fq: allow setting and retrieving orphan_mask (Eric Dumazet, 2015-09-25, 1 file, -1/+20)
|     linux-3.19 fq packet scheduler got a new attribute, controlling number of 'flows' holding packets not attached to a socket (forwarding usage)
|
|     kernel commit is 06eb395fa9856b5a87cf7d80baee2a0ed3cdb9d7 ("pkt_sched: fq: better control of DDOS traffic")
|
|     This patch adds corresponding code to tc command.
|
|       tc qd replace dev eth0 root fq orphan_mask 511
|
|     Signed-off-by: Eric Dumazet <edumazet@google.com>
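How a mask bounds the number of orphan 'flows' can be sketched in a few lines of Python. The assumption here is that a packet without a socket is assigned to a flow by ANDing its flow hash with orphan_mask, so a mask of 511 yields at most 512 such flows; the function name and the sample hashes are illustrative only.

```python
def orphan_bucket(flow_hash: int, orphan_mask: int = 511) -> int:
    """Map a packet's flow hash to one of (orphan_mask + 1) orphan flows."""
    return flow_hash & orphan_mask


# Many distinct hashes collapse into a bounded set of flows.
buckets = {orphan_bucket(h) for h in range(5000)}
print(len(buckets))
print(orphan_bucket(0x12345678))
```

A smaller mask means fewer flows for forwarded traffic, which limits the state that a flood of distinct hashes can create.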
* tc : add timestamps to tc monitor (Eric Dumazet, 2015-09-25, 2 files, -1/+10)
|     Support -timestamp and -tshort options for tc monitor like ip monitor.
|
|       # tc -tshort monitor
|       [2015-09-23T16:39:11.260555] qdisc fq 8003: dev eth0 root refcnt 2 limit 10000p flow_limit 100p buckets 1024 quantum 3028 initial_quantum 15140 refill_delay 40.0ms
|
|     Signed-off-by: Eric Dumazet <edumazet@google.com>
* tc: fq: allow setting and retrieving flow refill delay (Phil Sutter, 2015-09-23, 1 file, -1/+19)
|     Code to parse and export this tuneable via netlink is already present in sched_fq.c of the kernel, so not making it accessible for users would be a waste of resources.
|
|     Signed-off-by: Phil Sutter <phil@nwl.cc>
* comment: Fix remaining listings of wrong FSF address (Phil Sutter, 2015-09-23, 1 file, -2/+1)
|     This patch follows the changes of commit 4d98ab0 ("Fix FSF address in file headers"), fixing file headers added after it.
|
|     Signed-off-by: Phil Sutter <phil@nwl.cc>
* Merge branch 'master' into net-next (Stephen Hemminger, 2015-08-13, 1 file, -15/+5)
|\
| * tc: fix return after invarg (Stephen Hemminger, 2015-08-13, 1 file, -15/+5)
| |
* | m_bpf: add frontend support for late binding (Daniel Borkmann, 2015-08-10, 1 file, -9/+11)
|/
|     Frontend support for kernel commit a5c90b29e5cc ("act_bpf: properly support late binding of bpf action to a classifier").
|
|     Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* tc: fix bpf compilation with old glibc (Nicolas Dichtel, 2015-07-27, 2 files, -2/+2)
|     Error was:
|
|       f_bpf.o: In function `bpf_parse_opt':
|       f_bpf.c:(.text+0x88f): undefined reference to `secure_getenv'
|       m_bpf.o: In function `parse_bpf':
|       m_bpf.c:(.text+0x587): undefined reference to `secure_getenv'
|       collect2: error: ld returned 1 exit status
|
|     There is no special reason to use the secure version of getenv, thus let's simply use getenv().
|
|     CC: Daniel Borkmann <daniel@iogearbox.net>
|     Fixes: 88eea5395483 ("tc: {f,m}_bpf: allow to retrieve uds path from env")
|     Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
|     Acked-by: Daniel Borkmann <daniel@iogearbox.net>
|     Acked-by: Alexei Starovoitov <ast@plumgrid.com>
|     Tested-by: Yegor Yefremov <yegorslists@googlemail.com>
* Merge branch 'master' into net-next (Stephen Hemminger, 2015-06-26, 1 file, -24/+8)
|\
| * iproute2: tc/m_pedit.c - remove dead code (Maciej Żenczykowski, 2015-06-25, 1 file, -24/+8)
| |     The initializers are simply not needed.
| |
| |     These if-blocks are outright dead code, because '0 > unsigned' is always false, so only else clause triggers and regardless of which clause triggers it only updates 'ind' which is later unconditionally written to before being used anyway.
| |
| |     Otherwise we get errors from clang:
| |
| |       m_pedit.c:166:8: error: comparison of 0 > unsigned expression is always false [-Werror,-Wtautological-compare]
| |               if (0 > tkey->off) {
| |                   ~ ^ ~~~~~~~~~
| |       m_pedit.c:209:8: error: comparison of 0 > unsigned expression is always false [-Werror,-Wtautological-compare]
| |               if (0 > tkey->off) {
| |                   ~ ^ ~~~~~~~~~
| |       2 errors generated.
| |
| |     Change-Id: I3c9e9092915088fc56f992e5df736851541a4458