| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Use checkpatch to fix whitespace and other style issues.
|
|
|
|
| |
Signed-off-by: Luca Lemmo <luca@linux.com>
|
|
|
|
| |
Signed-off-by: Luca Lemmo <luca@linux.com>
|
|
|
|
| |
Signed-off-by: Luca Lemmo <luca@linux.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Since the IP Header Length field is just half a byte, adjust retain to
only match these bits so the Version field is not overwritten by
accident.
The whole concept is actually broken due to dependency on endianness
which pedit ignores.
Signed-off-by: Phil Sutter <phil@nwl.cc>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was horribly broken:
* pack_key8() and pack_key16() ...
* missed to invert retain value when applying it to the mask,
* did not sanitize val by ANDing it with retain,
* and ignored the mask which is necessary for 'invert' command.
* pack_key16() did not convert mask to network byte order.
* Changing the retain value for 'invert' or 'retain' operation seems
just plain wrong.
* While here, also got rid of unnecessary offset sanitization in
pack_key32().
* Simplify code a bit by always assigning the local mask variable to
tkey->mask before calling any of the pack_key*() variants.
Signed-off-by: Phil Sutter <phil@nwl.cc>
|
|
|
|
|
|
|
|
|
| |
After lookup of the layered op submodule, pedit would pass argv and argc
including the layered op identifier at first position which confused the
submodule parser. Fix this by calling NEXT_ARG() before calling the
parse_peopt() callback.
Signed-off-by: Phil Sutter <phil@nwl.cc>
|
|
|
|
|
|
|
| |
This seems to have been a hidden feature, though it's very useful and
necessary at least when combining multiple pedit actions.
Signed-off-by: Phil Sutter <phil@nwl.cc>
|
|
|
|
|
|
|
|
|
| |
b3 buffer has been deleted previously so b2 is followed by b4
which is not consistent.
Signed-off-by: Dmitrii Shcherbakov <fw.dmitrii@yandex.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
|
|
|
|
|
|
|
|
|
|
|
| |
Remove printing according to the previously used encoding of mpu and
overhead values within the tc_ratespec's mpu field. This encoding is
no longer being used as a separate 'overhead' field in the ratespec
structure has been introduced.
Signed-off-by: Dmitrii Shcherbakov <fw.dmitrii@yandex.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
|
|
|
|
|
|
|
| |
Don't reimplement them and rather use the macros from the gelf header,
that is, GELF_ST_BIND()/GELF_ST_TYPE().
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Provide some more hints to the user/developer when relos have been found
that don't point to ld64 imm instruction. Ran couple of times into relos
generated by clang [1], where the compiler tried to uninline inlined
functions with eBPF and emitted BPF_JMP | BPF_CALL opcodes. If this seems
the case, give a hint that the user should do a work-around to use
always_inline annotation.
[1] https://llvm.org/bugs/show_bug.cgi?id=26243#c3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With a bit larger, branchy eBPF programs f.e. already ~BPF_MAXINSNS/7 in
size, it happens rather quickly that bpf(2) rejects also valid programs
when only the verifier log buffer size we have in tc is too small.
Change that, so by default we don't do any logging, and only in error
case we retry with logging enabled. If we should fail providing a
reasonable dump of the verifier analysis, retry few times with a larger
log buffer so that we can at least give the user a chance to debug the
program.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
|
|
|
|
|
|
|
|
|
|
| |
Commit 8f80d450c3cb ("tc: fix compilation with old gcc (< 4.6)") was reverted
to ease the merge of the net-next branch.
Here is the new version.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
|
|
|
|
|
|
|
|
|
| |
Add a test that symbol from relocation entry is actually related
to map section and bail out with an error message if it's not the
case; in relation to [1].
[1] https://llvm.org/bugs/show_bug.cgi?id=26243
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
|
|\
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
eBPF llvm backend can support different BPF formats, make sure the object
we're trying to load matches with regards to endiannes and while at it, also
check for other attributes related to BPF ELFs.
# llc --version
LLVM (http://llvm.org/):
LLVM version 3.8.0svn
Optimized build.
Built Jan 9 2016 (02:08:10).
Default target: x86_64-unknown-linux-gnu
Host CPU: ivybridge
Registered Targets:
bpf - BPF (host endian)
bpfeb - BPF (big endian)
bpfel - BPF (little endian)
[...]
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When extracting sections, we better check for name and type. Noticed
that some llvm versions emit .strtab and .shstrtab (e.g. saw it on pre
3.7), while more recent ones only seem to emit .strtab. Thus, make sure
we get the right sections.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Add the tc part for the kernel commit 1f211a1b929c ("net, sched: add
clsact qdisc"). Quoting example usage from that commit description:
Example, adding qdisc:
# tc qdisc add dev foo clsact
# tc qdisc show dev foo
qdisc mq 0: root
qdisc pfifo_fast 0: parent :1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :3 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :4 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc clsact ffff: parent ffff:fff1
Adding filters (deleting, etc works analogous by specifying ingress/egress):
# tc filter add dev foo ingress bpf da obj bar.o sec ingress
# tc filter add dev foo egress bpf da obj bar.o sec egress
# tc filter show dev foo ingress
filter protocol all pref 49152 bpf
filter protocol all pref 49152 bpf handle 0x1 bar.o:[ingress] direct-action
# tc filter show dev foo egress
filter protocol all pref 49152 bpf
filter protocol all pref 49152 bpf handle 0x1 bar.o:[egress] direct-action
The ingress parent alias can also be used with ingress qdisc.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
| |
| |
| |
| |
| |
| |
| | |
Clean it up a bit, we can also get rid of some ugly ifdefs as in our case
TC_H_INGRESS is always defined.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
| |\ |
|
| | |
| | |
| | |
| | |
| | |
| | | |
In some cases passing file to monitor left file open.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Fix a whitespace in bpf_dump_error() usage, and also a missing closing
bracket in ntohl() macro for eBPF programs.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Since we have all infrastructure in place now, allow atomic live updates
on program arrays. This can be very useful e.g. in case programs that are
being tail-called need to be replaced, f.e. when classifier functionality
needs to be changed, new protocols added/removed during runtime, etc.
Thus, provide a way for in-place code updates, minimal example: Given is
an object file cls.o that contains the entry point in section 'classifier',
has a globally pinned program array 'jmp' with 2 slots and id of 0, and
two tail called programs under section '0/0' (prog array key 0) and '0/1'
(prog array key 1), the section encoding for the loader is <id/key>.
Adding the filter loads everything into cls_bpf:
tc filter add dev foo parent ffff: bpf da obj cls.o
Now, the program under section '0/1' needs to be replaced with an updated
version that resides in the same section (also full path to tc's subfolder
of the mount point can be passed, e.g. /sys/fs/bpf/tc/globals/jmp):
tc exec bpf graft m:globals/jmp obj cls.o sec 0/1
In case the program resides under a different section 'foo', it can also
be injected into the program array like:
tc exec bpf graft m:globals/jmp key 1 obj cls.o sec foo
If the new tail called classifier program is already available as a pinned
object somewhere (here: /sys/fs/bpf/tc/progs/parser), it can be injected
into the prog array like:
tc exec bpf graft m:globals/jmp key 1 fd m:progs/parser
In the kernel, the program on key 1 is being atomically replaced and the
old one's refcount dropped.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The recently introduced object pinning can be further extended in order
to allow sharing maps beyond tc namespace. F.e. maps that are being pinned
from tracing side, can be accessed through this facility as well.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Make use of the new show_fdinfo() facility and verify that when a
pinned map is being fetched that its basic attributes are the same
as the map we declared from the ELF file. I.e. when placed into the
globalns, collisions could occur. In such a case warn the user and
bail out.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Now that we have the possibility of sharing maps, it's time we get the
ELF loader fully working with regards to tail calls. Since program array
maps are pinned, we can keep them finally alive. I've noticed two bugs
that are being fixed in bpf_fill_prog_arrays() with this patch. Example
code comes as follow-up.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This larger work addresses one of the bigger remaining issues on
tc's eBPF frontend, that is, to allow for persistent file descriptors.
Whenever tc parses the ELF object, extracts and loads maps into the
kernel, these file descriptors will be out of reach after the tc
instance exits.
Meaning, for simple (unnested) programs which contain one or
multiple maps, the kernel holds a reference, and they will live
on inside the kernel until the program holding them is unloaded,
but they will be out of reach for user space, even worse with
(also multiple nested) tail calls.
For this issue, we introduced the concept of an agent that can
receive the set of file descriptors from the tc instance creating
them, in order to be able to further inspect/update map data for
a specific use case. However, while that is more tied towards
specific applications, it still doesn't easily allow for sharing
maps accross multiple tc instances and would require a daemon to
be running in the background. F.e. when a map should be shared by
two eBPF programs, one attached to ingress, one to egress, this
currently doesn't work with the tc frontend.
This work solves exactly that, i.e. if requested, maps can now be
_arbitrarily_ shared between object files (PIN_GLOBAL_NS) or within
a single object (but various program sections, PIN_OBJECT_NS) without
"loosing" the file descriptor set. To make that happen, we use eBPF
object pinning introduced in kernel commit b2197755b263 ("bpf: add
support for persistent maps/progs") for exactly this purpose.
The shipped examples/bpf/bpf_shared.c code from this patch can be
easily applied, for instance, as:
- classifier-classifier shared:
tc filter add dev foo parent 1: bpf obj shared.o sec egress
tc filter add dev foo parent ffff: bpf obj shared.o sec ingress
- classifier-action shared (here: late binding to a dummy classifier):
tc actions add action bpf obj shared.o sec egress pass index 42
tc filter add dev foo parent ffff: bpf obj shared.o sec ingress
tc filter add dev foo parent 1: bpf bytecode '1,6 0 0 4294967295,' \
action bpf index 42
The toy example increments a shared counter on egress and dumps its
value on ingress (if no sharing (PIN_NONE) would have been chosen,
map value is 0, of course, due to the two map instances being created):
[...]
<idle>-0 [002] ..s. 38264.788234: : map val: 4
<idle>-0 [002] ..s. 38264.788919: : map val: 4
<idle>-0 [002] ..s. 38264.789599: : map val: 5
[...]
... thus if both sections reference the pinned map(s) in question,
tc will take care of fetching the appropriate file descriptor.
The patch has been tested extensively on both, classifier and
action sides.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
| | |
| | |
| | |
| | | |
This reverts commit 8f80d450c3cb0996d839996807b77ca28bd4da09.
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
since all tc classifiers are required to specify ethertype as part of grammar
By not allowing eth_type to be specified we remove contradiction for
example when a user specifies:
tc filter add ... priority xxx protocol ip flower eth_type ipv6
This patch removes that contradiction
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
gcc < 4.6 does not handle C11 syntax for the static initialization of
anonymous struct/union, hence the following error:
tc_bpf.c:260: error: unknown field map_type specified in initializer
Signed-off-by: Julien Floret <julien.floret@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
|
| |/
| |
| |
| |
| |
| |
| | |
When specifying a conntrack zone, the 'zone' keyword has to be used
before the actual zone index.
Signed-off-by: Phil Sutter <phil@nwl.cc>
|
| |\ |
|
| | |
| | |
| | |
| | | |
No blank lines at EOF, or trailing whitespace.
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add missing spaces around operators to increase readability. Aside from
that, make "preference" match a real synonym for "tos" and "dsfield" as
it's effect was identical to them.
Signed-off-by: Phil Sutter <phil@nwl.cc>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This fixes a few syntax errors and changes route filter help text to use
classid instead of flowid to be consistent with other filters' help
texts.
Signed-off-by: Phil Sutter <phil@nwl.cc>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
After the patch, the most minimal command to load an eBPF action
for late binding with auto index selection through tc is:
tc actions add action bpf obj prog.o
We already set TC_ACT_PIPE in tc as default opcode, so if nothing
further has been specified, just use it. Also, allow "ok" next to
"pass" for matching cmdline on TC_ACT_OK.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When having optional classid, most minimal command can be sth
like:
tc filter add dev foo parent X: bpf obj prog.o
Therefore, adapt the code so that a next argument will not be
enforced as the case currently.
Also, minor cleanup on the classid, where we should rather
have used addattr32(), and add flags for exec configuration,
for example (using short notation):
tc filter add dev foo parent X: bpf da obj prog.o
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
|
|/
|
|
| |
Fix Coverity warning from dead code.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
linux-3.19 fq packet scheduler got a new attribute, controlling
number of 'flows' holding packets not attached to a socket
(forwarding usage)
kernel commit is 06eb395fa9856b5a87cf7d80baee2a0ed3cdb9d7
("pkt_sched: fq: better control of DDOS traffic")
This patch adds corresponding code to tc command.
tc qd replace dev eth0 root fq orphan_mask 511
Signed-off-by: Eric Dumazet <edumazet@google.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Support -timestamp and -tshort options for tc monitor like ip monitor.
# tc -tshort monitor
[2015-09-23T16:39:11.260555] qdisc fq 8003: dev eth0 root refcnt 2 limit
10000p flow_limit 100p buckets 1024 quantum 3028 initial_quantum 15140
refill_delay 40.0ms
Signed-off-by: Eric Dumazet <edumazet@google.com>
|
|
|
|
|
|
|
|
| |
Code to parse and export this tuneable via netlink is already present in
sched_fq.c of the kernel, so not making it accessible for users would be
a waste of resources.
Signed-off-by: Phil Sutter <phil@nwl.cc>
|
|
|
|
|
|
|
| |
This patch follows the changes of commit 4d98ab0 ("Fix FSF address in
file headers"), fixing file headers added after it.
Signed-off-by: Phil Sutter <phil@nwl.cc>
|
|\ |
|
| | |
|
|/
|
|
|
|
|
| |
Frontend support for kernel commit a5c90b29e5cc ("act_bpf: properly
support late binding of bpf action to a classifier").
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Error was:
f_bpf.o: In function `bpf_parse_opt':
f_bpf.c:(.text+0x88f): undefined reference to `secure_getenv'
m_bpf.o: In function `parse_bpf':
m_bpf.c:(.text+0x587): undefined reference to `secure_getenv'
collect2: error: ld returned 1 exit status
There is no special reason to use the secure version of getenv, thus let's
simply use getenv().
CC: Daniel Borkmann <daniel@iogearbox.net>
Fixes: 88eea5395483 ("tc: {f,m}_bpf: allow to retrieve uds path from env")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Tested-by: Yegor Yefremov <yegorslists@googlemail.com>
|
|\ |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The initializers are simply not needed.
These if-blocks are outright dead code, because '0 > unsigned' is always
false, so only else clause triggers and regardless of which clause triggers
it only updates 'ind' which is later unconditionally written to before
being used anyway.
Otherwise we get errors from clang:
m_pedit.c:166:8: error: comparison of 0 > unsigned expression is always false [-Werror,-Wtautological-compare]
if (0 > tkey->off) {
~ ^ ~~~~~~~~~
m_pedit.c:209:8: error: comparison of 0 > unsigned expression is always false [-Werror,-Wtautological-compare]
if (0 > tkey->off) {
~ ^ ~~~~~~~~~
2 errors generated.
Change-Id: I3c9e9092915088fc56f992e5df736851541a4458
|