As many of you know, hugepages are essential for optimizing memory usage in high-performance applications like databases and scientific computing. They allow for larger memory blocks, reducing overhead and minimizing Translation Lookaside Buffer (TLB) misses. However, proper alignment when mapping huge pages is crucial to ensure smooth operation.
🔗 Patch notes and related discussions here: Linux Kernel Bugzilla Report https://lnkd.in/gz4zmJHe
🔗 Code changes – https://lnkd.in/guBtu6fF
For this reproduction, we 𝐮𝐭𝐢𝐥𝐢𝐳𝐞𝐝 𝐚𝐧 𝐎𝐫𝐚𝐜𝐥𝐞 𝐕𝐌 𝐰𝐢𝐭𝐡 𝐔𝐛𝐮𝐧𝐭𝐮 13.04 𝐛𝐢𝐧𝐚𝐫𝐢𝐞𝐬, 𝐬𝐩𝐞𝐜𝐢𝐟𝐢𝐜𝐚𝐥𝐥𝐲 𝐤𝐞𝐫𝐧𝐞𝐥 𝐯𝐞𝐫𝐬𝐢𝐨𝐧 3.8.0-19-𝐠𝐞𝐧𝐞𝐫𝐢𝐜.
Although this issue is quite old, it’s beneficial to reproduce it and apply the proposed fix to deepen our understanding of hugepage-related code changes and their workings.
This discussion is particularly valuable for novice kernel developers or engineers aspiring to build their careers in kernel and device driver development.
A regression introduced in commit 40716e2 caused the kernel to return -EINVAL (invalid argument) unless the requested memory-mapping length was “almost” aligned to a huge page boundary. The issue stemmed from moving the alignment checks into hugetlb_file_setup() without adjusting the caller-side logic, so mapping requests that used to be rounded up began failing the alignment check.
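A minimal userspace sketch of how the regression shows up (assuming 2 MB hugepages that have already been reserved, e.g. via /proc/sys/vm/nr_hugepages):

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = (2UL * 1024 * 1024) + 4096; /* deliberately NOT hugepage-aligned */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

    if (p == MAP_FAILED)
        perror("mmap"); /* EINVAL on kernels carrying the regression */
    else
        munmap(p, len); /* fixed kernels align the length themselves */
    return 0;
}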
To address this, a patch partially reverts the previous changes and introduces necessary alignment logic back at the caller level. Key modifications include:
1) 𝐡𝐮𝐠𝐞𝐭𝐥𝐛_𝐟𝐢𝐥𝐞_𝐬𝐞𝐭𝐮𝐩() 𝐟𝐮𝐧𝐜𝐭𝐢𝐨𝐧:
Now takes an additional ‘addr’ parameter.
Aligns the size to hugepage boundaries within the function.
2) 𝐂𝐚𝐥𝐥𝐞𝐫-𝐬𝐢𝐝𝐞 𝐜𝐡𝐚𝐧𝐠𝐞𝐬:
mmap_pgoff() and newseg() now pass the address to hugetlb_file_setup().
Removes redundant alignment checks in these callers.
3) 𝐀𝐥𝐢𝐠𝐧𝐦𝐞𝐧𝐭 𝐜𝐚𝐥𝐜𝐮𝐥𝐚𝐭𝐢𝐨𝐧:
Uses ALIGN macro to ensure proper hugepage alignment.
Calculates the number of pages based on the aligned size.
4) 𝐈𝐦𝐩𝐚𝐜𝐭 𝐨𝐧 𝐬𝐡𝐦𝐠𝐞𝐭():
Previously, shmget() with SHM_HUGETLB flag only aligned to PAGE_SIZE.
Now, it aligns to the actual hugepage size.
5) 𝐅𝐢𝐥𝐞 𝐜𝐡𝐚𝐧𝐠𝐞𝐬:
mm/hugetlb.c: Main changes to hugetlb_file_setup().
include/linux/hugetlb.h: Updated function prototype.
ipc/shm.c: Modified newseg() to use the new hugetlb_file_setup().
mm/mmap.c: Updated mmap_pgoff to use the new function signature.
These enhancements not only resolve existing issues but also bolster the robustness of the kernel’s memory management for applications that depend on hugepages.
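As a rough sketch (not the verbatim patch) of the alignment logic described in points 1-3 above, assuming the kernel’s ALIGN() macro, default_hstate, and the hugetlb helper functions:

/* Sketch only: round the requested length up to a hugepage boundary
 * and derive the page count from the aligned size. */
static unsigned long align_hugepage_len(unsigned long len)
{
    struct hstate *h = &default_hstate;
    unsigned long hpage_size  = huge_page_size(h);       /* e.g. 2 MB */
    unsigned long aligned_len = ALIGN(len, hpage_size);  /* round up  */
    unsigned long nr_pages    = aligned_len >> huge_page_shift(h);

    (void)nr_pages; /* the real code feeds this into the accounting helpers */
    return aligned_len;
}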
As technology continues to advance, the Linux kernel plays a critical role in powering our devices and systems. Ensuring its reliability is paramount, especially in today’s multi-core and multi-threaded computing environments. That’s where the Kernel Concurrency Sanitizer (KCSAN) comes into play.
The development of KCSAN was spurred by the need to address a common yet challenging problem in kernel development – concurrency-related bugs and data races. These bugs occur when multiple threads or CPUs concurrently access shared data without proper synchronization, leading to unpredictable and potentially incorrect behavior. Identifying and mitigating such issues is crucial for maintaining the stability and security of the Linux kernel.
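To make this concrete, here is an illustrative (non-kernel-specific) example of the kind of data race KCSAN reports, along with the kernel’s READ_ONCE()/WRITE_ONCE() annotations (from <linux/compiler.h>) that mark such concurrency as intentional:

static int shared_flag;

/* Racy: plain concurrent accesses from two contexts; KCSAN flags this pair. */
void writer(void) { shared_flag = 1; }
int reader(void) { return shared_flag; }

/* Annotated: marked accesses tell KCSAN (and the compiler) that the
 * concurrency is intentional, so no data race is reported. */
void writer_fixed(void) { WRITE_ONCE(shared_flag, 1); }
int reader_fixed(void) { return READ_ONCE(shared_flag); }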
KCSAN is a powerful tool designed to detect concurrency-related issues in the Linux kernel. It relies on compiler instrumentation and soft watchpoints: it samples memory accesses, sets a watchpoint on a sampled access, stalls briefly, and reports a data race if a conflicting concurrent access trips the watchpoint.
KCSAN is a testament to the commitment of the open-source community to enhance the robustness and reliability of the Linux kernel. It empowers developers to tackle complex concurrency issues head-on, making the kernel more resilient and less prone to crashes or incorrect behavior in today’s demanding computing environments.
👏 Kudos to the developers and contributors behind KCSAN for their dedication to improving the core of open-source technology! Let’s continue to support and invest in tools like KCSAN to ensure that the Linux kernel remains a rock-solid foundation for innovation.
Gratitude to the LPC team for spotlighting Data-Race Detection in the Linux Kernel! Sharing the insightful presentation from LPC:
📺 YouTube session recording: https://lnkd.in/gFJQAVj9 👀📹
PCIe (Peripheral Component Interconnect Express) drivers are software components that facilitate communication between the operating system and PCIe devices connected to a computer’s motherboard. These drivers enable the efficient transfer of data between the CPU and PCIe devices, ensuring seamless functionality.
PCIe drivers play a crucial role in maximizing the performance of PCIe devices such as network cards, graphics cards, SSDs, and more. By providing a standardized interface for communication, these drivers enable high-speed data transfer, low latency, and optimal resource utilization, enhancing overall system efficiency.
PCIe drivers are essential whenever PCIe devices are integrated into a computer system. Whether you’re building a gaming rig, configuring a server, or developing embedded systems, PCIe drivers are necessary to ensure proper device recognition, configuration, and operation.
PCIe configuration space refers to a set of registers within PCIe devices that contain crucial information about device capabilities, status, and control settings. PCIe drivers interact with this configuration space to initialize devices, allocate resources, and manage device operations effectively.
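As a hedged sketch of that interaction (standard kernel PCI API, not the exact code from any particular driver), a small module can walk the PCI devices and read identification registers out of config space:

#include <linux/module.h>
#include <linux/pci.h>

static int __init pcicfg_demo_init(void)
{
    struct pci_dev *pdev = NULL;
    u16 vendor, device;

    /* Iterate over every PCI device the kernel knows about; the
     * iterator releases the previous device's reference for us. */
    while ((pdev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, pdev))) {
        pci_read_config_word(pdev, PCI_VENDOR_ID, &vendor);
        pci_read_config_word(pdev, PCI_DEVICE_ID, &device);
        pr_info("%s: vendor=0x%04x device=0x%04x\n",
                pci_name(pdev), vendor, device);
    }
    return 0;
}

static void __exit pcicfg_demo_exit(void) { }

module_init(pcicfg_demo_init);
module_exit(pcicfg_demo_exit);
MODULE_LICENSE("GPL");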
PCIe drivers are indispensable components in modern computing, enabling seamless communication and optimal performance for PCIe devices. Understanding their role and importance is key to harnessing the full potential of PCIe technology.
In this video, you’ll learn step-by-step how to access PCIe configuration space using a Linux kernel module. From understanding PCIe basics to implementing code to interact with PCIe configuration registers, this tutorial provides a hands-on approach to exploring PCIe functionality within the Linux environment.
https://lnkd.in/gy_bE3vg
Whether you’re a beginner looking to understand PCIe or an experienced developer seeking to enhance your skills, this tutorial is for you. Start your journey into PCIe configuration with Linux today!
Live patching is an essential feature in Linux systems, enabling the application of updates and security patches to a running kernel without requiring a reboot. This ensures uninterrupted system uptime, crucial for high-availability environments.
📽 Session Link – https://lnkd.in/gMj2Z3uD
Live patching involves applying updates to a running Linux kernel without stopping it. This is critical for maintaining system security and performance.
🔗 Register the Kernel Live Patch Module:
Before applying patches, register the live patch module with the kernel to prepare the system for changes.
🛠️ Enable the Live Patching Feature:
Activate the patch by enabling live patching. This redirects execution to the updated functions instead of the original ones.
❌ Disable and Unregister Patching:
You can disable live patching to revert to the original functions. To remove the patch, unregister it to clean up the system.
When a patch is applied, it is registered but not immediately active. The registration sets up the environment for the patch. The patch becomes active once enabled, rerouting calls to the new functions.
For instance, registering a patch makes it visible in /sys/kernel/livepatch/.
The kernel continues using the old functions until the patch is enabled. Enabling the patch redirects the system to the new functionality.
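For orientation, here is a hedged sketch modeled on the kernel’s samples/livepatch/livepatch-sample.c, which overrides cmdline_proc_show(). On older kernels, registration (klp_register_patch()) and enabling were separate steps; recent kernels fold both into klp_enable_patch():

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/livepatch.h>
#include <linux/seq_file.h>

/* Replacement for the kernel's cmdline_proc_show(). */
static int livepatch_cmdline_proc_show(struct seq_file *m, void *v)
{
    seq_printf(m, "%s\n", "this has been live patched");
    return 0;
}

static struct klp_func funcs[] = {
    { .old_name = "cmdline_proc_show",
      .new_func = livepatch_cmdline_proc_show, },
    { }
};

static struct klp_object objs[] = {
    { .name = NULL, /* NULL means the function lives in vmlinux itself */
      .funcs = funcs, },
    { }
};

static struct klp_patch patch = {
    .mod = THIS_MODULE,
    .objs = objs,
};

static int __init livepatch_init(void)
{
    return klp_enable_patch(&patch); /* registers and enables the patch */
}

static void __exit livepatch_exit(void) { }

module_init(livepatch_init);
module_exit(livepatch_exit);
MODULE_LICENSE("GPL");
MODULE_INFO(livepatch, "Y");

Once loaded, the patch appears under /sys/kernel/livepatch/ and can be toggled by writing 0 or 1 to its ‘enabled’ file.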
Live patching uses technical steps to switch between old and new functions seamlessly:
⛔ NOP Instruction: The kernel build (via the ftrace mechanism) reserves a ‘NOP’ (No Operation) instruction at the entry of traceable functions, allowing changes to be introduced without immediate effect.
🔀 Function Redirection: Upon enabling live patching, the system replaces the NOP with a jump to the new function, bypassing the old code.
Disabling the patch reverts the kernel to its original state, executing the old functions. You can also remove the patch if it’s no longer needed.
🕒 Zero Downtime: Eliminates the need for system downtime, crucial for continuously available systems.
🔒 Immediate Security Updates: Allows immediate application of security patches, ensuring prompt vulnerability resolution.
📈 Increased Kernel Size: The kernel may become larger, leading to slight performance overhead.
🔍 Traceable Functions Required: Live patching relies on functions marked for tracing. Unmarked functions can’t be patched, limiting live patching’s scope.
🌐 KernelCare: Provides automatic live patching across multiple Linux distributions.
🖥️ kGraft: Developed by SUSE for live patching.
🔧 kpatch: Used by Red Hat for live kernel patching.
☁️ Ksplice: Supports live patching on Oracle Linux.
Are you ready to dive into the world of block device drivers? Let’s explore how these crucial components power our storage systems!
📽 YouTube link -> https://lnkd.in/gwae9g_F
Block device drivers manage data in fixed-size chunks, or blocks. Operating on whole blocks makes reading and writing efficient, which is why these drivers are ideal for storage devices.
Most block devices are associated with storage systems, though they aren’t limited to this use. They ensure that data is read and written in fixed blocks, a critical feature for maintaining data integrity and performance in storage systems.
VFS: Acts as an interface to the file system, providing a standard interface for different file systems to communicate with the kernel.
File System Interaction: Block drivers work closely with the file system. When an application needs to read or write data, it first interacts with the Virtual File System (VFS), which then communicates with the file system. The file system, in turn, interacts with the block layer, which accesses the physical storage devices.
Page Cache: Data often goes through the page cache before reaching the file system, ensuring faster access and efficient data management.
Direct I/O (O_DIRECT): This mode allows bypassing the page cache, enabling direct read/write operations between the application and storage, crucial for specific high-performance applications.
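A small userspace sketch of direct I/O (illustrative: data.bin is a placeholder file, and the 4 KB alignment assumes a typical block size):

#define _GNU_SOURCE /* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    void *buf = NULL;
    int fd = open("data.bin", O_RDONLY | O_DIRECT);

    /* O_DIRECT requires the buffer, offset, and length to be aligned. */
    if (fd < 0 || posix_memalign(&buf, 4096, 4096) != 0) {
        perror("setup");
        return 1;
    }
    if (read(fd, buf, 4096) < 0) /* bypasses the page cache entirely */
        perror("read");
    free(buf);
    close(fd);
    return 0;
}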
Scheduling: Just like process scheduling, block drivers have their own schedulers to manage I/O requests. This is crucial for optimizing performance and meeting deadlines.
🔵 Blk_mq (Block Multi-Queue): A framework that improves scalability and performance by allowing multiple hardware and software queues.
IO Schedulers:
◾ Kyber: Designed for solid-state drives (SSDs), it aims to provide low latency and high throughput.
◾ BFQ (Budget Fair Queueing): Focuses on providing fairness and responsiveness, making it suitable for desktops and interactive applications.
◾ MQ Deadline: Ensures that I/O requests are serviced within a set deadline, balancing performance and latency.
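You can inspect and switch the active scheduler through sysfs (illustrative, assuming a device named sda):

cat /sys/block/sda/queue/scheduler # e.g. [mq-deadline] kyber bfq none
echo kyber | sudo tee /sys/block/sda/queue/scheduler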
📊 Block Size in Linux:
In Linux, the typical file-system block size is 4 KB, while storage hardware uses its own sector size: logically 512 bytes on most drives, with many modern devices using 4 KB physical sectors.
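The sizes in play can be queried directly (again assuming a device named sda):

cat /sys/block/sda/queue/logical_block_size # often 512
cat /sys/block/sda/queue/physical_block_size # often 4096 on modern drives
sudo blockdev --getbsz /dev/sda # block size used for I/O on the device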
gcov is a tool used to measure code coverage and generate profiling information for C code. It helps developers understand which parts of their code are actually being executed during program runs.
⚫ Measures the percentage of lines, statements, or functions in your code that are executed at least once during a test run. 📊
⚫ Helps identify areas of code that might not be adequately tested and could potentially contain bugs. 🐞
⚫ Useful for focusing testing efforts on areas with low coverage. 🔍
⚫ Provides information about how often different parts of your code are executed. ⏱️
⚫ Helps identify performance bottlenecks by pinpointing code that is executed frequently. 🚦
⚫ Allows for optimization efforts to be directed towards the most impactful areas. 💡
In this video, you’ll learn how gcov tracks code coverage during program runs, providing detailed analysis of line and function coverage. It verifies that test cases exercise all parts of the program, which is crucial for preventing production issues. Linux kernel developers use similar tools, such as Kcov. Enabling coverage during unit testing ensures thorough code-path testing and identifies untested areas for improvement.
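A minimal gcov workflow looks like this (demo.c is a placeholder source file):

gcc --coverage -O0 demo.c -o demo # instruments the build (-fprofile-arcs -ftest-coverage)
./demo # execution counts are written to demo.gcda
gcov demo.c # produces annotated source in demo.c.gcov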
Cache line bouncing, a common challenge in multi-core systems, can lead to significant performance overheads. But fear not! Per-CPU data structures come to the rescue, offering a solution to this problem.
When multiple processors access the same memory location concurrently, they may each have their own cached copy of the data. If one processor modifies its cached copy, it needs to notify other processors to invalidate or update their copies to maintain data coherence across the system. This process, known as cache coherence, can result in cache line bouncing, where cache lines are repeatedly invalidated and updated across different CPU caches, leading to increased latency and reduced performance.
Per-CPU data structures address this issue by providing each processor with its own instance of the data structure, effectively eliminating the need for cache coherence protocols to synchronize shared data across multiple cores. Instead of accessing a single shared data structure, each CPU accesses its own private copy, reducing cache line bouncing and improving overall system performance.
In the Linux kernel, cache coherence is managed through mechanisms like cache invalidation, cache flushing, and memory barriers. One example of how cache coherence is maintained in the Linux kernel is through the use of cache maintenance functions like flush_cache_range() and invalidate_dcache_range().
Sample code example using the Linux kernel’s per-CPU API to illustrate how per-CPU data structures are used:
#include <linux/percpu.h>
DEFINE_PER_CPU(int, my_counter); // Define a per-CPU integer counter
void increment_counter(void) {
int *ptr = get_cpu_ptr(&my_counter); // Get this CPU’s counter and disable preemption
(*ptr)++; // Increment the counter value
put_cpu_ptr(&my_counter); // Re-enable preemption
}
// Example usage
void example_function(void) {
increment_counter(); // Increment the per-CPU counter
}
In this example, we define a per-CPU integer counter using the DEFINE_PER_CPU macro provided by the Linux kernel. Each CPU core gets its own instance of this counter in its own per-CPU area. When increment_counter() is called, get_cpu_ptr()/put_cpu_ptr() disable preemption around the update so the task cannot migrate to another CPU mid-increment, and the counter belonging to the current CPU is incremented without bouncing cache lines between cores.
By leveraging per-CPU data structures, developers can design more efficient and scalable multi-core systems, achieving better performance and reduced overhead in shared-memory architectures.
So, the next time you encounter cache line bouncing in your multi-core application, consider using per-CPU data structures to optimize performance and enhance scalability.
𝐤𝐦𝐞𝐦𝐥𝐞𝐚𝐤 is a powerful memory debugging tool within the Linux kernel, designed to detect memory leaks by tracking kernel memory allocations and identifying memory that has been allocated but not freed.
✔ Enabling `kmemleak` in the Kernel
Step 1: Kernel Configuration
1. Navigate to the Kernel Source Directory:
cd /usr/src/linux-source-x.x.x
2. Configure Kernel Options:
Launch the kernel configuration interface:
make menuconfig
Within this interface, we ensured the following options were enabled:
– Kernel Debugging Support
– Debug Memory Allocation
– Kernel Memory Leak Detection ( CONFIG_DEBUG_KMEMLEAK=y )
3. Verify the Configuration:
After configuration, check the `.config` file to confirm that `kmemleak` is enabled:
grep CONFIG_DEBUG_KMEMLEAK .config
✔ Modifying the `fscache` Source Code
To reproduce a memory-leak scenario, we made a specific modification to the `fscache` code:
1. Edit `fs/fscache/stats.c`:
Modified the `fscache` statistics file so that memory allocated for statistics was not released properly. In `fscache/stats.c`, we commented out the correct release function and reinstated the old buggy one (seq_release() does not free the buffer that single_open() allocates):
// Correct code, commented out for the reproduction:
// .release = single_release,
// Reinstated old kernel bug, which leaks the single_open() buffer:
.release = seq_release,
✔ Compiling the Kernel
1. Configure the Kernel:
We set up the kernel with:
make oldconfig
2. Compile the Kernel:
We executed the compilation process:
make -j3
make modules_install
make install
✔ Mounting `debugfs`
Mount `debugfs` to enable access to debugging information:
sudo mount -t debugfs none /sys/kernel/debug
✔ Running the Test
To trigger the memory leak:
1. Reboot into the Compiled Kernel:
After installation, reboot the system so it runs the new kernel.
2. Access the `fscache` Statistics:
Read the `fscache` statistics to prompt the allocation:
cat /proc/fs/fscache/stats
✔ Scanning for Memory Leaks
After executing operations that triggered allocations, run a scan with `kmemleak` to identify any unreferenced memory:
echo scan > /sys/kernel/debug/kmemleak
✔ Observing the Results
After scanning, we checked for any unreferenced objects reported by `kmemleak`:
cat /sys/kernel/debug/kmemleak
The output showed several unreferenced objects:
unreferenced object 0xe996f570 (size 16):
comm “cat”, pid 2381, jiffies 42400 (age 1907.404s)
hex dump (first 16 bytes):
c0 9b 18 c1 00 9c 18 c1 e0 9b 18 c1 f0 7b af f8 ………….{..
backtrace:
[<c162313c>] kmemleak_alloc+0x3c/0xa0
[<c11578fe>] kmem_cache_alloc_trace+0x9e/0x130
[<c1189ea9>] single_open+0x29/0x90
[<f8af7be6>] fscache_stats_open+0x16/0x20 [fscache]
You can also write your own kernel module that intentionally leaks memory and captures the leak using Kmemleak. A screenshot illustrating this process is attached to the post.
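A hedged sketch of such a module (hypothetical names; the allocation size is arbitrary):

#include <linux/module.h>
#include <linux/slab.h>

static int __init leaky_init(void)
{
    /* Allocate memory and deliberately drop the only reference to it,
     * so a subsequent kmemleak scan reports it as unreferenced. */
    void *leak = kmalloc(64, GFP_KERNEL);

    pr_info("leaky_module: allocated %p and forgot about it\n", leak);
    leak = NULL; /* reference lost: a kmemleak scan should flag this */
    return 0;
}

static void __exit leaky_exit(void) { }

module_init(leaky_init);
module_exit(leaky_exit);
MODULE_LICENSE("GPL");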
𝐖𝐡𝐚𝐭 𝐢𝐬 𝐂𝐏𝐔 𝐈𝐬𝐨𝐥𝐚𝐭𝐢𝐨𝐧? 🤔
CPU Isolation is a Linux kernel feature that designates specific CPUs as isolated from general-purpose scheduling. Isolated CPUs do not handle regular workloads; instead, they focus on specialized tasks like real-time processing. This reduces interruptions and leads to lower latencies and more predictable performance.
𝐖𝐡𝐚𝐭 𝐢𝐬 𝐭𝐡𝐞 𝐧𝐨𝐡𝐳_𝐟𝐮𝐥𝐥 𝐅𝐞𝐚𝐭𝐮𝐫𝐞? ⏱️
The nohz_full feature allows certain CPUs to enter a “no-tick” mode, where they don’t receive timer interrupts. This is beneficial for low-latency scenarios, such as real-time applications, by reducing timer interrupt overhead and improving performance.
𝐇𝐨𝐰 𝐈𝐭 𝐖𝐨𝐫𝐤𝐬:
In a standard system, the kernel sends timer interrupts to all CPUs for scheduling. These can cause context switches and overhead.
With nohz_full enabled, CPUs in no-tick mode only receive timer interrupts when absolutely necessary, allowing tasks to execute uninterrupted.
𝐖𝐡𝐲 𝐧𝐨𝐡𝐳_𝐟𝐮𝐥𝐥 𝐈𝐧𝐭𝐫𝐨𝐝𝐮𝐜𝐞𝐝? 💡
The nohz_full feature addresses the needs of real-time and high-performance applications that require predictable latencies.
🛄𝐈𝐭 𝐚𝐢𝐦𝐬 𝐭𝐨:
𝐑𝐞𝐝𝐮𝐜𝐞 𝐋𝐚𝐭𝐞𝐧𝐜𝐲: Eliminating timer interrupts significantly lowers task latency.
𝐈𝐦𝐩𝐫𝐨𝐯𝐞 𝐏𝐞𝐫𝐟𝐨𝐫𝐦𝐚𝐧𝐜𝐞: Applications like multimedia processing and gaming benefit from reduced overhead.
𝐒𝐮𝐩𝐩𝐨𝐫𝐭 𝐑𝐞𝐚𝐥-𝐓𝐢𝐦𝐞 𝐀𝐩𝐩𝐥𝐢𝐜𝐚𝐭𝐢𝐨𝐧𝐬: It configures CPUs for real-time workloads.
𝐇𝐨𝐰 𝐂𝐏𝐔 𝐈𝐬𝐨𝐥𝐚𝐭𝐢𝐨𝐧 𝐚𝐧𝐝 𝐧𝐨𝐡𝐳_𝐟𝐮𝐥𝐥 𝐖𝐨𝐫𝐤 𝐓𝐨𝐠𝐞𝐭𝐡𝐞𝐫 🔗
When utilizing nohz_full, it is crucial to manage CPU isolation settings correctly:
𝐁𝐨𝐨𝐭 𝐂𝐏𝐔: The boot CPU (the CPU that initializes the kernel and starts the system) must not be included in the nohz_full mask. This is essential because the boot CPU handles critical tasks during the boot process, including initializing other CPUs and managing timers.
𝐇𝐨𝐮𝐬𝐞𝐤𝐞𝐞𝐩𝐢𝐧𝐠 𝐂𝐏𝐔𝐬: These CPUs are responsible for executing system tasks, such as timer management and scheduling. If the boot CPU is incorrectly isolated, it can lead to issues such as kernel crashes during the boot process.
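A typical boot-line configuration (illustrative; the CPU lists depend on your topology, and CPU 0, the boot CPU, is deliberately left out of the masks):

# Kernel command line (e.g. appended in GRUB):
isolcpus=2,3 nohz_full=2,3 rcu_nocbs=2,3

# Verify after boot:
cat /sys/devices/system/cpu/nohz_full
cat /sys/devices/system/cpu/isolated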
𝐑𝐞𝐥𝐚𝐭𝐞𝐝 𝐈𝐬𝐬𝐮𝐞𝐬 𝐚𝐧𝐝 𝐑𝐞𝐜𝐞𝐧𝐭 𝐂𝐡𝐚𝐧𝐠𝐞𝐬 🔧
🔴 A regression in the Linux kernel caused crashes when the boot CPU was included in the nohz_full mask.
𝐓𝐡𝐞 𝐫𝐞𝐜𝐞𝐧𝐭 𝐩𝐚𝐭𝐜𝐡:
🔴 𝐑𝐞𝐬𝐭𝐨𝐫𝐞𝐬 𝐂𝐡𝐞𝐜𝐤𝐬: Prevents the boot CPU from being included in the nohz_full mask for stability.
🔴 𝐔𝐩𝐝𝐚𝐭𝐞𝐬 𝐇𝐚𝐧𝐝𝐥𝐢𝐧𝐠 𝐨𝐟 𝐈𝐬𝐨𝐥𝐚𝐭𝐞𝐝 𝐂𝐏𝐔𝐬: Ensures the kernel can determine available CPUs without causing crashes.
𝐑𝐞𝐥𝐚𝐭𝐞𝐝 𝐋𝐢𝐧𝐤𝐬 𝐚𝐧𝐝 𝐅𝐮𝐫𝐭𝐡𝐞𝐫 𝐑𝐞𝐚𝐝𝐢𝐧𝐠📚
https://lnkd.in/gFAd6ypP
Samsung S10+/S9 kernel 4.14 (Android 10) Kernel Function Address (.text) and Heap Address Information Leak
In kernel development, precise data handling is not only about system performance but also crucial for security. Today, we delve into a significant vulnerability in the Linux kernel’s 𝐩𝐭𝐫𝐚𝐜𝐞_𝐩𝐞𝐞𝐤_𝐬𝐢𝐠𝐢𝐧𝐟𝐨() function, which was 𝐩𝐫𝐞𝐯𝐢𝐨𝐮𝐬𝐥𝐲 𝐮𝐬𝐢𝐧𝐠 𝐢𝐧𝐭32 𝐟𝐨𝐫 𝐨𝐟𝐟𝐬𝐞𝐭𝐬 𝐛𝐮𝐭 𝐧𝐨𝐰 𝐰𝐢𝐬𝐞𝐥𝐲 𝐮𝐬𝐞𝐬 𝐮𝐧𝐬𝐢𝐠𝐧𝐞𝐝 𝐥𝐨𝐧𝐠.
🔴 𝐩𝐭𝐫𝐚𝐜𝐞: Add ability to retrieve signals without removing from a queue
https://lnkd.in/g3m8459V
🔴 𝐬𝐢𝐠𝐧𝐚𝐥/𝐩𝐭𝐫𝐚𝐜𝐞: 𝐃𝐨𝐧’𝐭 𝐥𝐞𝐚𝐤 𝐮𝐧𝐢𝐭𝐢𝐚𝐥𝐢𝐳𝐞𝐝 𝐤𝐞𝐫𝐧𝐞𝐥 𝐦𝐞𝐦𝐨𝐫𝐲 𝐰𝐢𝐭𝐡 𝐏𝐓𝐑𝐀𝐂𝐄_𝐏𝐄𝐄𝐊_𝐒𝐈𝐆𝐈𝐍𝐅𝐎
https://lnkd.in/g4JYbqQi
𝐅𝐨𝐫 𝐦𝐨𝐫𝐞 𝐢𝐧𝐟𝐨 𝐚𝐛𝐨𝐮𝐭 𝐭𝐡𝐢𝐬 𝐢𝐬𝐬𝐮𝐞 𝐩𝐥𝐞𝐚𝐬𝐞 𝐜𝐡𝐞𝐜𝐤 ->
https://lnkd.in/gS593p5M
The original implementation of the 𝐩𝐭𝐫𝐚𝐜𝐞_𝐩𝐞𝐞𝐤_𝐬𝐢𝐠𝐢𝐧𝐟𝐨() function used an int32 type -> s32 off = arg.off + i;
This choice was problematic for two reasons:
1. 𝐒𝐞𝐜𝐮𝐫𝐢𝐭𝐲 𝐑𝐢𝐬𝐤 𝐟𝐫𝐨𝐦 𝐍𝐞𝐠𝐚𝐭𝐢𝐯𝐞 𝐎𝐟𝐟𝐬𝐞𝐭𝐬: Allowing negative values could lead to unauthorized memory access, posing a severe security risk.
2. 𝐈𝐧𝐚𝐝𝐞𝐪𝐮𝐚𝐭𝐞 𝐑𝐚𝐧𝐠𝐞 𝐟𝐨𝐫 𝐒𝐲𝐬𝐭𝐞𝐦 𝐒𝐜𝐚𝐥𝐚𝐛𝐢𝐥𝐢𝐭𝐲: The limited range of `int32` could lead to data truncation in systems managing extensive data sets.
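To see why, consider what happens when a 64-bit, user-controlled offset is narrowed into a signed 32-bit variable (illustrative values, not the actual PoC):

u64 user_off = 0x80000000ULL; /* 2147483648: perfectly valid as a u64 */
s32 off = user_off; /* truncates/wraps to -2147483648 */
/* A negative 'off' then indexes before the start of the signal queue,
 * reaching kernel memory the caller was never meant to see. */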
To mitigate these risks, the function now employs `unsigned long` for the offset:
unsigned long off = arg.off + i;
This update addresses the core issues by:
– 𝐄𝐥𝐢𝐦𝐢𝐧𝐚𝐭𝐢𝐧𝐠 𝐍𝐞𝐠𝐚𝐭𝐢𝐯𝐞 𝐕𝐚𝐥𝐮𝐞𝐬: Ensuring all offsets are non-negative to prevent potential security breaches through erroneous memory access.
– 𝐄𝐱𝐩𝐚𝐧𝐝𝐢𝐧𝐠 𝐭𝐡𝐞 𝐏𝐨𝐬𝐬𝐢𝐛𝐥𝐞 𝐑𝐚𝐧𝐠𝐞: Accommodating larger values to enhance system stability & scalability.
Using `unsigned long` not only prevents common buffer overflows & vulnerabilities associated with out-of-bound access but is also pivotal in safeguarding against specific exploits that could leak sensitive information such as kernel addresses. 𝐓𝐡𝐢𝐬 𝐯𝐮𝐥𝐧𝐞𝐫𝐚𝐛𝐢𝐥𝐢𝐭𝐲 𝐰𝐚𝐬 𝐧𝐨𝐭𝐚𝐛𝐥𝐲 𝐝𝐞𝐦𝐨𝐧𝐬𝐭𝐫𝐚𝐭𝐞𝐝 𝐭𝐡𝐫𝐨𝐮𝐠𝐡 𝐚 𝐩𝐫𝐨𝐨𝐟 𝐨𝐟 𝐜𝐨𝐧𝐜𝐞𝐩𝐭 𝐭𝐡𝐚𝐭 𝐜𝐨𝐮𝐥𝐝 𝐥𝐞𝐚𝐤 𝐚𝐝𝐝𝐫𝐞𝐬𝐬𝐞𝐬 𝐟𝐫𝐨𝐦 𝐭𝐡𝐞 𝐤𝐞𝐫𝐧𝐞𝐥 𝐡𝐞𝐚𝐩 𝐚𝐧𝐝 𝐭𝐡𝐞 `.𝐭𝐞𝐱𝐭` 𝐬𝐞𝐜𝐭𝐢𝐨𝐧, 𝐜𝐫𝐮𝐜𝐢𝐚𝐥𝐥𝐲 𝐞𝐱𝐩𝐨𝐬𝐢𝐧𝐠 𝐭𝐡𝐞 𝐬𝐭𝐫𝐮𝐜𝐭 𝐭𝐚𝐬𝐤_𝐬𝐭𝐫𝐮𝐜𝐭.
This critical update in the 𝐩𝐭𝐫𝐚𝐜𝐞_𝐩𝐞𝐞𝐤_𝐬𝐢𝐠𝐢𝐧𝐟𝐨() function is a prime example of proactive security measures essential in operating system development.
Are you fascinated by the inner workings of the Linux Kernel and eager to dive deep into the world of Kernel Debugging? 🐧💻
In our latest YouTube video, we’ve put together a comprehensive guide on “Enabling KGDB for Kernel Debugging.” 🛠️🔍
https://lnkd.in/gKSVMbcp
✅ Step-by-step instructions on enabling KGDB for Kernel Debugging.
✅ How to configure the kernel for debugging purposes.
✅ Insights into Kernel Debugging techniques and tools.
✅ Tips and tricks for effective Kernel Debugging.
Join us on this informative journey as we unravel the mysteries of Kernel Debugging. 🤓🕵️♂️
But that’s not all! To stay updated on Linux Kernel Development, Device Drivers, and more, we invite you to join our LinkedIn Group:
🌐👥 Linux Kernel & LDD -> https://lnkd.in/gGswGTwj
Connect with fellow Kernel enthusiasts, share your insights, and explore the world of Linux Kernel together! 🌟🚀
Don’t miss out on this opportunity to enhance your Kernel Debugging skills. Watch the video, join our group, and let’s dive deep into the fascinating world of Linux Kernel Development! 🎉🐧
🌐 Maurice J. Bach’s “Design of the Unix Operating System” offers a timeless exploration of the internal algorithms and structures that form the foundation of Unix and Unix-like systems. Despite being a bit dated, the book is a valuable resource for comprehending the intricate workings of the Unix kernel.
System Programmers: A comprehensive reference for system programmers, providing insights into the kernel’s functionality and facilitating comparisons with algorithms in other operating systems.
UNIX Programmers: Offers UNIX programmers a deeper understanding of their programs’ interaction with the system, enabling them to write more efficient and sophisticated code.
Linux Kernel and Device Driver Developers: Valuable for developers working on the Linux kernel and device drivers, providing foundational insights into operating system design and implementation.
Students: Ideal for advanced undergraduates and first-year graduate students studying operating systems. The book serves as an excellent textbook, allowing students to delve into the internals of the operating system.
⭐ The book originated from a course at AT&T Bell Laboratories in 1983-1984, emphasizing source code exploration.
⭐ Focuses on mastering algorithmic concepts before delving into source code, enhancing comprehension.
⭐ Descriptions of algorithms are kept simple, mirroring the elegance of the Unix system.
⭐ Utilizes C-like pseudo-code for algorithm presentations, aiding natural language descriptions.
⭐ Features figures illustrating relationships between data structures manipulated by the system.
⭐ Later chapters include small C programs, offering practical insights into system concepts.
The book does not provide a line-by-line rendition of the system in English but describes the general flow of algorithms. Names of algorithms correspond to procedure names in the kernel, enhancing clarity.
Small C programs in later chapters illustrate system concepts as experienced by users. The examples, although they omit error checking, run on System V and should be compatible with other versions of the system.
📖 While the book may be considered “old,” its enduring value lies in unraveling the fundamental principles of Unix. It’s a must-read for those seeking a deeper understanding of operating system internals, with the caveat that some details may differ from contemporary Linux systems due to design changes. Embrace the simplicity and elegance embedded in the Unix design through Maurice J. Bach’s insightful work.
Free Access: 📖 Design of the Unix Operating System ( https://lnkd.in/gUEfU9iU ) – Looks like the PDF is available for free!
🚀 Bonus: Sample PDF attached shows pseudo code of msgsnd and msgrcv algorithms.
This book explores foundational principles and implementations based on older kernel versions like V2.2, offering insights into memory management, booting processes, and BIOS-related info. Despite its divergence from the latest code, it’s a goldmine for understanding Linux’s evolutionary journey.
🌟 Emphasizing the significance of simpler OS models, it highlights Linux 0.11 (less than 20,000 lines of code) as a key learning tool. Even with Linux’s current complexity, the core design concepts remain relevant, making learning from Linux 0.11 both practical and insightful.
Initial chapters detail BIOS booting, process creation, and file system interactions.
Subsequent sections dive into user processes, memory management, file operations, and more.
The final section elaborates on key mechanisms like protection, privilege levels, and interruptions, illuminating their role in system architecture.
🌐 Translated by Dr. Tingshao Zhu, this book connects theory with practice, presenting real code and cases to demystify OS design. An essential read for tech enthusiasts seeking a profound understanding of OS design and Linux Kernel’s foundational concepts!