ucore OS | Lab 1: The ucore Boot Process

Exercise 1

How is the OS image file ucore.img generated, step by step?

Running make V="" makes make print every command it executes during the build:

+ cc kern/init/init.c
gcc -Ikern/init/ -fno-builtin -fno-PIC -Wall -ggdb -m32 -gstabs -nostdinc  -fno-stack-protector -Ilibs/ -Ikern/debug/ -Ikern/driver/ -Ikern/trap/ -Ikern/mm/ -c kern/init/init.c -o obj/kern/init/init.o
// similar commands omitted...
+ ld bin/kernel
ld -m    elf_i386 -nostdlib -T tools/kernel.ld -o bin/kernel  obj/kern/init/init.o obj/kern/libs/stdio.o obj/kern/libs/readline.o obj/kern/debug/panic.o obj/kern/debug/kdebug.o obj/kern/debug/kmonitor.o obj/kern/driver/clock.o obj/kern/driver/console.o obj/kern/driver/picirq.o obj/kern/driver/intr.o obj/kern/trap/trap.o obj/kern/trap/vectors.o obj/kern/trap/trapentry.o obj/kern/mm/pmm.o  obj/libs/string.o obj/libs/printfmt.o
+ cc boot/bootasm.S
gcc -Iboot/ -fno-builtin -fno-PIC -Wall -ggdb -m32 -gstabs -nostdinc  -fno-stack-protector -Ilibs/ -Os -nostdinc -c boot/bootasm.S -o obj/boot/bootasm.o
// similar commands omitted...
gcc -g -Wall -O2 obj/sign/tools/sign.o -o bin/sign
+ ld bin/bootblock
ld -m    elf_i386 -nostdlib -N -e start -Ttext 0x7C00 obj/boot/bootasm.o obj/boot/bootmain.o -o obj/bootblock.o
'obj/bootblock.out' size: 504 bytes
build 512 bytes boot sector: 'bin/bootblock' success!
dd if=/dev/zero of=bin/ucore.img count=10000
10000+0 records in
10000+0 records out
5120000 bytes (5.1 MB, 4.9 MiB) copied, 0.0156548 s, 327 MB/s
dd if=bin/bootblock of=bin/ucore.img conv=notrunc
1+0 records in
1+0 records out
512 bytes copied, 7.1747e-05 s, 7.1 MB/s
dd if=bin/kernel of=bin/ucore.img seek=1 conv=notrunc
154+1 records in
154+1 records out
79036 bytes (79 kB, 77 KiB) copied, 0.000255015 s, 310 MB/s

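From this output, generating the image takes the following main steps.

gcc compiles all the .c C source files and .S assembly files under kern/ and boot/. Compared with an ordinary build, there are many extra options:
- -fno-builtin: stops gcc from using its built-in versions of library functions. For example, without this option gcc may bypass our strcpy and use its own built-in strcpy instead.
- -nostdinc: do not search the system's standard include directories; only the directories given with -I are searched. This also prevents our own implementations of standard-library functions such as strcpy from clashing with the system's.
- -fno-PIC: do not generate PIC (Position Independent Code). PIC makes the code larger, and according to https://github.com/chyyuu/ucore_os_lab/pull/22, newer gcc versions enable PIC by default, which makes the generated bootloader larger than 512 bytes so it no longer fits in one sector. Moreover, PIC requires the loader to handle relocations correctly, which the uCore bootloader does not do, so it would also cause problems in later labs.
- -fno-stack-protector: also keeps the code small, because the stack protector adds extra code to function calls. In addition, its checks depend on runtime support (a canary value and a __stack_chk_fail handler) that does not exist in this freestanding environment.
- -Wall: enables most of the commonly useful warnings (not literally all of them), which helps catch potential problems early.
- -gstabs and -ggdb: generate debugging information (in STABS format) so we can debug with gdb later.
- -m32: generate 32-bit code.
- -c: compile only, do not link. We link manually with ld afterwards.
Note that one extra option is added when compiling the bootloader:
- -Os: tells the compiler to optimize for code size as much as possible. Once the 64-byte partition table and the 2-byte boot signature are subtracted, only 446 bytes of the 512-byte boot sector remain for code, so the bootloader must be as small as possible.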

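ld links the object files into executables: one is kernel, the operating-system kernel; the other is the bootloader, which loads the operating system we write.

The kernel is linked with:

ld -m elf_i386 -nostdlib -T tools/kernel.ld -o bin/kernel obj/kern/init/init.o ...

-m elf_i386: emulate the elf_i386 target, i.e. produce 32-bit x86 ELF output.
-nostdlib: do not link the standard libraries.
-T tools/kernel.ld: link using the tools/kernel.ld linker script.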

The tools/kernel.ld script is as follows:

/* Simple linker script for the JOS kernel.
   See the GNU ld 'info' manual ("info ld") to learn the syntax. */

OUTPUT_FORMAT("elf32-i386", "elf32-i386", "elf32-i386")
OUTPUT_ARCH(i386)
ENTRY(kern_init)

SECTIONS {
    /* Load the kernel at this address: "." means the current address */
    . = 0x100000;

    .text : {
        *(.text .stub .text.* .gnu.linkonce.t.*)
    }

    PROVIDE(etext = .); /* Define the 'etext' symbol to this value */

    .rodata : {
        *(.rodata .rodata.* .gnu.linkonce.r.*)
    }

    /* Include debugging information in kernel memory */
    .stab : {
        PROVIDE(__STAB_BEGIN__ = .);
        *(.stab);
        PROVIDE(__STAB_END__ = .);
        BYTE(0)     /* Force the linker to allocate space
                   for this section */
    }

    .stabstr : {
        PROVIDE(__STABSTR_BEGIN__ = .);
        *(.stabstr);
        PROVIDE(__STABSTR_END__ = .);
        BYTE(0)     /* Force the linker to allocate space
                   for this section */
    }

    /* Adjust the address for the data segment to the next page */
    . = ALIGN(0x1000);

    /* The data segment */
    .data : {
        *(.data)
    }

    PROVIDE(edata = .);

    .bss : {
        *(.bss)
    }

    PROVIDE(end = .);

    /DISCARD/ : {
        *(.eh_frame .note.GNU-stack)
    }
}

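The important parts are:

The kernel's link address (the start of .text) is set to 0x100000, i.e. 1 MiB.
The symbols etext, edata and end are exported for later C code, for example so that the .bss section (between edata and end) can be cleared correctly when memory is initialized.
Kernel debugging information (.stab and .stabstr) is included in the image.
The bootloader, in contrast, is linked with:

ld -m elf_i386 -nostdlib -N -e start -Ttext 0x7C00 obj/boot/bootasm.o obj/boot/bootmain.o -o obj/bootblock.o

This differs slightly from the kernel link:

-N: make the text and data sections readable and writable, do not page-align the data segment, and do not link against shared libraries.
-e start: the program's entry point is the start label.
-Ttext 0x7C00: the text section starts at 0x7C00, because the BIOS loads the bootloader at 0x7C00 and starts executing it there.

The bootloader is signed so that the BIOS will boot it: bytes 511 and 512 of the boot sector (offsets 510 and 511) must contain 0x55 and 0xAA.
dd builds the image:
1. First it creates an all-zero image of 10000 sectors (10000 × 512 = 5,120,000 bytes, matching the log).
2. Then it writes the 512-byte bootloader into the first sector (sector 0).
3. Then it writes the actual OS kernel right after the bootloader, starting at sector 1 (seek=1).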

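What characterizes a hard-disk master boot sector that the system considers valid?
- It is 512 bytes long.
- Its last two bytes are 0x55 and 0xAA.
- It consists of at most 446 bytes of boot code, a 64-byte partition table, and the 2-byte end marker 0x55 0xAA.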

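Exercise 2

Starting from the first instruction the CPU executes after power-on, single-step through the BIOS.

Steps:
1. Run make to build the bootable image.
2. Start the qemu emulator with qemu-system-i386 -S -s -parallel stdio -hda ./bin/ucore.img -serial null.
  - -s opens a GDB server on tcp::1234.
  - -S keeps the CPU halted at startup; it only starts executing after the debugger tells it to continue.
  - -hda specifies the hard-disk image.
  - -parallel and -serial redirect the parallel-port and serial-port output, respectively.
  qemu now waits with the CPU halted instead of continuing to boot.
3. Start gdb, connect to qemu with target remote :1234, and enter set architecture i8086 so that gdb treats the code as 16-bit, because the BIOS runs in 16-bit real mode.
4. Once everything is ready, enter
```
define hook-stop
x/i $pc
end
```
  so that gdb disassembles the current instruction every time execution stops.

  Then enter stepi to single-step through the BIOS.
Set a breakpoint at the real-mode address 0x7c00, where the bootloader starts, and check that it works.
1. After starting qemu and gdb as above, enter break *0x7c00 to set a breakpoint at address 0x7c00.
2. Enter continue to let the CPU run normally.
3. When the CPU reaches 0x7c00 it hits the breakpoint and stops. Disassembling with x/i $pc shows cli, the first instruction of bootasm.S.
Adding -d in_asm -D q.log to the qemu command line saves the executed assembly instructions to q.log. Compare the executed code with bootasm.S and bootblock.asm to check that they match.
1. Run qemu-system-i386 -d in_asm -D q.log -S -s -parallel stdio -hda ./bin/ucore.img -serial null to start qemu and record the executed instructions in q.log.
2. Open q.log in a text editor and compare it with bootasm.S: starting at address 0x00007c00 the instructions match.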

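Exercise 3

The BIOS reads the disk's master boot sector into memory and jumps to that location to run the bootloader. Analyze how the bootloader switches from real mode to protected mode.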

#include <asm.h>

# Start the CPU: switch to 32-bit protected mode, jump into C.
# The BIOS loads this code from the first sector of the hard disk into
# memory at physical address 0x7c00 and starts executing in real mode
# with %cs=0 %ip=7c00.

.set PROT_MODE_CSEG,        0x8                     # kernel code segment selector
.set PROT_MODE_DSEG,        0x10                    # kernel data segment selector
.set CR0_PE_ON,             0x1                     # protected mode enable flag

# start address should be 0:7c00, in real mode, the beginning address of the running bootloader
.globl start
start:
.code16                                             # Assemble for 16-bit mode
    cli                                             # Disable interrupts
    cld                                             # String operations increment

    # Set up the important data segment registers (DS, ES, SS).
    xorw %ax, %ax                                   # Segment number zero
    movw %ax, %ds                                   # -> Data Segment
    movw %ax, %es                                   # -> Extra Segment
    movw %ax, %ss                                   # -> Stack Segment

    # Enable A20:
    #  For backwards compatibility with the earliest PCs, physical
    #  address line 20 is tied low, so that addresses higher than
    #  1MB wrap around to zero by default. This code undoes this.
seta20.1:
    inb $0x64, %al                                  # Wait for not busy(8042 input buffer empty).
    testb $0x2, %al
    jnz seta20.1

    movb $0xd1, %al                                 # 0xd1 -> port 0x64
    outb %al, $0x64                                 # 0xd1 means: write data to 8042's P2 port

seta20.2:
    inb $0x64, %al                                  # Wait for not busy(8042 input buffer empty).
    testb $0x2, %al
    jnz seta20.2

    movb $0xdf, %al                                 # 0xdf -> port 0x60
    outb %al, $0x60                                 # 0xdf = 11011111, means set P2's A20 bit(the 1 bit) to 1

    # Switch from real to protected mode, using a bootstrap GDT
    # and segment translation that makes virtual addresses
    # identical to physical addresses, so that the
    # effective memory map does not change during the switch.
    lgdt gdtdesc
    movl %cr0, %eax
    orl $CR0_PE_ON, %eax
    movl %eax, %cr0

    # Jump to next instruction, but in 32-bit code segment.
    # Switches processor into 32-bit mode.
    ljmp $PROT_MODE_CSEG, $protcseg

.code32                                             # Assemble for 32-bit mode
protcseg:
    # Set up the protected-mode data segment registers
    movw $PROT_MODE_DSEG, %ax                       # Our data segment selector
    movw %ax, %ds                                   # -> DS: Data Segment
    movw %ax, %es                                   # -> ES: Extra Segment
    movw %ax, %fs                                   # -> FS
    movw %ax, %gs                                   # -> GS
    movw %ax, %ss                                   # -> SS: Stack Segment

    # Set up the stack pointer and call into C. The stack region is from 0--start(0x7c00)
    movl $0x0, %ebp
    movl $start, %esp
    call bootmain

    # If bootmain returns (it shouldn't), loop.
spin:
    jmp spin

# Bootstrap GDT
.p2align 2                                          # force 4 byte alignment
gdt:
    SEG_NULLASM                                     # null seg
    SEG_ASM(STA_X|STA_R, 0x0, 0xffffffff)           # code seg for bootloader and kernel
    SEG_ASM(STA_W, 0x0, 0xffffffff)                 # data seg for bootloader and kernel

gdtdesc:
    .word 0x17                                      # sizeof(gdt) - 1
    .long gdt                                       # address gdt

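The switch proceeds as follows:

Disable interrupts (cli), clear the direction flag (cld), and zero the segment registers DS, ES and SS.
Enable the A20 address line through the 8042 keyboard controller: wait until its input buffer is empty, write command 0xd1 to port 0x64, wait again, then write 0xdf to port 0x60 to set the A20 bit of the controller's output port.
Load the GDT: lgdt gdtdesc loads the table's size minus one (0x17, for three 8-byte descriptors) and its address into GDTR. The bootstrap GDT holds a null descriptor plus code and data descriptors with base 0 and a 4 GiB limit, so virtual and physical addresses stay identical.
Set the PE bit (CR0_PE_ON) in CR0 to enable protected mode.
Execute ljmp $PROT_MODE_CSEG, $protcseg to load CS with the code-segment selector 0x8, which switches the CPU into 32-bit mode. Then, in 32-bit mode, load the data segment registers with selector 0x10, set ebp to 0 and esp to 0x7c00 (start), and call bootmain, the C code that loads the kernel ELF.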

Exercise 4

Analyze how the bootloader loads an OS in ELF format.

How does the bootloader read disk sectors?

The bootloader reads disk sectors mainly through two functions, readsect() and waitdisk().

static void
readsect(void *dst, uint32_t secno) {
   // wait for disk to be ready
   waitdisk();

   outb(0x1F2, 1);                         // count = 1
   outb(0x1F3, secno & 0xFF);
   outb(0x1F4, (secno >> 8) & 0xFF);
   outb(0x1F5, (secno >> 16) & 0xFF);
   outb(0x1F6, ((secno >> 24) & 0xF) | 0xE0);
   outb(0x1F7, 0x20);                      // cmd 0x20 - read sectors

   // wait for disk to be ready
   waitdisk();

   // read a sector
   insl(0x1F0, dst, SECTSIZE / 4);
}

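readsect() first waits for the disk controller to become ready, then writes the command parameters: port 0x1F2 holds the number of sectors to read, ports 0x1F3 to 0x1F6 hold the 28-bit LBA sector number (low byte first; the top 4 bits go into 0x1F6, whose high bits 0xE0 select LBA mode on the master drive), and writing 0x20 to 0x1F7 issues the read command. Once the disk is ready again, insl reads SECTSIZE / 4 = 128 double words (4 bytes each) from data port 0x1F0 into memory, i.e. one 512-byte sector.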

On top of these two functions, the bootloader wraps the basic operation in readseg() to read multiple sectors.

static void
readseg(uintptr_t va, uint32_t count, uint32_t offset) {
   uintptr_t end_va = va + count;

   // round down to sector boundary
   va -= offset % SECTSIZE;

   // translate from bytes to sectors; kernel starts at sector 1
   uint32_t secno = (offset / SECTSIZE) + 1;

   // If this is too slow, we could read lots of sectors at a time.
   // We'd write more to memory than asked, but it doesn't matter --
   // we load in increasing order.
   for (; va < end_va; va += SECTSIZE, secno ++) {
       readsect((void *)va, secno);
   }
}

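readseg() reads count bytes, starting at byte offset within the kernel ELF file (which begins at disk sector 1, hence the + 1), into memory at address va. Paging is not enabled yet, so va is effectively a physical address. Because it works in whole sectors, it first rounds va down to a sector boundary and may read more than count bytes; that is harmless because segments are loaded in increasing address order.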

How does the bootloader load an ELF-format OS?

void
bootmain(void) {
   // read the 1st page off disk
   readseg((uintptr_t)ELFHDR, SECTSIZE * 8, 0);

   // is this a valid ELF?
   if (ELFHDR->e_magic != ELF_MAGIC) {
       goto bad;
   }

   struct proghdr *ph, *eph;

   // load each program segment (ignores ph flags)
   ph = (struct proghdr *)((uintptr_t)ELFHDR + ELFHDR->e_phoff);
   eph = ph + ELFHDR->e_phnum;
   for (; ph < eph; ph ++) {
       readseg(ph->p_va & 0xFFFFFF, ph->p_memsz, ph->p_offset);
   }

   // call the entry point from the ELF header
   // note: does not return
   ((void (*)(void))(ELFHDR->e_entry & 0xFFFFFF))();

bad:
   outw(0x8A00, 0x8A00);
   outw(0x8A00, 0x8E00);

   /* do nothing */
   while (1);
}

struct elfhdr {
   uint32_t e_magic;     // must equal ELF_MAGIC
   uint8_t e_elf[12];
   uint16_t e_type;      // 1=relocatable, 2=executable, 3=shared object, 4=core image
   uint16_t e_machine;   // 3=x86, 4=68K, etc.
   uint32_t e_version;   // file version, always 1
   uint32_t e_entry;     // entry point if executable
   uint32_t e_phoff;     // file position of program header or 0
   uint32_t e_shoff;     // file position of section header or 0
   uint32_t e_flags;     // architecture-specific flags, usually 0
   uint16_t e_ehsize;    // size of this elf header
   uint16_t e_phentsize; // size of an entry in program header
   uint16_t e_phnum;     // number of entries in program header or 0
   uint16_t e_shentsize; // size of an entry in section header
   uint16_t e_shnum;     // number of entries in section header or 0
   uint16_t e_shstrndx;  // section number that contains section name strings
};

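bootmain first reads the first 4 KiB of the kernel (8 sectors, enough for the ELF header and the program headers) to ELFHDR and checks that the magic number equals ELF_MAGIC (the bytes 0x7F 'E' 'L' 'F'). If it does, it loads each segment of the ELF file to its place in memory.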

struct proghdr {
   uint32_t p_type;   // loadable code or data, dynamic linking info,etc.
   uint32_t p_offset; // file offset of segment
   uint32_t p_va;     // virtual address to map segment
   uint32_t p_pa;     // physical address, not used
   uint32_t p_filesz; // size of segment in file
   uint32_t p_memsz;  // size of segment in memory (bigger if contains bss)
   uint32_t p_flags;  // read/write/execute bits
   uint32_t p_align;  // required alignment, invariably hardware page size
};

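Segment information is described by struct proghdr entries: the first proghdr is at offset e_phoff, and there are e_phnum of them. p_va is the segment's virtual address in memory (bootmain masks it with 0xFFFFFF before using it as a load address), p_memsz is its size in memory, and p_offset is its offset within the ELF file. Note that readseg() reads p_memsz bytes from the file, so the .bss part is not zeroed here; the kernel clears it itself using edata and end.

After every segment has been read to its place in memory, bootmain jumps to the entry address given by e_entry in the ELF header, handing control to the operating system.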

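Exercise 5

Implement the stack-trace function print_stackframe().

As the lab appendix explains, at the start of every function body the compiler inserts the following assembly:

pushl   %ebp
movl    %esp, %ebp

So before a function's own instructions run, the following have already been pushed, in order: the arguments, the return address, and the ebp register. This gives a stack layout like the following (taking the cdecl convention as an example):

 |  toward stack bottom  |  high address
 |         ....          |
 |         ....          |
 |        arg 3          |
 |        arg 2          |
 |        arg 1          |
 |    return address     |
 |     caller's ebp      |  <-------- [ebp]
 |    local variables    |  low address

ebp plays a key role here: before ebp is assigned, the caller's ebp has been pushed and sits at the top of the stack, and the new ebp points exactly to that top. So from the address in ebp we can read, at higher addresses, the return address ([ebp+4]) and the arguments ([ebp+8] onward); at lower addresses, the local variables; and at [ebp] itself, the caller's ebp.

Based on this layout, the function that prints the call stack can be written as: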

void
print_stackframe(void) {
     uint32_t ebp = read_ebp(), eip = read_eip();

     for (int i = 0; i < STACKFRAME_DEPTH && ebp != 0; ++i) {
        uint32_t *args = (uint32_t *)ebp + 2;
        cprintf("ebp:0x%08x eip:0x%08x args:0x%08x 0x%08x 0x%08x 0x%08x", 
                 ebp,       eip,       args[0], args[1], args[2], args[3]);
        cprintf("\n");
        print_debuginfo(eip - 1);
        eip = *(uint32_t*) (ebp + 4);
        ebp = *(uint32_t*) ebp;
     }
}

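Note that when ebp == 0 there is no caller frame left (bootasm.S sets ebp to 0 before calling bootmain), so the loop must stop.

After writing the function, run make qemu to check the output.

The last line of the output is the outermost frame: bootmain(), the first function that used this stack. The bootloader's stack starts at 0x7c00, and the call bootmain instruction in bootasm.S pushes the return address at 0x7bfc. The compiler then inserts at the head of bootmain:

pushl   %ebp
movl    %esp, %ebp

(This prologue can be seen in the disassembly listings, kernel.asm for kernel functions and obj/bootblock.asm for bootmain.) After the push the stack top is 0x7bf8, and this address is copied into ebp, which is why the last line shows ebp = 0x7bf8; its eip points into bootmain, just after the call to the kernel entry. bootmain() takes no arguments, so the "args" values on that line are meaningless: they are simply the words stored from 0x7c00 upward, i.e. the bootloader's own code.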

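Exercise 6

Complete interrupt initialization and handling.

How many bytes does one entry of the interrupt descriptor table (IDT) occupy? Which bits give the entry point of the interrupt handler?

struct gatedesc is defined as follows: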

struct gatedesc {
   unsigned gd_off_15_0 : 16;        // low 16 bits of offset in segment
   unsigned gd_ss : 16;              // segment selector
   unsigned gd_args : 5;             // # args, 0 for interrupt/trap gates
   unsigned gd_rsv1 : 3;             // reserved(should be zero I guess)
   unsigned gd_type : 4;             // type(STS_{TG,IG32,TG32})
   unsigned gd_s : 1;                // must be 0 (system)
   unsigned gd_dpl : 2;              // descriptor(meaning new) privilege level
   unsigned gd_p : 1;                // Present
   unsigned gd_off_31_16 : 16;       // high bits of offset in segment
};

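So one struct gatedesc occupies 8 bytes. Bytes 0 and 1 hold the low 16 bits of the handler's offset within its segment, bytes 6 and 7 hold the high 16 bits, and bytes 2 and 3 hold the segment selector. Together, the selector and the 32-bit offset give the handler's entry point.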

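Complete idt_init in kern/trap/trap.c, which initializes the IDT. idt_init should initialize every interrupt entry in turn, filling the idt array with the SETGATE macro from mmu.h. Every interrupt except the system-call interrupt (T_SYSCALL) uses an interrupt gate with kernel privilege, while the system-call interrupt uses a trap gate with user privilege. The entry point of each interrupt is generated by tools/vectors.c; just use the __vectors array (declared extern in trap.c).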

The code is:
```
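void
idt_init(void) {
   extern uintptr_t __vectors[];  // entry stubs generated by tools/vectors.c
   // interrupt gates (istrap = 0) with kernel privilege for every vector
   for (int i = 0; i < sizeof(idt) / sizeof(struct gatedesc); ++i) {
       SETGATE(idt[i], 0, GD_KTEXT, __vectors[i], DPL_KERNEL);
   }

   // system call: trap gate (istrap = 1) callable from user mode
   SETGATE(idt[T_SYSCALL], 1, GD_KTEXT, __vectors[T_SYSCALL], DPL_USER);
   // challenge exercise: let user mode request a switch back to kernel mode
   SETGATE(idt[T_SWITCH_TOK], 0, GD_KTEXT, __vectors[T_SWITCH_TOK], DPL_USER);

   lidt(&idt_pd);  // load the IDT register
}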
```
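The SETGATE macro fills in the IDT and lidt loads it; from then on, interrupts enter our handlers.
Complete the interrupt handler trap() in trap.c by filling in the timer-interrupt case, so that after every 100 timer interrupts the OS calls print_ticks() to print the line "100 ticks". The timer interrupt arrives on vector IRQ_OFFSET + IRQ_TIMER, so find that case in the switch in trap_dispatch() and add the code. It is very simple: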
```
static void
trap_dispatch(struct trapframe *tf) {
   char c;
   switch (tf->tf_trapno) {
   case IRQ_OFFSET + IRQ_TIMER:
       // ticks is the global counter from clock.c; TICK_NUM is 100
       if ((++ticks) % TICK_NUM == 0) print_ticks();
       break;
   // ...
   }
}
```

After finishing the code, run make qemu: a line "100 ticks" is printed at regular intervals, and keys pressed on the keyboard are echoed on the screen.

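Challenge Exercise

The code for this part has few comments, so more of it has to be worked out on your own.

Switching privilege levels requires changing the CS segment register, but CS cannot be loaded directly with a mov. When an interrupt occurs, the hardware automatically pushes EFLAGS, CS and EIP (plus an error code for some exceptions; ucore's vector stubs push a 0 for the others, and when the privilege level changes, SS and ESP are pushed before them), and when the interrupt returns these registers are restored from the stack. So CS must be changed through the iret instruction: in the interrupt handler, we change the tf_cs field of struct trapframe to switch privilege level. The same applies to the SS segment selector register: changing it also requires modifying struct trapframe.

Note that iret behaves differently in different situations. According to reference pseudocode: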

if(OperandSize == 32) {
    if(!IsWithinStackLimits(TopStackBytes(12))) Exception(SS(0)); //top 12 bytes of stack not within stack limits
    TemporaryEIP = Pop();
    TemporaryCS = Pop();
    TemporaryEFLAGS = Pop();
}
//Protected mode return
//PE == 1, VM == 0 in flags image
if(ReturnCode.SegmentSelector == 0) Exception(GP(0));
if(!IsWithinDescriptorTableLimits(ReturnCode.SegmentSelector)) Exception(GP(Selector));
ReturnCode.SegmentDescriptor = ReadSegmentDescriptor(ReturnCode.SegmentSelector);
if(!IsCodeSegment(ReturnCode.SegmentDescriptor)) Exception(GP(Selector));
if(ReturnCode.SegmentSelector.RPL < CPL) Exception(GP(Selector));
if(IsConforming(ReturnCode.SegmentDescriptor) && ReturnCode.SegmentDescriptor.DPL > ReturnCode.SegmentSelector.RPL) Exception(GP(Selector));
if(ReturnCode.SegmentSelector.RPL > CPL) {
    //Return to outer privilege level: ESP and SS are popped as well
    if(OperandSize == 32) { if(!IsWithinStackLimits(TopStackBytes(8))) Exception(SS(0)); } //top 8 bytes of stack not within stack limits
    else /*OperandSize == 16*/ { if(!IsWithinStackLimits(TopStackBytes(4))) Exception(SS(0)); } //top 4 bytes of stack not within stack limits
    TemporaryESP = Pop();
    StackSegmentSelector = Pop();
    if(StackSegmentSelector == 0) Exception(GP(0));
    if(!IsWithinDescriptorTableLimits(StackSegmentSelector.Index)) Exception(GP(SSSelector));
    StackSegmentDescriptor = ReadSegmentDescriptor(StackSegmentSelector);
    if(StackSegmentSelector.RPL != ReturnCode.SegmentSelector.RPL || !IndicatesWritableDataSegment(StackSegmentDescriptor) || StackSegmentDescriptor.DPL != ReturnCode.SegmentSelector.RPL) Exception(GP(SSSelector));
    if(!IsPresent(StackSegmentDescriptor)) Exception(SS(SSSelector));
    if(!IsWithinCodeSegmentLimit(TemporaryEIP)) Exception(GP(0));
    EIP = TemporaryEIP;
    CS = TemporaryCS;
    EFLAGS.CF = TemporaryEFLAGS.CF;
    EFLAGS.PF = TemporaryEFLAGS.PF;
    EFLAGS.AF = TemporaryEFLAGS.AF;
    EFLAGS.ZF = TemporaryEFLAGS.ZF;
    EFLAGS.SF = TemporaryEFLAGS.SF;
    EFLAGS.TF = TemporaryEFLAGS.TF;
    EFLAGS.DF = TemporaryEFLAGS.DF;
    EFLAGS.OF = TemporaryEFLAGS.OF;
    EFLAGS.NT = TemporaryEFLAGS.NT;
    if(OperandSize == 32) {
        EFLAGS.RF = TemporaryEFLAGS.RF;
        EFLAGS.AC = TemporaryEFLAGS.AC;
        EFLAGS.ID = TemporaryEFLAGS.ID;
    }
    if(CPL <= IOPL) EFLAGS.IF = TemporaryEFLAGS.IF;
    if(CPL == 0) {
        EFLAGS.IOPL = TemporaryEFLAGS.IOPL;
        if(OperandSize == 32) {
            EFLAGS.VM = TemporaryEFLAGS.VM;
            EFLAGS.VIF = TemporaryEFLAGS.VIF;
            EFLAGS.VIP = TemporaryEFLAGS.VIP;
        }
    }
    CPL = ReturnCode.SegmentSelector.RPL;
    ESP = TemporaryESP;
    SS = StackSegmentSelector; //segment descriptor information also loaded
    //perform operation for each of the segment registers
    SegmentRegisters[] = {ES, FS, GS, DS};
    while(SegmentRegister = SegmentRegisters.Next()) if((PointsToData(SegmentRegister) || PointsToNonConformingCode(SegmentRegister)) && CPL > SegmentDescriptor.DPL /*stored in hidden part of segment register*/) SegmentSelector = 0; //segment register invalid; null segment selector
    //END
} else {
    //Same privilege level
    //PE=1, VM=0 in flags image, RPL=CPL
    if(!IsWithinCodeSegmentLimits(TemporaryEIP)) Exception(GP(0));
    EIP = TemporaryEIP;
    CS = TemporaryCS; //segment descriptor information also loaded
    EFLAGS.CF = TemporaryEFLAGS.CF;
    EFLAGS.PF = TemporaryEFLAGS.PF;
    EFLAGS.AF = TemporaryEFLAGS.AF;
    EFLAGS.ZF = TemporaryEFLAGS.ZF;
    EFLAGS.SF = TemporaryEFLAGS.SF;
    EFLAGS.TF = TemporaryEFLAGS.TF;
    EFLAGS.DF = TemporaryEFLAGS.DF;
    EFLAGS.OF = TemporaryEFLAGS.OF;
    EFLAGS.NT = TemporaryEFLAGS.NT;
    if(OperandSize == 32) {
        EFLAGS.RF = TemporaryEFLAGS.RF;
        EFLAGS.AC = TemporaryEFLAGS.AC;
        EFLAGS.ID = TemporaryEFLAGS.ID;
    }
    if(CPL <= IOPL) EFLAGS.IF = TemporaryEFLAGS.IF;
    if(CPL == 0) {
        EFLAGS.IOPL = TemporaryEFLAGS.IOPL;
        if(OperandSize == 32) {
            EFLAGS.VM = TemporaryEFLAGS.VM;
            EFLAGS.VIF = TemporaryEFLAGS.VIF;
            EFLAGS.VIP = TemporaryEFLAGS.VIP;
        }
    }
    //END
}

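The pseudocode shows that if, on interrupt return, the code segment changes from a more privileged level to a less privileged one, iret also pops SS and ESP from the stack, overwriting their current values. So when switching from kernel mode to user mode, we must also set tf_ss to the user data segment, so that the RPL of the stack-segment selector equals the RPL of the code-segment selector; otherwise a General Protection exception is raised.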

struct trapframe switchk2u, *switchu2k;
static void
trap_dispatch(struct trapframe *tf) {
    char c;
    switch (tf->tf_trapno) {
    // ...
    case T_SWITCH_TOU:
        switchk2u = *tf;
        switchk2u.tf_cs = USER_CS;
        switchk2u.tf_ds = switchk2u.tf_es = switchk2u.tf_ss = USER_DS;
        // the int came from kernel mode, so no SS/ESP were pushed:
        // &tf->tf_esp is exactly where esp pointed before the interrupt
        switchk2u.tf_esp = (uintptr_t) &tf->tf_esp;
        // IOPL = 3 so that user-mode code can still use in/out (e.g. in cprintf)
        switchk2u.tf_eflags |= FL_IOPL_3;
        // Will overwrite %esp
        *((uint32_t *)tf - 1) = (uint32_t) &switchk2u;
        break;
    // ...
    }
}

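So when switching from kernel mode to user mode, the handler first changes the segment registers to the user selectors, then sets tf_esp to the old stack top (where esp pointed before the int), and finally sets the IOPL field of tf_eflags to 3 so that user-mode code can still perform port I/O.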

 |  toward stack bottom  |  high address
 |         ....          |
 |         ....          |
 |         ....          |
 |         ....          |
 |         ....          |
 |         ....          |  <-------- [tf] (tf is the old %esp)
 |          tf           |  <-------- [esp]
 |         ....          |  low address

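The last line,

*((uint32_t *)tf - 1) = (uint32_t) &switchk2u;

is the clever part. It overwrites the word just below tf, which is the copy of %esp that __alltraps pushed as trap()'s argument. After this statement runs, trap() returns to: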

pushl %esp
call trap
popl %esp # <------ returns here

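and the stack now looks like this:

 |  toward stack bottom  |  high address
 |         ....          |
 |         ....          |
 |         ....          |
 |         ....          |
 |         ....          |
 |         ....          |  <-------- [tf] (tf is the old %esp)
 |      &switchk2u       |  <-------- [esp]
 |         ....          |  low address

So ESP is immediately overwritten with the address of switchk2u, and the subsequent register pops and segment-register reloads all use the contents of the struct trapframe switchk2u we just prepared, which completes the switch from kernel mode to user mode. This can also be observed in gdb.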


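When switching from user mode to kernel mode, the handler's code segment is more privileged than the interrupted program, so the CPU switches to the kernel stack: tf_esp and tf_ss in struct trapframe hold the user-mode ESP and SS, and SS has already been switched to the kernel's selector when the interrupt occurred, so we need not set it by hand. On iret, tf_cs already has kernel privilege, so there is no change of privilege level; SS and ESP are therefore not popped from the stack and need not be set either.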

struct trapframe switchk2u, *switchu2k;
static void
trap_dispatch(struct trapframe *tf) {
    char c;
    switch (tf->tf_trapno) {
    // ...
    case T_SWITCH_TOK:
        if (tf->tf_cs == KERNEL_CS) return;  // already in kernel mode
        tf->tf_cs = KERNEL_CS;
        tf->tf_ds = tf->tf_es = KERNEL_DS;
        tf->tf_eflags &= ~FL_IOPL_MASK;      // back to IOPL 0
        // copy the frame, minus tf_esp/tf_ss, to just below the old user stack top
        switchu2k = (struct trapframe *) (tf->tf_esp - (sizeof(struct trapframe) - 8));
        memmove(switchu2k, tf, sizeof(struct trapframe) - 8);
        // Will overwrite %esp
        *((uint32_t *)tf - 1) = (uint32_t) switchu2k;
        break;
    // ...
    }
}

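The code is similar to the switch to user mode, except that tf_esp and tf_ss (the last 8 bytes of the trapframe) are not copied: the handler copies the rest of the frame to just below the old user stack top (tf_esp) and makes the popped %esp point there, so that after iret execution continues in kernel mode on the original stack.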

After finishing the code, run make grade; the output shows that the implementation is correct.


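As the analysis above shows, when switching to user mode iret pops two extra 32-bit values (SS and ESP) that the kernel-mode int did not push, so normally the caller should reserve that space on the stack before the int. Here, however, the handler has already redirected the stack pointer to switchk2u, whose tf_esp restores the original stack top, so this is not really necessary.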

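A system call that returns the current tick count is implemented with the T_SYSCALL interrupt. The system call number is passed in EAX; here 1 is used as the number for getting the tick count: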

// kern/init/init.c
static int get_ticks(void) {
    int ticks;

    asm volatile (
        "movl %2, %%eax \n"   // system call number in eax
        "int %1 \n"           // trap into the kernel (T_SYSCALL)
        "movl %%eax, %0 \n"   // result comes back in eax
        : "=r"(ticks)
        : "i"(T_SYSCALL), "i"(1)
        : "%eax"
    );

    return ticks;
}

// kern/trap/trap.c
struct trapframe switchk2u, *switchu2k;
static void
trap_dispatch(struct trapframe *tf) {
    char c;
    switch (tf->tf_trapno) {
    // ...
    case T_SYSCALL:
        // calls made from kernel mode are ignored: eax keeps the call number
        if (tf->tf_cs == KERNEL_CS) { return; }
        if (tf->tf_regs.reg_eax == 1)
            tf->tf_regs.reg_eax = ticks;  // restored into eax on return
        break;
    // ...
    }
}

Finally, switching between user mode and kernel mode with the keys "0" and "3" is done by adding the following to the main loop:

while (1) {
    int c = getchar();
    cprintf("[kbd %c] ", c);
    switch (c) {
        case '0':
            cprintf("Switch to kernel mode\n");
            lab1_switch_to_kernel();
            lab1_print_cur_status();
            break;
        case '3':
            cprintf("Switch to user mode\n");
            lab1_switch_to_user();
            lab1_print_cur_status();
            break;
        case 't':
            cprintf("get_ticks(): %d\n", get_ticks());
            break;
    }
}


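Pressing "3" switches to user mode, pressing "0" switches to kernel mode, and pressing "t" gets the current tick count and prints it. Because the T_SYSCALL handler ignores calls made from kernel mode, get_ticks() only returns the tick count in user mode; in kernel mode it simply returns the call number 1.

Note that calling the switch-to-kernel function while already in kernel mode has no effect, since the handler returns immediately when tf_cs is already KERNEL_CS. However, calling the switch-to-user function while already in user mode raises a General Protection fault: idt_init leaves the T_SWITCH_TOU gate at kernel privilege (DPL_KERNEL), so user-mode code is not allowed to use it.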