2023-03-08

blogstage - static web server in rust

I was looking for a useful service-oriented Rust project where I could test my project standards1 and dive deeper into the language. I decided to replace the “caddy file server” that serves this blog. The result is called blogstage: a simple web server providing my static blog to the world.

It can be used via blogstage <ip:port> <path> or just blogstage 0.0.0.0:80 ./blog/.

Requirements

The requirements were fairly simple: the blog is static, consisting of a flat directory of rendered HTML files along with some images2, and my reverse proxy handles the TLS stuff.

How to serve

Since the requirements weren’t hard to fulfill, there are tons of crates that can do the job. E.g. warp can do it in one line with something like warp::serve(warp::fs::dir("./blog")).run(([127, 0, 0, 1], 8080)). I decided against a ready-to-use crate to make it a bit more difficult and educational.

web_server didn’t work with dynamic strings, since routes need to be added as &'static str, and I couldn’t figure out if and how this can be fixed3.

The solution was surprisingly easy to find. There is a tutorial about Building a Multithreaded Web Server in the official Rust docs. What a coincidence!

Argument parser

Just two mandatory positional arguments without a default to keep things simple and explicit:

/* parse arguments */
let uri = match std::env::args().nth(1) {
    Some(uri) => uri,
    None => {
        println!("usage: blogstage <URI> <PATH>");
        return;
    }
};

let path = match std::env::args().nth(2) {
    Some(path) => path,
    None => {
        println!("usage: blogstage <URI> <PATH>");
        return;
    }
};
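The usage message is duplicated across both match arms. A hypothetical refactor (not the actual blogstage code) could parse both positional arguments in one place:

```rust
// Hypothetical refactor: parse both positional arguments in one helper
// instead of duplicating the usage message per argument.
fn parse_args<I: Iterator<Item = String>>(mut args: I) -> Option<(String, String)> {
    let _program = args.next(); // skip argv[0]
    match (args.next(), args.next()) {
        (Some(uri), Some(path)) => Some((uri, path)),
        _ => {
            println!("usage: blogstage <URI> <PATH>");
            None
        }
    }
}
```

In main, this collapses to a single `let Some((uri, path)) = parse_args(std::env::args()) else { return; }`.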

File cache

The blog is pretty small, and its container4 gets rebuilt for any change. Therefore I decided to load all the files non-recursively into a hash map with the basename as key and the content as value.

/* load files */
let raw_entries = match fs::read_dir(path.clone()) {
    Ok(entries) => entries,
    Err(e) => {
        println!("error reading files from {}: {}", path, e);
        return;
    }
};

let mut files = HashMap::new();

for entry in raw_entries {
    if !entry.as_ref().unwrap().path().is_file() {
        continue;
    }

    files.insert(
        entry
            .as_ref()
            .unwrap()
            .file_name()
            .into_string()
            .unwrap()
            .clone(),
        fs::read(entry.as_ref().unwrap().path().clone()).unwrap(),
    );
}
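The unwrap chains above can be flattened with the `?` operator. A sketch of an equivalent loader (assuming file names are valid UTF-8; the function name is mine, not blogstage’s):

```rust
use std::collections::HashMap;
use std::fs;

// Sketch: the same non-recursive loading loop with the unwrap chains
// flattened out via `?`. Error handling is still minimal by design.
fn load_files(path: &str) -> std::io::Result<HashMap<String, Vec<u8>>> {
    let mut files = HashMap::new();
    for entry in fs::read_dir(path)? {
        let entry = entry?;
        if !entry.path().is_file() {
            continue;
        }
        // Assumption: basenames are valid UTF-8, otherwise this panics.
        let name = entry.file_name().into_string().unwrap();
        files.insert(name, fs::read(entry.path())?);
    }
    Ok(files)
}
```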

TCP

Although doing TCP myself felt a bit too low-level at first, it was quite pleasant in Rust, using only bind and the incoming iterator:

/* start server */
let listener = match TcpListener::bind(uri.clone()) {/* error handling */};

for stream in listener.incoming() {
    match stream {
        Ok(s) => on_request(s, files),
        Err(e) => { /* error handling */ }
    }
}

HTTP

The request handler receives the TCP stream as well as the hash map containing all the files:

pub fn on_request(mut stream: TcpStream, files: HashMap<String, Vec<u8>>)

Then the request will get read until an empty line has been reached:

let reader = BufReader::new(&mut stream);
let request: Vec<_> = reader
    .lines()
    .map(|result| result.unwrap())
    .take_while(|line| !line.is_empty())
    .collect();

Parse the requested file quick ‘n’ dirty (see vision), and fall back to index.html for convenience if the requested filename is empty:

let mut target: String = request[0].split(' ').collect::<Vec<&str>>()[1][1..].to_string();
if target.is_empty() {
  target = "index.html".into()
}
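The indexing above panics on malformed request lines. A slightly more defensive sketch (the helper name is hypothetical, not part of blogstage):

```rust
// Hypothetical helper: extract the request target without indexing,
// so a malformed request line yields None instead of a panic.
fn parse_target(request_line: &str) -> Option<String> {
    let mut parts = request_line.split(' ');
    let _method = parts.next()?; // e.g. "GET"
    let raw = parts.next()?;     // e.g. "/index.html"
    let target = raw.strip_prefix('/')?.to_string();
    Some(if target.is_empty() {
        "index.html".into()
    } else {
        target
    })
}
```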

In order to serve more complex files like images (and to be future-ready if I add openscad or mp3 stuff), guess the MIME type via the crate…well…mime_guess:

let mime = mime_guess::from_path(target.clone()).first().unwrap();

If the requested file can be found in the hash map, write its content along with a valid HTTP header to the stream. Otherwise just write an HTTP header with status code 404. On any error, the response handler panics due to the massive use of unwrap, and the thread just vanishes into the void (see vision):

match files.get(&target) {
  Some(body) => {
    let length = body.len();

    println!("200 {}", target);
    stream.write_all(
      format!("HTTP/1.1 200 OK\r\nContent-Length: {length}\r\nContent-Type: {mime}\r\n\r\n")
      .as_bytes()
    ).unwrap();
    stream.write_all(body).unwrap();
  }
  None => {
    println!("404 {}", target);
    stream
      .write_all("HTTP/1.1 404 NOT FOUND\r\n\r\n".as_bytes())
      .unwrap();
  }
};
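The framing the two arms produce can be isolated into a pure function for illustration (a sketch with a hypothetical helper name, not how blogstage is structured):

```rust
// Hypothetical helper: build the exact bytes the handler writes, so the
// HTTP framing can be checked without a stream.
fn build_response(body: Option<&[u8]>, mime: &str) -> Vec<u8> {
    match body {
        Some(body) => {
            // Header first, then the raw body bytes.
            let mut resp = format!(
                "HTTP/1.1 200 OK\r\nContent-Length: {}\r\nContent-Type: {}\r\n\r\n",
                body.len(),
                mime
            )
            .into_bytes();
            resp.extend_from_slice(body);
            resp
        }
        None => b"HTTP/1.1 404 NOT FOUND\r\n\r\n".to_vec(),
    }
}
```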

Parallel request handling

Since I want to implement parallel request handling as simply as possible and don’t need to limit the number of concurrent requests5, thread::spawn will do the trick:

for stream in listener.incoming() {
    match stream {
        Ok(s) => {
            let f = files.clone();
            thread::spawn(move || on_request(s, f));
        }
        Err(e) => {
            println!("error accepting connection: {}", e);
            continue;
        }
    }
}

The file map must be cloned so that the clone’s ownership can be moved into the thread.
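Deep-cloning every file for every request is the simplest option, but not the only one. An alternative sketch (not what blogstage does) shares a single copy across threads via Arc, where cloning only bumps a reference count:

```rust
use std::collections::HashMap;
use std::sync::Arc;
use std::thread;

// Sketch of an alternative: share one immutable file map across threads
// via Arc instead of deep-cloning it per request. The function name and
// shape are illustrative only.
fn total_bytes_served(files: Arc<HashMap<String, Vec<u8>>>, requests: &[&str]) -> usize {
    let handles: Vec<_> = requests
        .iter()
        .map(|name| {
            let f = Arc::clone(&files); // cheap reference-count bump, no copy
            let name = name.to_string();
            thread::spawn(move || f.get(&name).map_or(0, |b| b.len()))
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

Since the map is never mutated after startup, no Mutex is needed; Arc alone makes the shared reads safe.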

Exit

It doesn’t deserve the predicate graceful, but the crate ctrlc provides a simple mechanism to abort the whole request handling:

ctrlc::set_handler(move || {
   std::process::exit(0);
})
.unwrap();

Testing

My testing goal was to cover the on_request function via unit tests and the CLI via integration tests.

Unit test

To test the on_request function, the stream must be mocked. Looking closely at what the function actually does with the stream reveals that we only need to read from and write to it. As described in the tutorial, TcpStream can be replaced by accepting any type that implements the Read and Write traits using impl Read + Write:

pub fn on_request(mut stream: impl Read + Write, files: HashMap<String, Vec<u8>>)

The mock implements all methods needed to properly read from and write to it. I modified the tutorial’s mock to be single-threaded, as we don’t use async:

use assert_cmd::prelude::*;
use predicates::prelude::*;
use std::cmp::min;
use std::collections::HashMap;
use std::io;
use std::io::{Read, Write};
use std::process::Command;

struct MockTcpStream {
    read_data: Vec<u8>,
    write_data: Vec<u8>,
}

// https://doc.rust-lang.org/std/io/trait.Read.html
impl Read for MockTcpStream {
    fn read(self: &mut Self, buf: &mut [u8]) -> io::Result<usize> {
        let size: usize = min(self.read_data.len(), buf.len());
        buf[..size].copy_from_slice(&self.read_data[..size]);
        Ok(size)
    }
}

// https://doc.rust-lang.org/std/io/trait.Write.html
impl Write for MockTcpStream {
    fn write(self: &mut Self, buf: &[u8]) -> io::Result<usize> {
        self.write_data.extend(buf.iter().cloned());
        Ok(buf.len())
    }

    fn flush(self: &mut Self) -> io::Result<()> {
        Ok(())
    }
}

Then tests can simply be written as follows:

#[test]
fn serve_not_found() {
    let input = b"GET /test.html HTTP/1.1\r\n\r\n";
    let mut contents = vec![0u8; 1024];

    contents[..input.len()].clone_from_slice(input);
    let mut stream = MockTcpStream {
        read_data: contents,
        write_data: Vec::new(),
    };

    let files = HashMap::new();
    blogstage::on_request(&mut stream, files);

    let expected_response = "HTTP/1.1 404 NOT FOUND\r\n\r\n";
    assert!(stream.write_data.starts_with(expected_response.as_bytes()));
}

Integration test

Each integration test case ensures that the binary behaves correctly with respect to the given environment and arguments. The following test checks if blogstage fails correctly when provided with no arguments:

#[test]
fn uri_is_missing() -> Result<(), Box<dyn std::error::Error>> {
    let mut cmd = Command::cargo_bin("blogstage")?;

    cmd.assert()
        .success()
        .stdout(predicate::str::contains("usage: blogstage <URI> <PATH>\n"));
    Ok(())
}

Coverage report

Coverage reports help to measure and observe how much of the code is actually covered. Based on those reports, it is easy to tell which branches still need tests.

The coverage profile can be enabled via environment variables, as described in the docs:

RUSTFLAGS="-C instrument-coverage" LLVM_PROFILE_FILE="cargo-test-%p-%m.profraw" cargo test6

The llvm-coverage-tools provide rust-profdata and rust-cov to process the generated raw coverage data. All *.profraw files can be merged via rust-profdata merge -sparse *.profraw -o coverage.profdata and converted to HTML via rust-cov show target/debug/blogstage -instr-profile=coverage.profdata --ignore-filename-regex=/.cargo --format=html --show-line-counts-or-regions > coverage/index.html.

Unfortunately, I didn’t get the unit tests covered, although they all passed. The integration tests, on the other hand, worked out of the box.

Benchmark

Since the files are cached in memory and the service is tightly tailored to my use case, you may ask: is it faster?

Well, a tiny bit. I used mildsunrise’s curl-benchmark with 100 requests and an insecure connection7:

caddy file serve:

         DNS      TCP        SSL  Request           Content
Code  lookup  connect  handshake     sent    TTFB  download
min:     5.0     51.0       22.0     28.0    54.0       0.0
avg:     5.7     55.6       35.8     32.2    60.1       1.1
med:     5.0     55.0       36.0     32.0    60.0       1.0
max:    15.0     67.0       45.0     41.0    86.0      10.0
dev:   21.5%     5.5%      10.7%     8.1%    7.5%    106.1%

blogstage:

         DNS      TCP        SSL  Request           Content
Code  lookup  connect  handshake     sent    TTFB  download
min:     5.0     51.0       26.0     28.0    58.0       0.0
avg:     5.4     55.9       34.9     32.3    68.3       0.7
med:     5.0     55.0       35.0     32.0    68.0       1.0
max:     7.0     68.0       46.0     41.0    77.0       2.0
dev:   12.3%     4.9%      11.2%     6.9%    5.9%     95.5%

Wrap it up

src/main.rs:

use std::collections::HashMap;
use std::fs;
use std::net::TcpListener;
use std::thread;

use blogstage::on_request;

// https://doc.rust-lang.org/book/ch20-01-single-threaded.html
fn main() {
    /* parse arguments */
    let uri = match std::env::args().nth(1) {
        Some(uri) => uri,
        None => {
            println!("usage: blogstage <URI> <PATH>");
            return;
        }
    };

    let path = match std::env::args().nth(2) {
        Some(path) => path,
        None => {
            println!("usage: blogstage <URI> <PATH>");
            return;
        }
    };

    /* load files */
    let raw_entries = match fs::read_dir(path.clone()) {
        Ok(entries) => entries,
        Err(e) => {
            println!("error reading files from {}: {}", path, e);
            return;
        }
    };

    let mut files = HashMap::new();

    for entry in raw_entries {
        if !entry.as_ref().unwrap().path().is_file() {
            continue;
        }

        files.insert(
            entry
                .as_ref()
                .unwrap()
                .file_name()
                .into_string()
                .unwrap()
                .clone(),
            fs::read(entry.as_ref().unwrap().path().clone()).unwrap(),
        );
    }

    /* start server */
    let listener = match TcpListener::bind(uri.clone()) {
        Ok(l) => l,
        Err(e) => {
            println!("error on binding to {}: {}", uri, e);
            return;
        }
    };

    // react to Ctrl+C
    ctrlc::set_handler(move || {
        std::process::exit(0);
    })
    .unwrap();

    for stream in listener.incoming() {
        match stream {
            Ok(s) => {
                let f = files.clone();
                thread::spawn(move || on_request(s, f));
            }
            Err(e) => {
                println!("error accepting connection: {}", e);
                continue;
            }
        }
    }
}

src/lib.rs:

use std::collections::HashMap;
use std::io::{prelude::*, BufReader, Read, Write};

// Everything that should be covered by integration tests needs to live
// outside of main.rs:
// https://doc.rust-lang.org/book/ch11-03-test-organization.html#integration-tests-for-binary-crates

pub fn on_request(mut stream: impl Read + Write, files: HashMap<String, Vec<u8>>) {
    let reader = BufReader::new(&mut stream);
    let request: Vec<_> = reader
        .lines()
        .map(|result| result.unwrap())
        .take_while(|line| !line.is_empty())
        .collect();

    let mut target: String = request[0].split(' ').collect::<Vec<&str>>()[1][1..].to_string();

    if target.is_empty() {
        target = "index.html".into()
    }

    let mime = mime_guess::from_path(target.clone()).first().unwrap();

    match files.get(&target) {
        Some(body) => {
            let length = body.len();

            println!("200 {}", target);
            stream.write_all(
                format!("HTTP/1.1 200 OK\r\nContent-Length: {length}\r\nContent-Type: {mime}\r\n\r\n")
                .as_bytes()
                ).unwrap();
            stream.write_all(body).unwrap();
        }
        None => {
            println!("404 {}", target);
            stream
                .write_all("HTTP/1.1 404 NOT FOUND\r\n\r\n".as_bytes())
                .unwrap();
        }
    };
}

Vision


  1. like testing, coverage, release, ci,…
  2. via blogctl
  3. see 70a9c21e
  4. To simplify operations, I build a container containing the whole blog. I replaced the caddy container with an alpine one.
  5. in comparison to the ThreadPool approach of the Multithreaded Server tutorial
  6. Those environment variables can also be used for cargo build to generate a coverage binary.
  7. python curl-benchmark.py -- -L -k evilcookie.de, somehow the SSL handshake didn’t work…