Parallelism

CPython has the infamous Global Interpreter Lock (GIL), which prevents multiple threads from executing Python code at the same time and so keeps developers from getting true parallelism. With PyO3, you can release the GIL while executing Rust code to achieve true parallelism.

The Python::allow_threads method temporarily releases the GIL, thus allowing other Python threads to run.

impl Python {
    pub fn allow_threads<T, F>(self, f: F) -> T where F: Send + FnOnce() -> T {}
}
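
To make the release-and-reacquire pattern concrete, here is a minimal sketch that uses only the standard library. It is an analogy, not PyO3's implementation: FAKE_GIL and run_unlocked are hypothetical names, and an ordinary Mutex stands in for the interpreter lock. The point it illustrates is that the lock is given up before the closure runs and taken back before control returns to the caller.

```rust
use std::sync::{Mutex, MutexGuard};

/// Stand-in for the interpreter lock. This is NOT the real GIL, only an
/// illustration of the locking pattern.
static FAKE_GIL: Mutex<()> = Mutex::new(());

/// Hypothetical analogue of `allow_threads`: release the lock, run `f`
/// without it, then reacquire the lock and hand it back with the result.
fn run_unlocked<T, F>(guard: MutexGuard<'static, ()>, f: F) -> (T, MutexGuard<'static, ()>)
where
    F: Send + FnOnce() -> T,
{
    // Dropping the guard releases the lock, so other threads may take it.
    drop(guard);
    // The closure runs while the lock is free.
    let result = f();
    // Take the lock back before returning to the caller.
    let guard = FAKE_GIL.lock().unwrap();
    (result, guard)
}
```

While the closure runs, any other thread waiting on FAKE_GIL can proceed, which mirrors how other Python threads can run while the GIL is released.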

Let's take a look at our word-count example, where the wc_parallel function uses the rayon crate to count words in parallel.

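use rayon::prelude::*;

// Count matches on every line, letting rayon spread the lines across its thread pool.
fn wc_parallel(lines: &str, search: &str) -> i32 {
    lines.par_lines()
         .map(|line| wc_line(line, search))
         .sum()
}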
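
The wc_line helper is not shown above. A minimal sketch of it, together with a sequential counterpart to wc_parallel, might look like the following. The matching rule (exact, whitespace-separated words) and the name wc_sequential are assumptions made for illustration; the real example may match words differently.

```rust
/// Hypothetical helper: count the whitespace-separated words on one line
/// that are exactly equal to `search`.
fn wc_line(line: &str, search: &str) -> i32 {
    line.split_whitespace()
        .filter(|word| *word == search)
        .count() as i32
}

/// Hypothetical sequential counterpart: the same per-line count, summed
/// over all lines on a single thread.
fn wc_sequential(lines: &str, search: &str) -> i32 {
    lines.lines()
         .map(|line| wc_line(line, search))
         .sum()
}
```

Because each line is counted independently and the results are simply summed, swapping lines() for par_lines() is enough to turn this shape into the parallel version.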

Then in the Python bridge, the search function exposed to the Python runtime calls wc_parallel inside Python::allow_threads to enable true parallelism:

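use std::fs::File;
use std::io::prelude::*;

use pyo3::prelude::*;

#[pymodule]
fn word_count(py: Python, m: &PyModule) -> PyResult<()> {

    #[pyfn(m, "search")]
    fn search(py: Python, path: String, search: String) -> PyResult<i32> {
        // Read the whole file into memory while still holding the GIL.
        let mut file = File::open(path)?;
        let mut contents = String::new();
        file.read_to_string(&mut contents)?;

        // Release the GIL while the closure runs, so other Python threads
        // can make progress during the count.
        let count = py.allow_threads(move || wc_parallel(&contents, &search));
        Ok(count)
    }

    Ok(())
}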

Benchmark

Let's benchmark the word-count example to verify that we really did unlock true parallelism with PyO3. We use pytest-benchmark to benchmark three word-count functions:

  1. Pure Python version
  2. Rust sequential version
  3. Rust parallel version
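
The comparison between the sequential and parallel Rust versions can also be roughed out on the Rust side with nothing but the standard library. The sketch below is not the pytest-benchmark script: scoped threads stand in for rayon, the exact-word matching rule is an assumption, and count_lines, wc_sequential, wc_threads and time_it are hypothetical names chosen for illustration.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Count the words equal to `search` across a slice of lines
/// (assumed rule: exact, whitespace-separated match).
fn count_lines(lines: &[&str], search: &str) -> i32 {
    lines
        .iter()
        .map(|line| line.split_whitespace().filter(|word| *word == search).count() as i32)
        .sum()
}

/// Single-threaded baseline.
fn wc_sequential(text: &str, search: &str) -> i32 {
    let lines: Vec<&str> = text.lines().collect();
    count_lines(&lines, search)
}

/// Stand-in for the rayon version: split the lines into roughly equal
/// chunks and count each chunk on its own scoped thread.
fn wc_threads(text: &str, search: &str, workers: usize) -> i32 {
    let lines: Vec<&str> = text.lines().collect();
    if lines.is_empty() {
        return 0;
    }
    let workers = workers.max(1);
    // Ceiling division, so every line lands in some chunk.
    let chunk_len = (lines.len() + workers - 1) / workers;
    thread::scope(|scope| {
        // Spawn all workers first, then join them and add up their counts.
        let handles: Vec<_> = lines
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || count_lines(chunk, search)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum::<i32>()
    })
}

/// Run `f` once and report its result with the elapsed wall-clock time.
fn time_it<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let value = f();
    (value, start.elapsed())
}
```

Calling time_it(|| wc_sequential(&text, "word")) and time_it(|| wc_threads(&text, "word", 4)) on the same input gives two counts that should agree, plus a rough timing for each. A single run like this is far noisier than pytest-benchmark's repeated measurements, and on small inputs the cost of starting threads can outweigh any gain.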

The benchmark script can be found here; we can then run pytest tests to benchmark them.

On a MacBook Pro (Retina, 15-inch, Mid 2015), the benchmark gives:

[Figure: Benchmark Result]