Parallelism

CPython has the infamous GIL (Global Interpreter Lock), which prevents pure Python code from running in parallel on multiple threads. When a Rust function is called from Python, PyO3 holds the GIL by default so that it can safely manipulate Python objects. You can, however, release the GIL while executing Rust-only code to achieve true parallelism.

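The Python::allow_threads method releases the GIL while the closure passed to it runs, which lets other Python threads execute in the meantime. It re-acquires the GIL before returning the closure's result.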

impl Python {
    // Signature only; the body is omitted here.
    pub fn allow_threads<T, F>(self, f: F) -> T
    where
        F: Send + FnOnce() -> T,
    {
        /* ... */
    }
}
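To make the release/re-acquire cycle concrete, here is a toy model in plain Rust. It is not PyO3 code and not how CPython implements the GIL. A global std::sync::Mutex named GIL (hypothetical) stands in for the interpreter lock, and a hypothetical allow_threads function drops the held guard, runs the closure, and locks again before returning. The demo only finishes because the lock really is released: the closure waits until another thread has acquired it. The static Mutex::new requires Rust 1.63 or later.

```rust
use std::sync::mpsc;
use std::sync::{Mutex, MutexGuard};
use std::thread;

// Hypothetical stand-in for the interpreter lock (a toy, not the real GIL).
static GIL: Mutex<()> = Mutex::new(());

// Hypothetical analogue of `Python::allow_threads`: takes the held guard,
// releases the lock, runs `f`, and re-acquires the lock before returning.
// The `Send` bound mirrors the documented signature; this toy version runs
// `f` on the current thread and does not strictly need it.
fn allow_threads<T, F>(guard: MutexGuard<'static, ()>, f: F) -> (T, MutexGuard<'static, ()>)
where
    F: Send + FnOnce() -> T,
{
    drop(guard); // release the "GIL" so other threads can take it
    let result = f(); // Rust-only work runs without holding the lock
    let guard = GIL.lock().unwrap(); // re-acquire before returning to "Python"
    (result, guard)
}

// The closure blocks until another thread reports that it holds the lock.
// If `allow_threads` kept the lock held, this would deadlock.
fn demo() -> u64 {
    let (tx, rx) = mpsc::channel();
    let guard = GIL.lock().unwrap();

    let other = thread::spawn(move || {
        let _held = GIL.lock().unwrap();
        tx.send(()).unwrap(); // signal: "I got the lock"
    });

    let (sum, _guard) = allow_threads(guard, move || {
        rx.recv().unwrap(); // wait for the other thread to acquire the lock
        (1..=100u64).sum::<u64>()
    });

    other.join().unwrap();
    sum
}
```

Because the lock is taken again before allow_threads returns, anything done after the call is once more serialized. This mirrors why the closure passed to the real method should only do Rust-only work and not touch Python objects.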

Let's look at our word-count example, which has a wc_parallel function that uses the rayon crate to count words in parallel.

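use rayon::prelude::*; // brings ParallelString::par_lines into scope

// wc_line (defined elsewhere in the example) counts the matches in one line.
fn wc_parallel(lines: &str, search: &str) -> i32 {
    lines.par_lines()                        // iterate over lines in parallel
         .map(|line| wc_line(line, search))  // count matches per line
         .sum()                              // add up the per-line counts
}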
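To try the counting logic without pulling in rayon, the sketch below swaps in std::thread::scope (Rust 1.63+) for rayon's par_lines. It splits the input's lines into at most n_threads chunks, counts each chunk on a scoped thread, and sums the partial counts. wc_line is not shown in this section, so the version here is a hypothetical stand-in that counts whitespace-separated words exactly equal to search; wc_sequential and wc_parallel_std are likewise illustrative names.

```rust
use std::thread;

// Hypothetical stand-in for the example's `wc_line`: counts the
// whitespace-separated words in `line` that are exactly equal to `search`.
fn wc_line(line: &str, search: &str) -> i32 {
    line.split_whitespace().filter(|word| *word == search).count() as i32
}

// Sequential baseline: one pass over all lines on the current thread.
fn wc_sequential(lines: &str, search: &str) -> i32 {
    lines.lines().map(|line| wc_line(line, search)).sum()
}

// Parallel version using scoped std threads instead of rayon's `par_lines`.
fn wc_parallel_std(lines: &str, search: &str, n_threads: usize) -> i32 {
    let all: Vec<&str> = lines.lines().collect();
    if all.is_empty() {
        return 0;
    }
    let n_threads = n_threads.max(1);
    // Ceiling division: every line lands in a chunk, at most n_threads chunks.
    let chunk_size = (all.len() + n_threads - 1) / n_threads;

    thread::scope(|s| {
        // One scoped thread per chunk; each returns its partial count.
        let handles: Vec<_> = all
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || chunk.iter().map(|line| wc_line(line, search)).sum::<i32>()))
            .collect();
        // Join every thread and add up the partial counts.
        handles.into_iter().map(|h| h.join().unwrap()).sum::<i32>()
    })
}
```

Fixed-size chunks are the simplest std-only split; rayon's work-stealing scheduler generally copes better when some lines are much longer than others.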

Then, in the Python bridge, we expose a search function to the Python runtime. It calls wc_parallel inside a closure passed to Python::allow_threads, so the counting runs with the GIL released and can use true parallelism:

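use pyo3::prelude::*;
use std::fs::File;
use std::io::Read;

#[pymodule]
fn word_count(_py: Python, m: &PyModule) -> PyResult<()> {

    #[pyfn(m, "search")]
    fn search(py: Python, path: String, search: String) -> PyResult<i32> {
        // Reading the file happens while the GIL is still held.
        let mut file = File::open(path)?;
        let mut contents = String::new();
        file.read_to_string(&mut contents)?;

        // `contents` and `search` are moved into the closure; the GIL is
        // released while wc_parallel runs and re-acquired afterwards.
        let count = py.allow_threads(move || wc_parallel(&contents, &search));
        Ok(count)
    }

    Ok(())
}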

Benchmark

Let's benchmark the word-count example to verify that we did unlock true parallelism with PyO3. We are using pytest-benchmark to benchmark three word count functions:

The benchmark script can be found here; run pytest tests to execute the benchmarks.

On a MacBook Pro (Retina, 15-inch, Mid 2015), the benchmark gives:

[Figure: benchmark results]
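The guide's benchmarks use pytest-benchmark from Python. As a rough complement, here is a minimal std-only Rust harness (not that script) that checks the same idea: it times a CPU-bound job run twice back to back, then the same two jobs on two scoped threads at once. The job count_words and the functions run_sequential and run_two_threads are hypothetical names. On a multi-core machine, truly parallel execution should make the threaded wall time noticeably lower, but the actual numbers depend entirely on the hardware.

```rust
use std::thread;
use std::time::{Duration, Instant};

// Hypothetical CPU-bound job standing in for a word count: counts the
// whitespace-separated words in `text` that equal `needle`.
fn count_words(text: &str, needle: &str) -> usize {
    text.split_whitespace().filter(|w| *w == needle).count()
}

// Runs the job twice, one after the other; returns both results and wall time.
fn run_sequential(text: &str, needle: &str) -> ([usize; 2], Duration) {
    let start = Instant::now();
    let a = count_words(text, needle);
    let b = count_words(text, needle);
    ([a, b], start.elapsed())
}

// Runs the same two jobs concurrently on two scoped threads.
fn run_two_threads(text: &str, needle: &str) -> ([usize; 2], Duration) {
    let start = Instant::now();
    let (a, b) = thread::scope(|s| {
        let h1 = s.spawn(|| count_words(text, needle));
        let h2 = s.spawn(|| count_words(text, needle));
        (h1.join().unwrap(), h2.join().unwrap())
    });
    ([a, b], start.elapsed())
}
```

Calling both functions on a large input and printing the two Duration values (for example with {:?}) lets you compare wall times; the results must match, and only the timings should differ.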

PyO3 user guide
