Parallelism
CPython has the infamous Global Interpreter Lock (GIL), which prevents multiple threads from executing Python code at the same time and so keeps developers from getting true parallelism. With PyO3 you can release the GIL while executing Rust code to achieve true parallelism.
The Python::allow_threads method temporarily releases the GIL, thus allowing other Python threads to run.
impl Python {
    pub fn allow_threads<T, F>(self, f: F) -> T where F: Send + FnOnce() -> T {}
}
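To make the release-and-reacquire behaviour concrete, here is a toy model that uses only the standard library. It is a sketch of the idea, not PyO3's implementation: ToyGil, toy_allow_threads and demo are hypothetical names, and a plain Mutex stands in for the GIL.

```rust
use std::sync::{Arc, Mutex, MutexGuard};
use std::thread;

/// Toy stand-in for the GIL: one lock that a thread must hold to "run Python".
/// This is an illustration only, not how PyO3 or CPython implement it.
type ToyGil = Mutex<()>;

/// Releases the toy lock, runs `f` without it, then re-acquires the lock
/// before handing the result back to the caller.
fn toy_allow_threads<'a, T, F>(
    gil: &'a ToyGil,
    held: MutexGuard<'a, ()>,
    f: F,
) -> (MutexGuard<'a, ()>, T)
where
    F: Send + FnOnce() -> T,
{
    drop(held); // release the lock so other threads can take it
    let result = f(); // the closure runs without holding the lock
    let held = gil.lock().unwrap(); // re-acquire before returning
    (held, result)
}

/// While the closure runs, a second thread is able to take the lock.
/// Without the release step, the `join` below would wait forever.
fn demo() -> (i32, bool) {
    let gil = Arc::new(Mutex::new(()));
    let held = gil.lock().unwrap();
    let other = Arc::clone(&gil);

    let (held, (sum, other_ran)) = toy_allow_threads(&gil, held, move || {
        let handle = thread::spawn(move || {
            let _guard = other.lock().unwrap();
            true
        });
        let other_ran = handle.join().unwrap();
        let sum: i32 = (1..=10).sum();
        (sum, other_ran)
    });

    drop(held);
    (sum, other_ran)
}
```

Calling demo() returns the closure's result together with a flag showing that the second thread acquired the lock while the closure was running. The closure only captures owned, Send data, which mirrors the F: Send bound in the signature above.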
Let's take a look at our word-count example, where the wc_parallel function uses the rayon crate to count words in parallel.
use rayon::prelude::*; // brings par_lines() into scope

fn wc_parallel(lines: &str, search: &str) -> i32 {
    lines.par_lines()
        .map(|line| wc_line(line, search))
        .sum()
}
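The wc_line helper called above is not shown in this section. As a minimal sketch, assume it counts the whitespace-separated words in a line that are exactly equal to the search term; the real helper may differ, for example by ignoring case or punctuation. The hypothetical wc_sequential below is the same pipeline with lines() in place of par_lines(), which gives a single-threaded baseline to compare against.

```rust
/// Hypothetical helper: counts the whitespace-separated words in `line`
/// that are exactly equal to `search`. The example's real `wc_line` may
/// use a different matching rule.
fn wc_line(line: &str, search: &str) -> i32 {
    line.split_whitespace()
        .filter(|word| *word == search)
        .count() as i32
}

/// Sequential counterpart of `wc_parallel`: identical logic, but the lines
/// are visited one after another on the calling thread.
fn wc_sequential(lines: &str, search: &str) -> i32 {
    lines.lines().map(|line| wc_line(line, search)).sum()
}
```

Because each line is counted independently and the per-line counts are simply summed, the work splits cleanly across threads, which is what makes the rayon version a drop-in replacement.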
Then, in the Python bridge, the search function exposed to the Python runtime calls wc_parallel inside Python::allow_threads to enable true parallelism:
use pyo3::prelude::*;
use std::fs::File;
use std::io::Read;

#[pymodule]
fn word_count(_py: Python, m: &PyModule) -> PyResult<()> {
    #[pyfn(m, "search")]
    fn search(py: Python, path: String, search: String) -> PyResult<i32> {
        // Read the whole file while still holding the GIL.
        let mut file = File::open(path)?;
        let mut contents = String::new();
        file.read_to_string(&mut contents)?;
        // Release the GIL for the duration of the parallel count.
        let count = py.allow_threads(move || wc_parallel(&contents, &search));
        Ok(count)
    }
    Ok(())
}
Benchmark
Let's benchmark the word-count example to verify that we really did unlock true parallelism with PyO3.
We are using pytest-benchmark to benchmark three word count functions:
The benchmark script can be found here; run pytest tests to benchmark them.
On a MacBook Pro (Retina, 15-inch, Mid 2015), the benchmark gives: