In the first part of this series we saw some first examples of error handling in Rust, using unwrapping and pattern matching. This post shows how Rust makes error handling a little more convenient than that.
Let’s start with a simple program that reads this CSV file:
```
name,age,civil_status,married_when,married_where,license_plate
Alice,17,Single,,,
Bob,19,Single,,,
Charles,43,Married,2005-05-01,New York,NY-102
Dina,6,Single,,,
Eva,40,Married,2005-05-01,New York,NY-655
Faith,89,Widowed,,,
```
and calculates the average age of persons:
```python
import pandas as pd

def calc_avg_age():
    df = pd.read_csv("persons.csv")
    return df["age"].mean()

print("Average age is %f" % calc_avg_age())
```
Oops, this is written in my primary language, Python, but I’ll let it stay for reference. Note how the error handling is implemented: no effort is required from the developer, and the program will crash with an exception if the file is not found, is not readable, has the wrong format, has no column age, or if anything else goes wrong.
I would argue that this is exactly what I want when I develop new code. I want to see it work first, as soon as possible, because, remember, I am a troubled developer and I want my first small success ASAP to soothe my fears. In Python, I usually first make it work, then I make it right (this includes error handling and documentation), and then, optionally, I make it fast.
Here is the code in Rust:
```rust
use csv::Reader;

fn get_avg_age() -> f64 {
    // open CSV reader
    let mut reader = Reader::from_path("persons.csv").unwrap();
    // find index of the age column
    let headers = reader.headers().unwrap();
    let index_of_age = headers
        .iter()
        .enumerate()
        .find(|&x| x.1 == "age")
        .unwrap()
        .0;
    // extract column age and convert it to f64
    let records = reader.records();
    let mut count_persons = 0;
    let person_ages = records.map(|x| {
        count_persons += 1;
        x.unwrap()
            .get(index_of_age)
            .unwrap()
            .parse::<f64>()
            .unwrap()
    });
    // calculate average person age
    person_ages.sum::<f64>() / (count_persons as f64)
}

fn main() {
    println!("Average age {}", get_avg_age());
}
```
The Rust code has the same error-handling behavior: it will panic if anything (anything!) goes wrong. Let’s imagine, for the sake of argument, that we want to move the get_avg_age function into a reusable library, and so we want to add proper error handling to it.
In Python, I would probably not even change anything. The default exceptions are good enough for production: they carry all the necessary information, and they should be caught in the app (as a rule, we should almost never catch exceptions in the library, because it is up to the app to decide whether and how they will be handled or logged).
In Rust, if I want to give the app the chance to decide whether and how errors will be handled or logged, I can no longer use unwrap and have to forward all errors to the caller. To do that, I need to change the return type of the function to a Result:
```rust
fn get_avg_age() -> Result<f64, ???>
```
The first generic parameter of Result is the type of the return value if there were no errors, so it is f64, like in the previous code. The second parameter is the type of the error. What should we write there? The function can cause different types of errors: at the file level, from the CSV parser, when parsing strings to floats, etc.
In other languages, all errors inherit from the same parent class, Exception or Error, so surely we could just use such a base type in Rust too? Well, yes and no. It is possible in Rust, but Rust definitely raises its eyebrow every time you do it. More on this later; let’s first explore an alternative: we can create our own error type and return this one error whenever some other error happens inside our function.
If we take this approach and use pattern matching for proper error handling, this is the first version we end up with:
```rust
use csv::Reader;

struct PersonLibError;

fn get_avg_age() -> Result<f64, PersonLibError> {
    // open CSV reader
    let mut reader = match Reader::from_path("persons.csv") {
        Ok(x) => x,
        Err(_) => return Err(PersonLibError {}),
    };
    // find index of the age column
    let headers = match reader.headers() {
        Ok(x) => x,
        Err(_) => return Err(PersonLibError {}),
    };
    let index_of_age = match headers.iter().enumerate().find(|&x| x.1 == "age") {
        Some(x) => x.0,
        None => return Err(PersonLibError {}),
    };
    // extract column age, convert it to f64, and calculate sum
    let mut count_persons = 0;
    let mut sum_ages: f64 = 0.0;
    for x in reader.records() {
        count_persons += 1;
        match x {
            Ok(record) => match record.get(index_of_age) {
                Some(age) => match age.parse::<f64>() {
                    Ok(age_num) => sum_ages += age_num,
                    Err(_) => return Err(PersonLibError {}),
                },
                None => return Err(PersonLibError {}),
            },
            Err(_) => return Err(PersonLibError {}),
        };
    }
    // calculate average person age
    Ok(sum_ages / (count_persons as f64))
}

fn main() {
    match get_avg_age() {
        Ok(r) => println!("Average age {}", r),
        Err(_) => println!("Error"),
    }
}
```
This version of the code will never panic, which is good if we want to use it as a library later. However, the important information about the error (its type and context) is lost, so this kind of error handling is not only very verbose but also highly unprofessional, and it makes the software hard to maintain.
Fortunately, Rust has supercharged enums (each variant can carry its own data), so we can improve our error handling like this:
```rust
use csv::Reader;

enum PersonLibError {
    FileError,
    CsvParserError,
    NoColumnNamedAge,
    RecordHasNoValueInColumnAge,
    CannotParseAge(std::num::ParseFloatError),
}

fn get_avg_age() -> Result<f64, PersonLibError> {
    // open CSV reader
    let mut reader = match Reader::from_path("persons.csv") {
        Ok(x) => x,
        Err(_) => return Err(PersonLibError::FileError),
    };
    // find index of the age column
    let headers = match reader.headers() {
        Ok(x) => x,
        Err(_) => return Err(PersonLibError::CsvParserError),
    };
    let index_of_age = match headers.iter().enumerate().find(|&x| x.1 == "age") {
        Some(x) => x.0,
        None => return Err(PersonLibError::NoColumnNamedAge),
    };
    // extract column age and convert it to f64
    let records = reader.records();
    let mut count_persons = 0;
    let mut sum_ages: f64 = 0.0;
    for x in records {
        count_persons += 1;
        match x {
            Ok(record) => match record.get(index_of_age) {
                Some(age) => match age.parse::<f64>() {
                    Ok(age_num) => sum_ages += age_num,
                    // keep the original ParseFloatError inside our error
                    Err(e) => return Err(PersonLibError::CannotParseAge(e)),
                },
                None => return Err(PersonLibError::RecordHasNoValueInColumnAge),
            },
            Err(_) => return Err(PersonLibError::CsvParserError),
        };
    }
    // calculate average person age
    Ok(sum_ages / (count_persons as f64))
}
```
Now the caller can match on the different variants of our enum and handle each error accordingly. Note also how we have passed the original ParseFloatError into the PersonLibError::CannotParseAge variant.
We can also derive Debug for our PersonLibError and then easily print out errors:
```rust
#[derive(Debug)]
enum PersonLibError {
    FileError,
    CsvParserError,
    NoColumnNamedAge,
    RecordHasNoValueInColumnAge,
    CannotParseAge(std::num::ParseFloatError),
}

// ...

fn main() {
    match get_avg_age() {
        Ok(r) => println!("Average age {}", r),
        Err(e) => println!("Error {:?}", e),
    }
}
```
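To make the caller side concrete, here is a small std-only sketch of matching on individual variants. It does not use the csv crate: the hypothetical parse_age function stands in for the age-parsing step of get_avg_age, and the handling chosen for each variant is only an illustration.

```rust
use std::num::ParseFloatError;

// Mirrors part of the PersonLibError enum above (csv-related variants omitted)
#[derive(Debug)]
enum PersonLibError {
    RecordHasNoValueInColumnAge,
    CannotParseAge(ParseFloatError),
}

// Hypothetical stand-in for the age-parsing step of get_avg_age
fn parse_age(raw: Option<&str>) -> Result<f64, PersonLibError> {
    match raw {
        Some(age) => match age.parse::<f64>() {
            Ok(age_num) => Ok(age_num),
            Err(e) => Err(PersonLibError::CannotParseAge(e)),
        },
        None => Err(PersonLibError::RecordHasNoValueInColumnAge),
    }
}

// The caller, not the library, decides how each kind of error is handled
fn describe(result: Result<f64, PersonLibError>) -> String {
    match result {
        Ok(age) => format!("Age {}", age),
        Err(PersonLibError::RecordHasNoValueInColumnAge) => "Missing age, skipping".to_string(),
        // The wrapped ParseFloatError is still there and shows up in the Debug output
        Err(e @ PersonLibError::CannotParseAge(_)) => format!("Error {:?}", e),
    }
}

fn main() {
    println!("{}", describe(parse_age(Some("43"))));
    println!("{}", describe(parse_age(None)));
    println!("{}", describe(parse_age(Some("forty"))));
}
```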
Our code now has better error handling, but it is still pretty verbose. Let’s finally explore the “?” operator. It tries to unwrap its argument: if it is an Ok, it simply passes on the unwrapped value; if it is an Err, it converts the error at hand into the error type of the Result returned by the enclosing function and returns it immediately.
So, in our case, all errors inside our function will be converted to PersonLibError. Basically, it automatically does this part for us:
```rust
Err(e) => return Err(PersonLibError::<some fitting matching value>),
```
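To see that “?” really is just an early return, the std-only sketch below writes the same function twice: once with “?” and once with the match it roughly stands for. Both versions use the standard ParseFloatError as their error type, so the From::from call here is the identity conversion. The function names are made up for illustration.

```rust
use std::num::ParseFloatError;

// Version using the "?" operator
fn double_age_short(raw: &str) -> Result<f64, ParseFloatError> {
    let age = raw.parse::<f64>()?;
    Ok(age * 2.0)
}

// Roughly what "?" stands for: take the Ok value, or convert the error and return early
fn double_age_long(raw: &str) -> Result<f64, ParseFloatError> {
    let age = match raw.parse::<f64>() {
        Ok(value) => value,
        Err(e) => return Err(From::from(e)),
    };
    Ok(age * 2.0)
}

fn main() {
    println!("{:?}", double_age_short("21"));
    println!("{:?}", double_age_long("21"));
    println!("{}", double_age_long("abc").is_err());
}
```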
Rust knows how to convert error types if we implement the From trait for our error (implementing From also gives us the matching Into for free). There is a nice, helpful crate called thiserror that does this for us automatically. Here is the new source code:
```rust
use csv::Reader;

#[derive(Debug, thiserror::Error)]
enum PersonLibError {
    #[error("Csv parser error")]
    CsvParserError(#[from] csv::Error),
    #[error("No column named age")]
    NoColumnNamedAge,
    #[error("Record has no value in column age")]
    RecordHasNoValueInColumnAge,
    #[error("Cannot parse age")]
    CannotParseAge(#[from] std::num::ParseFloatError),
}

fn get_avg_age() -> Result<f64, PersonLibError> {
    // open CSV reader
    let mut reader = Reader::from_path("persons.csv")?;
    // find index of the age column
    let headers = reader.headers()?;
    let index_of_age = match headers.iter().enumerate().find(|&x| x.1 == "age") {
        Some(x) => x.0,
        None => return Err(PersonLibError::NoColumnNamedAge),
    };
    // extract column age and convert it to f64
    let records = reader.records();
    let mut count_persons = 0;
    let mut sum_ages: f64 = 0.0;
    for x in records {
        count_persons += 1;
        match x?.get(index_of_age) {
            Some(age) => {
                sum_ages += age.parse::<f64>()?;
            }
            None => return Err(PersonLibError::RecordHasNoValueInColumnAge),
        };
    }
    // calculate average person age
    Ok(sum_ages / (count_persons as f64))
}
```
There is a lot going on here:
- we could get rid of four match expressions and replace them with the much more concise “?” operator.
- unfortunately, we have also lost the ability to tell FileError and CsvParserError apart, because the underlying csv crate returns csv::Error in both cases. To keep the distinction, we would have to go back to pattern matching.
- “?” cannot turn an Option into our Result on its own (in a function that returns a Result, it only works on Results), so we had to keep fully written pattern matching there.
- thiserror forces us to give each error variant a human-readable description, which becomes the error’s Display output. It is nice to read it later in logs and error messages.
- the #[from] attribute tells thiserror to implement the From trait for the corresponding error type (for example, From<csv::Error> for PersonLibError), which is exactly the conversion “?” uses.
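Here is what #[from] and #[error] save us from, written by hand as a std-only sketch: a From implementation for one variant, plus Display and Error implementations. The AgeError type and the parse_age helper are hypothetical, and the code thiserror actually generates may differ in detail.

```rust
use std::error::Error;
use std::fmt;
use std::num::ParseFloatError;

#[derive(Debug)]
enum AgeError {
    NoColumnNamedAge,
    CannotParseAge(ParseFloatError),
}

// What #[error("...")] provides: a human-readable Display implementation
impl fmt::Display for AgeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AgeError::NoColumnNamedAge => write!(f, "No column named age"),
            AgeError::CannotParseAge(_) => write!(f, "Cannot parse age"),
        }
    }
}

// #[from] also marks the wrapped error as the source of our error
impl Error for AgeError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            AgeError::NoColumnNamedAge => None,
            AgeError::CannotParseAge(e) => Some(e),
        }
    }
}

// What #[from] generates: the conversion that "?" calls
impl From<ParseFloatError> for AgeError {
    fn from(e: ParseFloatError) -> Self {
        AgeError::CannotParseAge(e)
    }
}

// Hypothetical helper: find the age column in the headers and parse the matching value
fn parse_age(headers: &[&str], values: &[&str]) -> Result<f64, AgeError> {
    let index = match headers.iter().position(|h| *h == "age") {
        Some(i) => i,
        None => return Err(AgeError::NoColumnNamedAge),
    };
    // A short record yields an empty string here, which then fails to parse
    let raw = values.get(index).copied().unwrap_or("");
    // "?" converts ParseFloatError into AgeError through the From impl above
    Ok(raw.parse::<f64>()?)
}

fn main() {
    match parse_age(&["name", "age"], &["Bob", "x"]) {
        Ok(age) => println!("Age {}", age),
        Err(e) => println!("Error: {} (source: {:?})", e, e.source()),
    }
}
```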
All in all, by using “?” we have reduced the boilerplate a little, weakened our error handling by removing FileError, and, last but not least, introduced a lot of implicit behavior, which makes the code harder for novices to understand.
Another way to handle errors would be to use some common base error type and return the underlying errors directly, without wrapping them into our PersonLibError. Error types conventionally implement the std::error::Error trait, so we can just write Result<f64, Error>, can’t we? No, we can’t. Error is a trait, not a type: it can be implemented by many different structs, and different structs can have different sizes. Rust needs to know the size of every value a function returns at compile time and cannot know it here, so it raises its eyebrow and forces you to put the error behind a pointer:
```rust
Result<f64, Box<dyn Error>>
```
The Box<dyn> syntax makes two costs visible: Box means that the error value lives on the heap, so the Result only needs to hold a pointer of fixed size, and dyn means that method calls on the error are dispatched dynamically at run time through a vtable. Rust could just as well hide the difference between boxed and unboxed values from you as a software developer, but it prefers to make it very explicit, so that if you ever run into performance issues, Rust can throw its hands in the sky and say “here, you are using Box and you are using dyn. No wonder you have issues now”.
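A small std-only sketch of what Box<dyn Error> gives you: “?” can box any error type that implements Error, the box has the same fixed size whatever the concrete error is (one pointer to the data plus one to the vtable), and the caller can still recover the concrete type with downcast_ref. The parse_person function is hypothetical.

```rust
use std::error::Error;
use std::mem::size_of;
use std::num::{ParseFloatError, ParseIntError};

// Two different error types flow into the same Box<dyn Error>
fn parse_person(age: &str, children: &str) -> Result<(f64, u32), Box<dyn Error>> {
    let age = age.parse::<f64>()?; // a ParseFloatError gets boxed by "?"
    let children = children.parse::<u32>()?; // a ParseIntError gets boxed by "?"
    Ok((age, children))
}

fn main() {
    // A Box<dyn Error> is a "fat" pointer: two machine words, whatever the error type
    println!("size of Box<dyn Error>: {} bytes", size_of::<Box<dyn Error>>());

    match parse_person("43", "two") {
        Ok(p) => println!("Parsed {:?}", p),
        // The app can still ask which concrete error it received
        Err(e) => {
            if e.downcast_ref::<ParseIntError>().is_some() {
                println!("Bad number of children: {}", e);
            } else if e.downcast_ref::<ParseFloatError>().is_some() {
                println!("Bad age: {}", e);
            }
        }
    }
}
```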
Using the Box<dyn> syntax, we can re-implement the error handling like this:
```rust
use csv::Reader;

#[derive(Debug, thiserror::Error)]
enum PersonLibError {
    #[error("No column named age")]
    NoColumnNamedAge,
    #[error("Record has no value in column age")]
    RecordHasNoValueInColumnAge,
}

fn get_avg_age() -> Result<f64, Box<dyn std::error::Error>> {
    // open CSV reader
    let mut reader = Reader::from_path("persons.csv")?;
    // find index of the age column
    let headers = reader.headers()?;
    let index_of_age = match headers.iter().enumerate().find(|&x| x.1 == "age") {
        Some(x) => x.0,
        None => return Err(Box::new(PersonLibError::NoColumnNamedAge)),
    };
    // extract column age and convert it to f64
    let records = reader.records();
    let mut count_persons = 0;
    let mut sum_ages: f64 = 0.0;
    for x in records {
        count_persons += 1;
        match x?.get(index_of_age) {
            Some(age) => {
                sum_ages += age.parse::<f64>()?;
            }
            None => return Err(Box::new(PersonLibError::RecordHasNoValueInColumnAge)),
        };
    }
    // calculate average person age
    Ok(sum_ages / (count_persons as f64))
}

fn main() {
    match get_avg_age() {
        Ok(r) => println!("Average age {}", r),
        Err(e) => println!("Error {:?}", e),
    }
}
```
Note that we could drop some variants from our PersonLibError, but we still need it to handle the None cases of the Options. The rest of the code remains mostly unchanged.
Next, we could use Option::ok_or to further reduce the boilerplate (note that you can also use ok_or if you don’t use Box<dyn>):
```rust
use csv::Reader;

// PersonLibError is defined as in the previous listing
fn get_avg_age() -> Result<f64, Box<dyn std::error::Error>> {
    // open CSV reader
    let mut reader = Reader::from_path("persons.csv")?;
    // find index of the age column; "?" boxes our PersonLibError automatically
    let headers = reader.headers()?;
    let index_of_age = headers
        .iter()
        .enumerate()
        .find(|&x| x.1 == "age")
        .ok_or(PersonLibError::NoColumnNamedAge)?
        .0;
    // extract column age and convert it to f64
    let mut count_persons = 0;
    let mut sum_ages: f64 = 0.0;
    for x in reader.records() {
        count_persons += 1;
        sum_ages += x?
            .get(index_of_age)
            .ok_or(PersonLibError::RecordHasNoValueInColumnAge)?
            .parse::<f64>()?;
    }
    // calculate average person age
    Ok(sum_ages / (count_persons as f64))
}
```
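Option::ok_or turns Some(v) into Ok(v) and None into Err(the error you pass in), after which “?” can propagate it like any other Result. A std-only sketch, assuming a record has already been split into a slice of &str; MissingField and raw_age are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum MissingField {
    NoColumnNamedAge,
    RecordHasNoValueInColumnAge,
}

// Finds the "age" column and returns the raw value of one record in that column
fn raw_age<'a>(headers: &[&str], record: &[&'a str]) -> Result<&'a str, MissingField> {
    // None from position() becomes Err(NoColumnNamedAge), and "?" returns it early
    let index = headers
        .iter()
        .position(|h| *h == "age")
        .ok_or(MissingField::NoColumnNamedAge)?;
    // None from get() (record too short) becomes Err(RecordHasNoValueInColumnAge)
    let value = record
        .get(index)
        .ok_or(MissingField::RecordHasNoValueInColumnAge)?;
    Ok(*value)
}

fn main() {
    println!("{:?}", raw_age(&["name", "age"], &["Bob", "19"]));
    println!("{:?}", raw_age(&["name", "age"], &["Bob"]));
}
```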
Our final step is to restore the full range of possible errors using map_err. At the same time, we can ditch the Box<dyn> approach:
```rust
use csv::Reader;

#[derive(Debug, thiserror::Error)]
enum PersonLibError {
    #[error("File error")]
    FileError(#[source] csv::Error),
    #[error("Csv parser error")]
    CsvParserError(#[from] csv::Error),
    #[error("No column named age")]
    NoColumnNamedAge,
    #[error("Record has no value in column age")]
    RecordHasNoValueInColumnAge,
    #[error("Cannot parse age value")]
    CannotParseAge(#[from] std::num::ParseFloatError),
}

fn get_avg_age() -> Result<f64, PersonLibError> {
    // open CSV reader; map_err labels failures here explicitly as FileError
    let mut reader = Reader::from_path("persons.csv").map_err(PersonLibError::FileError)?;
    // find index of the age column; "?" turns csv::Error into CsvParserError via #[from]
    let headers = reader.headers()?;
    let index_of_age = headers
        .iter()
        .enumerate()
        .find(|&x| x.1 == "age")
        .ok_or(PersonLibError::NoColumnNamedAge)?
        .0;
    // extract column age and convert it to f64
    let mut count_persons = 0;
    let mut sum_ages: f64 = 0.0;
    for x in reader.records() {
        count_persons += 1;
        sum_ages += x?
            .get(index_of_age)
            .ok_or(PersonLibError::RecordHasNoValueInColumnAge)?
            .parse::<f64>()?;
    }
    // calculate average person age
    Ok(sum_ages / (count_persons as f64))
}

fn main() {
    match get_avg_age() {
        Ok(r) => println!("Average age {}", r),
        Err(e) => println!("Error {:?}", e),
    }
}
```
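The trick behind map_err(PersonLibError::FileError) is that a tuple-like enum variant can also be used as a function from the wrapped error to the enum. Here is a std-only sketch of the same idea, assuming two fields that both fail with ParseIntError (similar to how from_path and headers both fail with csv::Error). MarriageError and parse_age_and_year are hypothetical.

```rust
use std::num::ParseIntError;

#[derive(Debug)]
enum MarriageError {
    BadAge(ParseIntError),
    BadYear(ParseIntError),
}

// Both parse calls fail with the same error type, so a From impl alone could not tell them apart
fn parse_age_and_year(age: &str, year: &str) -> Result<(u32, u32), MarriageError> {
    // A tuple variant doubles as a function: fn(ParseIntError) -> MarriageError
    let age = age.parse::<u32>().map_err(MarriageError::BadAge)?;
    let year = year.parse::<u32>().map_err(MarriageError::BadYear)?;
    Ok((age, year))
}

fn main() {
    println!("{:?}", parse_age_and_year("43", "2005"));
    println!("{:?}", parse_age_and_year("43", "May 2005"));
}
```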
Would you agree that, with all of these “?” at the ends of lines, this code resembles Perl or APL? On the other hand, Rust consistently takes top places in run-time performance benchmarks, and elaborate error handling might be one of the costs we have (or gladly want) to pay to achieve that.
Summary
The biggest challenge for me currently is this: just by looking at the function name, you never know whether it returns a value, a Result or an Option.
Once you know the return type of a Rust function:
- if it returns a plain value, you don’t need to do anything
- if it returns a Result, you add a “?” and check whether the returned Err type can be converted into the error type of the enclosing function. If it cannot, you make it compatible by adding map_err in front of the “?”
- if it returns an Option, you add ok_or in front of the “?”
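Putting the three rules together, here is a std-only sketch of get_avg_age that takes the CSV content as a string instead of reading it with the csv crate, so it runs without dependencies. The naive comma splitting (no support for quoted fields), the changed signature, the EmptyInput variant and the AvgAgeError type are assumptions made for illustration.

```rust
use std::num::ParseFloatError;

#[derive(Debug)]
enum AvgAgeError {
    EmptyInput, // hypothetical: no header line at all
    NoColumnNamedAge,
    RecordHasNoValueInColumnAge,
    CannotParseAge(ParseFloatError),
}

// Naive CSV handling: one record per line, fields separated by commas, no quoting
fn get_avg_age(csv_text: &str) -> Result<f64, AvgAgeError> {
    let mut lines = csv_text.lines();
    // Option: ok_or in front of "?"
    let header_line = lines.next().ok_or(AvgAgeError::EmptyInput)?;
    let index_of_age = header_line
        .split(',')
        .position(|h| h == "age")
        .ok_or(AvgAgeError::NoColumnNamedAge)?;
    let mut count_persons = 0;
    let mut sum_ages: f64 = 0.0;
    for line in lines {
        count_persons += 1;
        sum_ages += line
            .split(',')
            .nth(index_of_age)
            // Option: ok_or in front of "?"
            .ok_or(AvgAgeError::RecordHasNoValueInColumnAge)?
            .parse::<f64>()
            // Result with a foreign error type: map_err in front of "?"
            .map_err(AvgAgeError::CannotParseAge)?;
    }
    // Plain value: nothing to do
    Ok(sum_ages / count_persons as f64)
}

fn main() {
    let csv_text = "name,age,civil_status\nAlice,17,Single\nBob,19,Single\nCharles,43,Married";
    match get_avg_age(csv_text) {
        Ok(r) => println!("Average age {}", r),
        Err(e) => println!("Error {:?}", e),
    }
}
```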
P.S.
Both the Python and the Rust versions still have one (and the same) unhandled error that is extremely likely in production and would make the process exit with a panic. Can you find it?
P.P.S.
It takes Rust 62 ms to execute the main() function 1000 times (in the --release configuration). Python needs 4 seconds for that. Q.E.D.