At the time of writing, my wife and I are in the process of converting a 2000 Blue Bird CSRE 3408 into a motor home. After spending a winter exploring the coast of Florida, we plan to road trip across the continental US, stopping at national parks along the way. Our road trip has constraints, such as:
- Limiting the total road trip length,
- Limiting the distance between stops,
- Ensuring the park has a campground that allows our 34’ motor home
For the final project of my Database Management Systems course, I chose to use Datalog to find potential road trips that fit all of these constraints. This post walks through this exploration. The final result is here.
Data
First, I had to gather park data. Fortunately, the National Park Service offers a free API from which I can get most of the necessary information. This is pretty quick, given common tools like curl and jq.
#!/usr/bin/env bash
set -eo pipefail
mkdir -p data
# get total number of campgrounds
total=$(curl \
-H "X-Api-Key: $NPS_API_KEY" \
-H "accept: application/json" \
-X GET "https://developer.nps.gov/api/v1/campgrounds?limit=0" |
jq -r '.total'
)
# fetch all those campgrounds
curl \
-H "X-Api-Key: $NPS_API_KEY" \
-H "accept: application/json" \
-X GET "https://developer.nps.gov/api/v1/campgrounds?limit=$total" |
jq -c '.data' > data/campgrounds.json
And we can do the same thing for their parks
endpoint.
Pro tip: if you’re like me and need to re-learn Bash syntax every time you use it, check out shellcheck; when combined with a filewatcher like watchexec, this can really speed up your Bash wpm. E.g.
$ watchexec --watch bin -- shellcheck bin/*
In bin/fetch_nps_data line 16:
-H 'X-Api-Key: $NPS_API_KEY' \
^-----------------------^ SC2016: Expressions don't expand in single quotes, use double quotes for that.
Next, a little more jq-fu to coerce some of this json into tab separated fact files for Datalog:
#!/usr/bin/env bash
set -eo pipefail
cd data
# parks
filter=$(cat <<-jqfilter
.[]
| [.parkCode, .fullName]
| @tsv
jqfilter
)
jq -r "$filter" parks.json > park.facts
# campgrounds
filter=$(cat <<-jqfilter
.[]
| select(.accessibility.rvAllowed == "1")
| [.id, .parkCode, .name]
| @tsv
jqfilter
)
jq -r "$filter" campgrounds.json > campground.facts
# campground locations
filter=$(cat <<-jqfilter
.[]
| select(
.accessibility.rvAllowed == "1"
and .latitude != ""
and .longitude != ""
)
| [.id, .latitude, .longitude]
| @tsv
jqfilter
)
jq -r "$filter" campgrounds.json > location.facts
# campground rv amenities
filter=$(cat <<-jqfilter
.[]
| select(.accessibility.rvAllowed == "1")
| [.id,
.accessibility.rvMaxLength,
.amenities.internetConnectivity,
.amenities.cellPhoneReception,
.amenities.dumpStation
]
| @tsv
jqfilter
)
jq -r "$filter" campgrounds.json > amenities.facts
Distances
Of course, to plan a realistic road trip it would be best to use actual directions from something like OpenStreetMap or Google Maps. But, since I have time limitations, I’m just using the haversine formula for now. Luckily, the rust ecosystem is plentiful, so this doesn’t take much code. First, the types:
use geo::prelude::HaversineDistance;
use geo::{point, Point};
use serde::Deserialize;
#[derive(Clone, Debug, Deserialize, PartialEq)]
struct LocationRow {
camp_id: String,
latitude: f64,
longitude: f64,
}
impl LocationRow {
fn coordinate(&self) -> Point<f64> {
point!(x: self.longitude, y: self.latitude)
}
/// Distance to another location in miles
pub fn distance_to(&self, other: &LocationRow) -> f64 {
const MILES_PER_METER: f64 = 0.000621371;
let meters = self
.coordinate()
.haversine_distance(&other.coordinate());
return meters * MILES_PER_METER;
}
}
The LocationRow
is exactly the type of tsv row we piped to location.facts
.
So, we’ll read each of those campgrounds’ locations, generate all the pairs, and
write the distances to another facts file:
use csv;
use itertools::Itertools;
/// Generate distance.facts from the NPS location.facts file
fn generate_distances() -> Result<()> {
let pairs = csv::ReaderBuilder::new()
.has_headers(false)
.delimiter(b'\t')
.from_path("data/location.facts")?
.into_deserialize::<LocationRow>()
.filter_map(|x| x.ok())
.combinations_with_replacement(2);
let mut writer = csv::WriterBuilder::new()
.has_headers(false)
.delimiter(b'\t')
.from_path("data/distance.facts")?;
for pair in pairs {
writer.write_record(&[
&pair[0].camp_id,
&pair[1].camp_id,
&format!("{:.2}", &pair[0].distance_to(&pair[1])),
])?;
}
Ok(())
}
Datalog
This road trip planner uses the souffle dialect of Datalog. First, I defined some types and let souffle know where to find the relation facts.
.type ParkId <: symbol
.type CampId <: symbol
.type Name <: symbol
.type Distance <: float
.decl camp(camp:CampId, park: ParkId, name:Name)
.input camp(filename="campground.facts")
.decl park(park:ParkId, name:Name)
.input park(filename="park.facts")
.decl distance(camp1:CampId, camp2:CampId, dist:Distance)
.input distance(filename="distance.facts")
.decl location(camp:CampId, lat:float, long: float)
.input location(filename="location.facts")
.decl amenities(camp:CampId, rv:number, internet:symbol, cell:symbol, dump:symbol)
.input amenities(filename="amenities.facts")
Next, I made some useful helper relations – restrict our attention to campgrounds that actually fit our RV, and have a commutative distance relation:
// Campgrounds that fit our RV
// Note: some campgrounds just report 0 but allow RVs
.decl rv_camp(id:CampId)
rv_camp(id) :- camp(id, _, _),
amenities(id, rv_len, _, _, _),
(rv_len = 0 ; rv_len >= 34).
// Distances between campgrounds that fit our RV
.decl rv_dist(camp1:CampId, camp2:CampId, dist:Distance)
rv_dist(from, to, len) :-
distance(from, to, len), rv_camp(from), rv_camp(to).
rv_dist(to, from, len) :-
distance(from, to, len), rv_camp(from), rv_camp(to).
Next I define all possible road trip segments. These exceed 100mi to make useful progress between stops, but are limited to 600mi so they can fit easily in one weekend. Also, I need to make sure that the direction of these segments is oriented in the general direction of the final destination. There’s a few ways to do this. We could restrict the segments in the general NW direction; that is, make sure that northward progress exceeds eastward, and western progress exceeds southward. (For example, it’s okay to backtrack east 100mi if we make 500mi progress north.)
// Optimized road trip segments
// 1. Limit 600mi between stops
// 2. Exceed 100mi between stops
// 3. Go northwest (generally) to make progress from FL to WA
.decl segment(camp1:CampId, camp2:CampId, dist:Distance)
segment(from, to, len) :-
rv_dist(from, to, len),
100 <= len, len <= 600,
location(from, f_lat, f_long),
location(to, t_lat, t_long),
(f_long - t_long > f_lat - t_lat),
(t_lat - f_lat > t_long - f_long).
Another option would be to just make sure that we’re closing in on the final destination. For example, let’s say we want to make it to an RV campground near Mount Rainier. Then
// End location near Mount Rainier
.decl camp_near_seattle(camp:CampId)
camp_near_seattle(c) :- camp(c, "mora", "Cougar Rock Campground").
// Optimized road trip segments
// 1. Limit 600mi between stops
// 2. Exceed 100mi between stops
// 3. Make progress towards final destination
.decl segment(camp1:CampId, camp2:CampId, dist:Distance)
segment(from, to, len) :-
rv_dist(from, to, len),
100 <= len, len <= 600,
rv_dist(from, end, dist_from),
rv_dist(to, end, dist_to),
dist_from > dist_to,
camp_near_seattle(end).
is also a reasonable segment
defintion, and is more portable for finding road
trips to other locations. In addition, having Mount Rainier be a sink of this
directed graph alleviates the need for a flimsy stopping condition in the road
trip planner, as souffle can just iterate until the distance to the final
destination goes to zero.
Finally, the question remains how to assemble these segments into road trip plans. There are a few options.
Naive Enumeration
This is perhaps the first obvious solution, essentially the transitive closure of this directed graph of segments.
.decl road_trip_segment(from:CampId, to:CampId, acc:Distance)
road_trip_segment(f, t, d) :-
camp(f, "ever", "Flamingo Campground"),
segment(f, t, d).
road_trip_segment(f, t, acc+d) :-
road_trip_segment(_, f, acc),
segment(f, t, d).
This doesn’t generate single road trip plans, but rather generates all campgrounds reachable along the segments defined above, starting from the Flamingo Campground in the Everglades, and of course stopping at the sink Mount Rainier (as long as it can indeed reach Mount Rainier with the segment constraints). However, with the definitions thus far, the order of magnitude of these combinations is infeasible to compute in souffle, at least on my laptop.
Functional Dependencies
After some digging in the documentation, I found that souffle recently added
support for functional
dependencies,
that is, we can express a dependency from -> to
and non-deterministically
choose a single to
campground for each from
. This is currently unreleased,
and requires building souffle from a recent commit. The code is the same as
above except an extra choice-domain from
clause:
// Generate a non-deterministic path
.decl road_trip_segment(from:CampId, to:CampId, acc:Distance)
choice-domain from
The plus side: this does result in a road trip from FL to WA that fits our constraints.
Stop | Park Campground | Distance |
---|---|---|
1 | Everglades National Park: Flamingo Campground | 0 |
2 | Gulf Islands National Seashore: Fort Pickens Campground Loop A | 527.19 |
3 | Ozark National Scenic Riverways: Alley Spring Campground | 1056.47 |
4 | Lake Meredith National Recreation Area: Blue Creek | 1632.46 |
5 | Glen Canyon National Recreation Area: Beehives Campground | 2185.89 |
6 | Great Basin National Park: Baker Creek Campground | 2391.80 |
7 | Whiskeytown National Recreation Area: Brandy Creek RV | 2847.99 |
8 | North Cascades National Park: Colonial Creek South Campground | 3410.27 |
9 | Mount Rainier National Park: Cougar Rock Campground | 3546.77 |
The downside is that this only results in a single plan; although the result is “non-deterministic”, it is not actually random. Repeatedly running the souffle planner always outputs the same plan. Thus, we cannot perform any optimizations for preferred amenities or total trip length. Furthermore, in some ways we just got lucky that this successfully resulted in a plan. For instance, suppose this chosen path goes directly northwest for the first 7 stops to Bighorn Canyon, but then encounters a northwestward dead end, with no other campgrounds within 600mi to the west or north; then there would be no more campgrounds to travel to that decrease the distance to the final destination, i.e. Bighorn is a leaf in the directed graph of segments. However, it might have been possible to avoid this dead end by going straight west a few stops earlier.
Min/Max Choice
Instead of having souffle choose the next campground non-deterministically, I could choose a specific one, for example to minimize the distance between stops.
.decl road_trip_segment(from:CampId, to:CampId, acc:Distance)
choice-domain from
road_trip_segment(f, t, d) :-
camp(f, "ever", "Flamingo Campground"), segment(f, t, d).
road_trip_segment(f, t, acc+d) :-
road_trip_segment(_, f, acc),
d = min l : segment(f, t, l).
This generates the following plan, with many more stops than before:
Stop | Park Campground | Distance |
---|---|---|
1 | Everglades National Park: Flamingo Campground | 0 |
2 | Gulf Islands National Seashore: Fort Pickens Campground Loop A | 527.19 |
3 | Natchez Trace Parkway: Rocky Springs Campground, Milepost 54.8. | 768.80 |
4 | Hot Springs National Park: Gulpha Gorge Campground | 981.03 |
5 | Buffalo National River: Woolum | 1081.46 |
6 | Lake Meredith National Recreation Area: Cedar Canyon Campground | 1549.38 |
7 | Bandelier National Monument: Juniper Family Campground | 1831.56 |
8 | Mesa Verde National Park: Morefield Campground | 1989.07 |
9 | Curecanti National Recreation Area: Ponderosa Campground | 2092.99 |
10 | Arches National Park: Devils Garden Campground | 2216.85 |
11 | Dinosaur National Monument: Split Mountain Group Campground | 2333.38 |
12 | Grand Teton National Park: Gros Ventre Campground | 2564.26 |
13 | Craters Of The Moon National Monument & Preserve: Lava Flow Campground | 2709.51 |
14 | Glacier National Park: Two Medicine | 3057.03 |
15 | Lake Roosevelt National Recreation Area: North Gorge Campground | 3269.73 |
16 | North Cascades National Park: Colonial Creek South Campground | 3410.73 |
17 | Olympic National Park: Heart O’ the Hills Campground | 3527.01 |
18 | Mount Rainier National Park: Cougar Rock Campground | 3643.42 |
However, this suffers from the same drawbacks. To demonstrate, an attempt to choose the max distance between stops results in a dead end half-way through the trip:
.decl road_trip_segment(from:CampId, to:CampId, acc:Distance)
choice-domain from
road_trip_segment(f, t, d) :-
camp(f, "ever", "Flamingo Campground"), segment(f, t, d).
road_trip_segment(f, t, acc+d) :-
road_trip_segment(_, f, acc),
d = max l : segment(f, t, l).
Stop | Park Campground | Distance |
---|---|---|
1 | Everglades National Park: Flamingo Campground | 0 |
2 | Gulf Islands National Seashore: Fort Pickens Campground Loop A | 527.19 |
3 | Ozark National Scenic Riverways: Pulltite Campground | 1068.55 |
4 | Sleeping Bear Dunes National Lakeshore: D. H. Day Campground | 1662.30 |
5 | Pictured Rocks National Lakeshore: Hurricane River Campground | 1784.65 |
Recursive Types
The final part of this exploration leverages the souffle type system, which
allows for arbitrary records, algebraic data types, and recursion. For
convenience, I’ll define a Segment
representing an ordered pair of campgrounds
.type Segment = [from:CampId, to:CampId]
and a cons list of them called Path
.type Path = [tail:Path, head:Segment]
To demonstrate the utility here, let’s look at the naive enumeration utilizing these types:
.decl road_trip(x: Path)
road_trip([nil, [f, t]]) :-
camp(f, "ever", "Flamingo Campground"),
segment(f, t, _).
road_trip([tail, [f,t]]) :-
road_trip(tail),
tail=[tail2, [start, f]],
segment(f, t, _).
Of course, what I mentioned above is still true, as is; this generates way too many
segments to compute in a timely manner. But, notice the output is much
cleaner. Instead of a relation filled with arbitrary segments that would be
annoying to assemble into possible road trips, we actually have a relation that
is filled with individual road trips of type Path
. Some of them might not make it
all the way to WA, as was shown in the max choice example above, but we could
filter those out afterwards, e.g.
.decl successful_road_trip(x: Path)
.output successful_road_trip(IO=stdout)
successful_road_trip(path) :-
road_trip(path),
path=[tail, [f, end]],
camp_near_seattle(end).
As a demonstration, the number of plans generated to go from Yellowstone to the Badlands totals 29,376, with the first plans being more direct, such as:
Plan 1
Stop | Park Campground | Distance |
---|---|---|
1 | Yellowstone National Park: Bridge Bay Campground | 0 |
2 | Badlands National Park: Cedar Pass Campground | 424.14 |
and proceedingly more winding:
Plan 100
Stop | Park Campground | Distance |
---|---|---|
1 | Yellowstone National Park: Mammoth Campground | 0 |
2 | Bighorn Canyon National Recreation Area: Horseshoe Bend Campgroun | 118.95 |
3 | Badlands National Park: Cedar Pass Campground | 441.72 |
Plan 1000
Stop | Park Campground | Distance |
---|---|---|
1 | Yellowstone National Park: Bridge Bay Campground | 0 |
2 | Dinosaur National Monument: Gates of Lodore Campground | 274.63 |
3 | Bighorn Canyon National Recreation Area: Barry’s Landing & Trail Creek Campground | 578.70 |
4 | Theodore Roosevelt National Park: Juniper Campground | 868.24 |
5 | Badlands National Park: Cedar Pass Campground | 1142.63 |
Plan 10000
Stop | Park Campground | Distance |
---|---|---|
1 | Yellowstone National Park: Indian Creek Campground | 0 |
2 | Dinosaur National Monument: Split Mountain Group Campground | 316.09 |
3 | Yellowstone National Park: Lewis Lake Campground | 590.41 |
4 | Dinosaur National Monument: Gates of Lodore Campground | 851.58 |
5 | Yellowstone National Park: Pebble Creek Campground | 1147.70 |
5 | Theodore Roosevelt National Park: Roundup Group Horse Camp | 1496.24 |
5 | Badlands National Park: Cedar Pass Campground | 1733.97 |
and so on. Clearly the 10,000th plan above is absurd, and the reason is not
just because I’m using haversine distances. The reason did not become clear to
me until I looked at these parks on a map. The main problem is the progress
condition just cares about any nonzero progress towards the destination; if
you imagine a circle spiraling in towards the final destination at the center,
you can see how this condition might result in unwanted road trip plans. That
is, the restriction I defined earlier as 100 <= segment_length <= 600
is
misleading, as it doesn’t actually guarantee 100mi of progress.
And so this motivates the last improvement: rather than ensuring we drive 100mi between stops, instead ensure that the next stop is at least 100mi closer to the final destination.
// Optimized road trip segments
// 1. Limit 600mi between stops
// 2. Make 100mi progress towards final destination
.decl segment(camp1:CampId, camp2:CampId, dist:Distance)
segment(from, to, len) :-
rv_dist(from, to, len),
len <= 600,
rv_dist(from, end, dist_from),
rv_dist(to, end, dist_to),
dist_from - dist_to > 100,
camp_near_seattle(end).
Huzzah! Now, the enumeration only outputs 17 plans, with at most one intermediary stop. This is much more reasonable for two parks that are only 500mi from one another, and the only reason there are even as many as 17 plans is simply because each park has multiple RV campgrounds.
Furthermore, this last improvement allows me to accomplish what I set out to do: changing the minimum progress to 400mi allows Souffle to output 3,032 road trip plans from the Everglades to Mount Rainier in under a second.
CLI tool
I’ve parameterized the start and end locations into a small CLI tool to run this road trip planner:
❯ road-trip-planner --help
road-trip-planner 0.0.1
Sam Tay, samctay@pm.me
Generates road trip plans via national parks
USAGE:
road-trip-planner [FLAGS] <from> <to>
ARGS:
<from> Starting park code (e.g. ever)
<to> Ending park code (e.g. olym)
FLAGS:
-h, --help Prints help information
-l, --lucky Output a single trip
--min Use minimum distance between stops (for --lucky)
-r, --refresh Use fresh NPS data
-V, --version Prints version information
Conclusion
This was a fun exercise, but the resulting planner isn’t as useful as it could
be. The main missing ingredients are ordering and choice. While souffle supports
a single arbitrary choice from a given domain, what I’d really like to be able
to do is specify preferences, i.e. first choose from campgrounds with wifi
service and dump stations, and if there are none, then settle for cell service,
etc.. Something like Postgres’ coalesce
function would make this planner much
more useful.
Future improvements I have in mind for this planner are the following:
- Include the cell, internet, and dump amenities in the enumeration output. These can then be externally counted, so that plans can be sorted by stops with the most cell service, or to filter plans that include at least one dump stop every 500mi, etc..
- A few more things should be parameterized from the CLI, such as the RV length, minimum progress and maximum segment distance. The minimum progress turns out to be a very important parameter, and makes the difference of whether or not the computation finishes in a second or days.
- Call out to an API for actual driving distance and/or time, instead of using the haversine approximation.