At the time of writing, my wife and I are in the process of converting a 2000 Blue Bird CSRE 3408 into a motor home. After spending a winter exploring the coast of Florida, we plan to road trip across the continental US, stopping at national parks along the way. Our road trip has constraints, such as:
- Limiting the total road trip length,
- Limiting the distance between stops,
- Ensuring the park has a campground that allows our 34’ motor home
For the final project of my Database Management Systems course, I chose to use Datalog to find potential road trips that fit all of these constraints. This post walks through this exploration. The final result is here.
Data
First, I had to gather park data. Fortunately, the National Park Service offers a free API from which I can get most of the necessary information. This is pretty quick, given common tools like curl and jq.
And we can do the same thing for their parks
endpoint.
Pro tip: if you’re like me and need to re-learn Bash syntax every time you use it, check out shellcheck; when combined with a filewatcher like watchexec, this can really speed up your Bash wpm. E.g.
Next, a little more jq-fu to coerce some of this json into tab separated fact files for Datalog:
Distances
Of course, to plan a realistic road trip it would be best to use actual directions from something like OpenStreetMap or Google Maps. But, since I have time limitations, I’m just using the haversine formula for now. Luckily, the rust ecosystem is plentiful, so this doesn’t take much code. First, the types:
The LocationRow
is exactly the type of tsv row we piped to location.facts
.
So, we’ll read each of those campgrounds’ locations, generate all the pairs, and
write the distances to another facts file:
Datalog
This road trip planner uses the souffle dialect of Datalog. First, I defined some types and let souffle know where to find the relation facts.
Next, I made some useful helper relations – restrict our attention to campgrounds that actually fit our RV, and have a commutative distance relation:
Next I define all possible road trip segments. These exceed 100mi to make useful progress between stops, but are limited to 600mi so they can fit easily in one weekend. Also, I need to make sure that the direction of these segments is oriented in the general direction of the final destination. There’s a few ways to do this. We could restrict the segments in the general NW direction; that is, make sure that northward progress exceeds eastward, and western progress exceeds southward. (For example, it’s okay to backtrack east 100mi if we make 500mi progress north.)
Another option would be to just make sure that we’re closing in on the final destination. For example, let’s say we want to make it to an RV campground near Mount Rainier. Then
is also a reasonable segment
defintion, and is more portable for finding road
trips to other locations. In addition, having Mount Rainier be a sink of this
directed graph alleviates the need for a flimsy stopping condition in the road
trip planner, as souffle can just iterate until the distance to the final
destination goes to zero.
Finally, the question remains how to assemble these segments into road trip plans. There are a few options.
Naive Enumeration
This is perhaps the first obvious solution, essentially the transitive closure of this directed graph of segments.
This doesn’t generate single road trip plans, but rather generates all campgrounds reachable along the segments defined above, starting from the Flamingo Campground in the Everglades, and of course stopping at the sink Mount Rainier (as long as it can indeed reach Mount Rainier with the segment constraints). However, with the definitions thus far, the order of magnitude of these combinations is infeasible to compute in souffle, at least on my laptop.
Functional Dependencies
After some digging in the documentation, I found that souffle recently added
support for functional
dependencies,
that is, we can express a dependency from -> to
and non-deterministically
choose a single to
campground for each from
. This is currently unreleased,
and requires building souffle from a recent commit. The code is the same as
above except an extra choice-domain from
clause:
The plus side: this does result in a road trip from FL to WA that fits our constraints.
Stop | Park Campground | Distance |
---|---|---|
1 | Everglades National Park: Flamingo Campground | 0 |
2 | Gulf Islands National Seashore: Fort Pickens Campground Loop A | 527.19 |
3 | Ozark National Scenic Riverways: Alley Spring Campground | 1056.47 |
4 | Lake Meredith National Recreation Area: Blue Creek | 1632.46 |
5 | Glen Canyon National Recreation Area: Beehives Campground | 2185.89 |
6 | Great Basin National Park: Baker Creek Campground | 2391.80 |
7 | Whiskeytown National Recreation Area: Brandy Creek RV | 2847.99 |
8 | North Cascades National Park: Colonial Creek South Campground | 3410.27 |
9 | Mount Rainier National Park: Cougar Rock Campground | 3546.77 |
The downside is that this only results in a single plan; although the result is “non-deterministic”, it is not actually random. Repeatedly running the souffle planner always outputs the same plan. Thus, we cannot perform any optimizations for preferred amenities or total trip length. Furthermore, in some ways we just got lucky that this successfully resulted in a plan. For instance, suppose this chosen path goes directly northwest for the first 7 stops to Bighorn Canyon, but then encounters a northwestward dead end, with no other campgrounds within 600mi to the west or north; then there would be no more campgrounds to travel to that decrease the distance to the final destination, i.e. Bighorn is a leaf in the directed graph of segments. However, it might have been possible to avoid this dead end by going straight west a few stops earlier.
Min/Max Choice
Instead of having souffle choose the next campground non-deterministically, I could choose a specific one, for example to minimize the distance between stops.
This generates the following plan, with many more stops than before:
Stop | Park Campground | Distance |
---|---|---|
1 | Everglades National Park: Flamingo Campground | 0 |
2 | Gulf Islands National Seashore: Fort Pickens Campground Loop A | 527.19 |
3 | Natchez Trace Parkway: Rocky Springs Campground, Milepost 54.8. | 768.80 |
4 | Hot Springs National Park: Gulpha Gorge Campground | 981.03 |
5 | Buffalo National River: Woolum | 1081.46 |
6 | Lake Meredith National Recreation Area: Cedar Canyon Campground | 1549.38 |
7 | Bandelier National Monument: Juniper Family Campground | 1831.56 |
8 | Mesa Verde National Park: Morefield Campground | 1989.07 |
9 | Curecanti National Recreation Area: Ponderosa Campground | 2092.99 |
10 | Arches National Park: Devils Garden Campground | 2216.85 |
11 | Dinosaur National Monument: Split Mountain Group Campground | 2333.38 |
12 | Grand Teton National Park: Gros Ventre Campground | 2564.26 |
13 | Craters Of The Moon National Monument & Preserve: Lava Flow Campground | 2709.51 |
14 | Glacier National Park: Two Medicine | 3057.03 |
15 | Lake Roosevelt National Recreation Area: North Gorge Campground | 3269.73 |
16 | North Cascades National Park: Colonial Creek South Campground | 3410.73 |
17 | Olympic National Park: Heart O’ the Hills Campground | 3527.01 |
18 | Mount Rainier National Park: Cougar Rock Campground | 3643.42 |
However, this suffers from the same drawbacks. To demonstrate, an attempt to choose the max distance between stops results in a dead end half-way through the trip:
Stop | Park Campground | Distance |
---|---|---|
1 | Everglades National Park: Flamingo Campground | 0 |
2 | Gulf Islands National Seashore: Fort Pickens Campground Loop A | 527.19 |
3 | Ozark National Scenic Riverways: Pulltite Campground | 1068.55 |
4 | Sleeping Bear Dunes National Lakeshore: D. H. Day Campground | 1662.30 |
5 | Pictured Rocks National Lakeshore: Hurricane River Campground | 1784.65 |
Recursive Types
The final part of this exploration leverages the souffle type system, which
allows for arbitrary records, algebraic data types, and recursion. For
convenience, I’ll define a Segment
representing an ordered pair of campgrounds
and a cons list of them called Path
To demonstrate the utility here, let’s look at the naive enumeration utilizing these types:
Of course, what I mentioned above is still true, as is; this generates way too many
segments to compute in a timely manner. But, notice the output is much
cleaner. Instead of a relation filled with arbitrary segments that would be
annoying to assemble into possible road trips, we actually have a relation that
is filled with individual road trips of type Path
. Some of them might not make it
all the way to WA, as was shown in the max choice example above, but we could
filter those out afterwards, e.g.
As a demonstration, the number of plans generated to go from Yellowstone to the Badlands totals 29,376, with the first plans being more direct, such as:
Plan 1
Stop | Park Campground | Distance |
---|---|---|
1 | Yellowstone National Park: Bridge Bay Campground | 0 |
2 | Badlands National Park: Cedar Pass Campground | 424.14 |
and proceedingly more winding:
Plan 100
Stop | Park Campground | Distance |
---|---|---|
1 | Yellowstone National Park: Mammoth Campground | 0 |
2 | Bighorn Canyon National Recreation Area: Horseshoe Bend Campgroun | 118.95 |
3 | Badlands National Park: Cedar Pass Campground | 441.72 |
Plan 1000
Stop | Park Campground | Distance |
---|---|---|
1 | Yellowstone National Park: Bridge Bay Campground | 0 |
2 | Dinosaur National Monument: Gates of Lodore Campground | 274.63 |
3 | Bighorn Canyon National Recreation Area: Barry’s Landing & Trail Creek Campground | 578.70 |
4 | Theodore Roosevelt National Park: Juniper Campground | 868.24 |
5 | Badlands National Park: Cedar Pass Campground | 1142.63 |
Plan 10000
Stop | Park Campground | Distance |
---|---|---|
1 | Yellowstone National Park: Indian Creek Campground | 0 |
2 | Dinosaur National Monument: Split Mountain Group Campground | 316.09 |
3 | Yellowstone National Park: Lewis Lake Campground | 590.41 |
4 | Dinosaur National Monument: Gates of Lodore Campground | 851.58 |
5 | Yellowstone National Park: Pebble Creek Campground | 1147.70 |
5 | Theodore Roosevelt National Park: Roundup Group Horse Camp | 1496.24 |
5 | Badlands National Park: Cedar Pass Campground | 1733.97 |
and so on. Clearly the 10,000th plan above is absurd, and the reason is not
just because I’m using haversine distances. The reason did not become clear to
me until I looked at these parks on a map. The main problem is the progress
condition just cares about any nonzero progress towards the destination; if
you imagine a circle spiraling in towards the final destination at the center,
you can see how this condition might result in unwanted road trip plans. That
is, the restriction I defined earlier as 100 <= segment_length <= 600
is
misleading, as it doesn’t actually guarantee 100mi of progress.
And so this motivates the last improvement: rather than ensuring we drive 100mi between stops, instead ensure that the next stop is at least 100mi closer to the final destination.
Huzzah! Now, the enumeration only outputs 17 plans, with at most one intermediary stop. This is much more reasonable for two parks that are only 500mi from one another, and the only reason there are even as many as 17 plans is simply because each park has multiple RV campgrounds.
Furthermore, this last improvement allows me to accomplish what I set out to do: changing the minimum progress to 400mi allows Souffle to output 3,032 road trip plans from the Everglades to Mount Rainier in under a second.
CLI tool
I’ve parameterized the start and end locations into a small CLI tool to run this road trip planner:
Conclusion
This was a fun exercise, but the resulting planner isn’t as useful as it could
be. The main missing ingredients are ordering and choice. While souffle supports
a single arbitrary choice from a given domain, what I’d really like to be able
to do is specify preferences, i.e. first choose from campgrounds with wifi
service and dump stations, and if there are none, then settle for cell service,
etc.. Something like Postgres’ coalesce
function would make this planner much
more useful.
Future improvements I have in mind for this planner are the following:
- Include the cell, internet, and dump amenities in the enumeration output. These can then be externally counted, so that plans can be sorted by stops with the most cell service, or to filter plans that include at least one dump stop every 500mi, etc..
- A few more things should be parameterized from the CLI, such as the RV length, minimum progress and maximum segment distance. The minimum progress turns out to be a very important parameter, and makes the difference of whether or not the computation finishes in a second or days.
- Call out to an API for actual driving distance and/or time, instead of using the haversine approximation.