Building a programmatic website with Astro and DuckDB.
One of my favorite things about DuckDB is that it makes it extremely easy to run SQL queries on top of almost any data source. This has a ton of applications such as data analysis, data transformation, but one of my favorite ones that I rarely see mentioned is the ability to turn random data sources into structured websites.
Of course, this isn’t anything new. You could always read data from a CSV and then write code to parse, transform, and render it as a website. But the nice thing about DuckDB is that it makes the entire process a lot easier and more flexible.
So let’s give this a shot. I have found a NASA meteorite dataset that I think would make for a fun website. You can download it here.
Setting up the project
I’m going to use Astro and duckdb-async to build this website. You can learn how to set up Astro here if this is your first time using it.
Note: I will be using the Node runtime and will set the
output
toserver
in myastro.config.mjs
.
Once you have your project set up, let’s create a singleton database that we can use to run our queries.
I’ll place this in the src/utils/db.ts
file.
import { Database } from 'duckdb-async';
let _db: Database | null = null;
export const getDatabase = async () => {
if (_db) return _db;
_db = await Database.create(':memory:');
return _db;
};
This will create a new in-memory database and return a singleton instance of it. So, we can call getDatabase()
anywhere in our project and we’ll always get the same instance.
Loading the data
Now that we have our database, we can load the data. You can both pass a path to or a URL to DuckDB and it will know how to load it. To be respectful of their API, I will download the data into my data
directory and then load it from there.
import { Database } from 'duckdb-async';
let _db: Database | null = null;
export const getDatabase = async () => {
if (_db) return _db;
_db = await Database.create(':memory:');
await _db.exec(`
CREATE TABLE meteorites AS
SELECT * FROM './data/meteorites.json';
`);
return _db;
};
Here we are creating a table called meteorites
and then loading the data from the meteorites.json
file into it. DuckDB will automatically detect that the data is in JSON format and load it accordingly.
Easy. Now let’s head to pages/index.astro
and start querying our database.
Querying the database
First, let’s do a simple query to count the number of meteorites.
---
import Layout from '../layouts/Layout.astro';
import { getDatabase } from '../utils/db';
const db = await getDatabase();
const result = await db.all(`select count(*) as count from meteorites`);
---
<Layout title="Meteorites">
<div>count: {result[0].count}</div>
</Layout>
Now, load the page and you should see the count of meteorites. In my case, it was 45,716 and it only takes a few milliseconds to load.
17:31:48 [200] / 2ms
17:32:17 [200] / 3ms
17:32:17 [200] / 5ms
17:32:17 [200] / 7ms
17:32:17 [200] / 4ms
It’s nuts how fast this is.
Running more queries
Okay, count is cool and all, but it’s boring, let’s run a few more queries. For example, let’s find the top 10 meteorites by mass.
Note: Since for some of the meteorites the mass is unknown, we’ll need to filter those out.
---
import Layout from '../layouts/Layout.astro';
import { getDatabase } from '../utils/db';
const db = await getDatabase();
const result = await db.all(`
SELECT *
FROM meteorites
WHERE mass != ''
ORDER BY CAST(mass AS FLOAT) DESC
LIMIT 10
`);
---
<Layout title="Meteorites">
<div>
<h2>Top 10 Heaviest Meteorites:</h2>
<ul>
{
result.map((meteorite) => (
<li>
{meteorite.name}: {Number(meteorite.mass).toLocaleString()}g
</li>
))
}
</ul>
</div>
</Layout>
Now if you load the page, you should see the top 10 heaviest meteorites.
Top 10 Heaviest Meteorites:
Hoba: 60,000,000g
Cape York: 58,200,000g
Campo del Cielo: 50,000,000g
Canyon Diablo: 30,000,000g
Armanty: 28,000,000g
Gibeon: 26,000,000g
Chupaderos: 24,300,000g
Mundrabilla: 24,000,000g
Sikhote-Alin: 23,000,000g
Bacubirito: 22,000,000g
Of course, this is still very basic, but should give you a good idea of what you can do with DuckDB and Astro. Now let’s make this a bit more interesting.
Installing extensions
A nice feature of DuckDB is that it has a ton of extensions and installing them is extremely easy. For example, let’s install the Spatial Extension so we can run spatial queries over the meteorite data.
Head back to your db.ts
file and add the following line.
import { Database } from 'duckdb-async';
let _db: Database | null = null;
export const getDatabase = async () => {
if (_db) return _db;
_db = await Database.create(':memory:');
await _db.exec(`
INSTALL spatial;
LOAD spatial;
`);
await _db.exec(`
CREATE TABLE meteorites AS
SELECT * FROM './data/meteorites.json';
`);
return _db;
};
This will install the spatial extension and load it. Now we can run some spatial queries over the dataset and do some interesting things.
Running spatial queries
First, let’s run a simple query to find the closest meteorite to a given point. For this, I’m going to use the coordinates of the White House.
---
import Layout from '../layouts/Layout.astro';
import { getDatabase } from '../utils/db';
const db = await getDatabase();
const lat = 38.8977;
const long = -77.0365;
const result = await db.all(
`
SELECT
CAST(reclat AS FLOAT) as reclat,
CAST(reclong AS FLOAT) as reclong,
ST_Distance_Sphere(
ST_Point(CAST(reclat AS FLOAT), CAST(reclong AS FLOAT)),
ST_Point(?, ?)
) AS distance,
*
FROM meteorites
WHERE reclat != '' AND reclong != ''
ORDER BY distance
LIMIT 10;
`,
lat,
long
);
---
<Layout title="Meteorites">
<div>
<h2>Nearest Meteorites</h2>
<table>
<thead>
<tr>
<th>Name</th>
<th>Distance</th>
<th>Mass</th>
<th>Year</th>
</tr>
</thead>
<tbody>
{
result.map((meteorite) => (
<tr>
<td>{meteorite.name}</td>
<td>{Math.round(meteorite.distance / 1000)} km</td>
<td>{meteorite.mass || 'Unknown'} g</td>
<td>{meteorite.year || 'Unknown'}</td>
</tr>
))
}
</tbody>
</table>
</div>
</Layout>
Let’s break down this query. For each meteorite, we are casting the reclat
and reclong
columns to floats and then converting them into a geometry point.
This is the signature of the ST_Point
function. As you can see, it takes two double values which represent the longitude and latitude of the point.
GEOMETRY ST_Point (col0 DOUBLE, col1 DOUBLE)
We then use the ST_Distance_Sphere
function to calculate the distance between the meteorite and the given point. Here is the signature of the ST_Distance_Sphere
function:
DOUBLE ST_Distance_Sphere (col0 POINT_2D, col1 POINT_2D)
DOUBLE ST_Distance_Sphere (col0 GEOMETRY, col1 GEOMETRY)
It can take either two POINT_2D
or two GEOMETRY
values. In our case, we are using passing the result of the ST_Point
function which returns a GEOMETRY
value.
Now, if you load the page, you should see the nearest meteorites to the White House.
Name | Distance | Mass | Year |
---|---|---|---|
Lorton | 27 km | 329.7 g | 2010 |
Nanjemoy | 55 km | 7500 g | 1825 |
Emmitsburg | 94 km | 450 g | 1854 |
St. Mary’s County | 99 km | 24.3 g | 1919 |
Mount Joy | 100 km | 384000 g | 1887 |
Shrewsbury | 102 km | 12000 g | 1907 |
Sharps | 122 km | 1265 g | 1921 |
Richmond | 164 km | 1800 g | 1828 |
Landes | 186 km | 69800 g | 1930 |
Staunton | 191 km | 43500 g | 1869 |
Pretty cool, right? Now let’s make this a bit more interesting.
Creating dynamic pages
Okay, to make this a little more interesting, let’s create a page per each state and then list all the meteorites that were found in that state. This would’ve been very easy if the data had a column for states, but it doesn’t. So we’ll have to do a bit of preprocessing on the data to get the state for each meteorite.
Thankfully, this is also very easy to do with DuckDB. What we need to do is to find the geo boundaries for each state, turn those into a polygon, and then run a cross join between the meteorite data and the state boundaries data and use the ST_Within
or ST_Intersects
function to check if a meteorite is within a state.
Once we have that, then we can use Astro to generate a page per state and render the meteorites that fall within that state.
I promise, it’s not as complicated as it sounds. Just bare with me for a bit.
I’m going to download the geo boundaries for each state and then load them into another table in our database. Similar to what we did with the meteorite data.
import { Database } from 'duckdb-async';
let _db: Database | null = null;
export const getDatabase = async () => {
if (_db) return _db;
_db = await Database.create(':memory:');
await _db.exec(`
INSTALL spatial;
LOAD spatial;
`);
await _db.exec(`
CREATE TABLE boundaries AS
SELECT * FROM './data/us-state-boundaries.json';
`);
await _db.exec(`
CREATE TABLE meteorites AS
SELECT * FROM './data/meteorites.json';
`);
return _db;
};
Next, let’s create pages/states/[...state].astro
and then we’ll loop through all the states and render a page for each one.
---
import Layout from '../../layouts/Layout.astro';
import { getDatabase } from '../../utils/db';
export const prerender = true;
export async function getStaticPaths() {
const db = await getDatabase();
const result = await db.all(`select name, stusab from boundaries`);
return result.map((r) => ({
params: { state: r.stusab },
props: { name: r.name }
}));
}
const { state } = Astro.params;
const { name } = Astro.props;
---
<Layout title={`Meteorites in ${name} (${state})`}>
<h1>{name} ({state})</h1>
</Layout>
Let’s break down what’s happening here. First, I’m exporting prerender
as true
which will make Astro prerender this page at build time. Next, I’m exporting a getStaticPaths
function which will return an array of objects with the params
and props
for each state. The params
will be the state code (e.g. NY
) and the props
will be the state name (e.g. New York
).
As you notice, I am using the new boundaries
table to get the state code and name for each state.
Now if you head to the page for New York, http://localhost:4321/states/NY
, you should see something like this.
New York (NY)
Okay, not too bad. Now let’s add some meteorites to the page.
---
import Layout from '../../layouts/Layout.astro';
import { getDatabase } from '../../utils/db';
export const prerender = true;
export async function getStaticPaths() {
const db = await getDatabase();
const result = await db.all(`select name, stusab from boundaries`);
return result.map((r) => ({
params: { state: r.stusab },
props: { name: r.name }
}));
}
const { state } = Astro.params;
const { name } = Astro.props;
const db = await getDatabase();
const result = await db.all(
`
SELECT m.*
FROM meteorites m
CROSS JOIN boundaries b
WHERE b.stusab = ?
AND m.reclat != '' AND m.reclong != ''
AND ST_Within(
ST_Point(CAST(m.reclong AS FLOAT), CAST(m.reclat AS FLOAT)),
ST_GeomFromGeoJSON(b.st_asgeojson.geometry)
)
`,
state
);
---
<Layout title={`Meteorites in ${name} (${state})`}>
<h1>{name} ({state})</h1>
<table class="w-full">
<thead>
<tr>
<th>Name</th>
<th>Mass (g)</th>
<th>Year</th>
</tr>
</thead>
<tbody>
{
result.map((meteorite) => (
<tr>
<td>{meteorite.name}</td>
<td>{meteorite.mass}</td>
<td>{meteorite.year}</td>
</tr>
))
}
</tbody>
</table>
</Layout>
Let’s break down this SQL query step by step. First, we select all the columns from the meteorites table and then join it with the boundaries table. Then we filter the meteorites to only include those that are within the boundaries of the specified state.
To do this, we first need to convert the GeoJSON boundary into a geometry. This is done using the ST_GeomFromGeoJSON
function. This function can take either a GeoJSON string or a JSON object.
GEOMETRY ST_GeomFromGeoJSON (col0 VARCHAR)
GEOMETRY ST_GeomFromGeoJSON (col0 JSON)
An example of a GeoJSON polygon is the following:
{
"type": "Polygon",
"coordinates": [
[
[1, 2],
[3, 4],
[5, 6]
]
]
}
Next, we need to create a point geometry from the meteorite’s longitude and latitude. This is done using the ST_Point
function which we used earlier.
Finally, we need to check if the meteorite’s point falls within the state’s boundary. This is done using the ST_Within
function. Here is the signature of the ST_Within
function:
BOOLEAN ST_Within (col0 POINT_2D, col1 POLYGON_2D)
BOOLEAN ST_Within (col0 GEOMETRY, col1 GEOMETRY)
So in plain English: “Find all meteorites that have valid coordinates and fell within the boundaries of the specified state.”
Now, if you head to the page for New York, http://localhost:4321/states/NY
, you should see something like this.
State | Name | Mass (g) | Year |
---|---|---|---|
New York (NY) | Bethlehem | 13.9 | 1859 |
New York (NY) | Peekskill | 12570 | 1992 |
New York (NY) | Schenectady | 283.3 | 1968 |
New York (NY) | Yorktown (New York) | 250 | 1869 |
New York (NY) | Burlington | 68000 | 1819 |
New York (NY) | Cambria | 16300 | 1818 |
New York (NY) | Lasher Creek | 639.5 | 1948 |
New York (NY) | Mount Morris (New York) | 12.5 | 1897 |
New York (NY) | Seneca Falls | 4000 | 1850 |
New York (NY) | South Byron | 6000 | 1915 |
New York (NY) | Tomhannock Creek | 1500 | 1863 |
This is pretty cool, right? We have now created a page per state and rendered all the meteorites that fell within that state by running some magic SQL.
Finally, let’s create an index page at pages/states/index.astro
that lists all the states and links to the page for each state.
---
import Layout from '../../layouts/Layout.astro';
import { getDatabase } from '../../utils/db';
export const prerender = true;
const db = await getDatabase();
const result = await db.all(`select name, stusab from boundaries`);
---
<Layout title="States">
<h1>States</h1>
<ul>
{
result.map((r) => (
<li>
<a href={`/states/${r.stusab}`}>{r.name}</a>
</li>
))
}
</ul>
</Layout>
Now, if you head to http://localhost:4321/states/
, you should see a list of all the states and links to the page for each state. Of course, we can take this even further and add a count of meteorites for each state, but I’ll leave that as an exercise for you.
Conclusion
This was a fun experiment and I think it shows just how easy it is to turn random data sources into structured websites with DuckDB and Astro. Of course, this is still a very basic example and even with this simple example, you can go a lot further by adding more details to each page, comparing the data between states, and creating charts, maps, and other visualizations for different metrics.
Happy hacking!