Building a programmatic website with Astro and DuckDB.

6 min read

One of my favorite things about DuckDB is that it makes it extremely easy to run SQL queries on top of almost any data source. This has a ton of applications such as data analysis, data transformation, but one of my favorite ones that I rarely see mentioned is the ability to turn random data sources into structured websites.

Of course, this isn’t anything new. You could always read data from a CSV and then write code to parse, transform, and render it as a website. But the nice thing about DuckDB is that it makes the entire process a lot easier and more flexible.

So let’s give this a shot. I have found a NASA meteorite dataset that I think would make for a fun website. You can download it here.

Setting up the project

I’m going to use Astro and duckdb-async to build this website. You can learn how to set up Astro here if this is your first time using it.

Note: I will be using the Node runtime and will set the output to server in my astro.config.mjs.

Once you have your project set up, let’s create a singleton database that we can use to run our queries.

I’ll place this in the src/utils/db.ts file.

import { Database } from 'duckdb-async';

let _db: Database | null = null;
export const getDatabase = async () => {
  if (_db) return _db;
  _db = await Database.create(':memory:');

  return _db;
};

This will create a new in-memory database and return a singleton instance of it. So, we can call getDatabase() anywhere in our project and we’ll always get the same instance.

Loading the data

Now that we have our database, we can load the data. You can both pass a path to or a URL to DuckDB and it will know how to load it. To be respectful of their API, I will download the data into my data directory and then load it from there.

import { Database } from 'duckdb-async';

let _db: Database | null = null;
export const getDatabase = async () => {
  if (_db) return _db;
  _db = await Database.create(':memory:');

  await _db.exec(`
    CREATE TABLE meteorites AS
    SELECT * FROM './data/meteorites.json';
  `); 

  return _db;
};

Here we are creating a table called meteorites and then loading the data from the meteorites.json file into it. DuckDB will automatically detect that the data is in JSON format and load it accordingly.

Easy. Now let’s head to pages/index.astro and start querying our database.

Querying the database

First, let’s do a simple query to count the number of meteorites.

---
import Layout from '../layouts/Layout.astro';
import { getDatabase } from '../utils/db';

const db = await getDatabase();
const result = await db.all(`select count(*) as count from meteorites`);
---

<Layout title="Meteorites">
  <div>count: {result[0].count}</div>
</Layout>

Now, load the page and you should see the count of meteorites. In my case, it was 45,716 and it only takes a few milliseconds to load.

17:31:48 [200] / 2ms
17:32:17 [200] / 3ms
17:32:17 [200] / 5ms
17:32:17 [200] / 7ms
17:32:17 [200] / 4ms

It’s nuts how fast this is.

Running more queries

Okay, count is cool and all, but it’s boring, let’s run a few more queries. For example, let’s find the top 10 meteorites by mass.

Note: Since for some of the meteorites the mass is unknown, we’ll need to filter those out.

---
import Layout from '../layouts/Layout.astro';
import { getDatabase } from '../utils/db';

const db = await getDatabase();
const result = await db.all(`
  SELECT *
  FROM meteorites 
  WHERE mass != ''
  ORDER BY CAST(mass AS FLOAT) DESC 
  LIMIT 10
`);
---

<Layout title="Meteorites">
  <div>
    <h2>Top 10 Heaviest Meteorites:</h2>
    <ul>
      {
        result.map((meteorite) => (
          <li>
            {meteorite.name}: {Number(meteorite.mass).toLocaleString()}g
          </li>
        ))
      }
    </ul>
  </div>
</Layout>

Now if you load the page, you should see the top 10 heaviest meteorites.

Top 10 Heaviest Meteorites:
Hoba: 60,000,000g
Cape York: 58,200,000g
Campo del Cielo: 50,000,000g
Canyon Diablo: 30,000,000g
Armanty: 28,000,000g
Gibeon: 26,000,000g
Chupaderos: 24,300,000g
Mundrabilla: 24,000,000g
Sikhote-Alin: 23,000,000g
Bacubirito: 22,000,000g

Of course, this is still very basic, but should give you a good idea of what you can do with DuckDB and Astro. Now let’s make this a bit more interesting.

Installing extensions

A nice feature of DuckDB is that it has a ton of extensions and installing them is extremely easy. For example, let’s install the Spatial Extension so we can run spatial queries over the meteorite data.

Head back to your db.ts file and add the following line.

import { Database } from 'duckdb-async';

let _db: Database | null = null;
export const getDatabase = async () => {
  if (_db) return _db;
  _db = await Database.create(':memory:');

  await _db.exec(`
    INSTALL spatial;
    LOAD spatial;
  `);

  await _db.exec(`
    CREATE TABLE meteorites AS
    SELECT * FROM './data/meteorites.json';
  `);

  return _db;
};

This will install the spatial extension and load it. Now we can run some spatial queries over the dataset and do some interesting things.

Running spatial queries

First, let’s run a simple query to find the closest meteorite to a given point. For this, I’m going to use the coordinates of the White House.

---
import Layout from '../layouts/Layout.astro';
import { getDatabase } from '../utils/db';

const db = await getDatabase();

const lat = 38.8977;
const long = -77.0365;

const result = await db.all(
  `
SELECT 
  CAST(reclat AS FLOAT) as reclat,
  CAST(reclong AS FLOAT) as reclong,
  ST_Distance_Sphere(
    ST_Point(CAST(reclat AS FLOAT), CAST(reclong AS FLOAT)),
    ST_Point(?, ?)
  ) AS distance,
  *
FROM meteorites
WHERE reclat != '' AND reclong != ''
ORDER BY distance
LIMIT 10;
`,
  lat,
  long
);
---

<Layout title="Meteorites">
  <div>
    <h2>Nearest Meteorites</h2>
    <table>
      <thead>
        <tr>
          <th>Name</th>
          <th>Distance</th>
          <th>Mass</th>
          <th>Year</th>
        </tr>
      </thead>
      <tbody>
        {
          result.map((meteorite) => (
            <tr>
              <td>{meteorite.name}</td>
              <td>{Math.round(meteorite.distance / 1000)} km</td>
              <td>{meteorite.mass || 'Unknown'} g</td>
              <td>{meteorite.year || 'Unknown'}</td>
            </tr>
          ))
        }
      </tbody>
    </table>
  </div>
</Layout>

Let’s break down this query. For each meteorite, we are casting the reclat and reclong columns to floats and then converting them into a geometry point.

This is the signature of the ST_Point function. As you can see, it takes two double values which represent the longitude and latitude of the point.

GEOMETRY ST_Point (col0 DOUBLE, col1 DOUBLE)

We then use the ST_Distance_Sphere function to calculate the distance between the meteorite and the given point. Here is the signature of the ST_Distance_Sphere function:

DOUBLE ST_Distance_Sphere (col0 POINT_2D, col1 POINT_2D)
DOUBLE ST_Distance_Sphere (col0 GEOMETRY, col1 GEOMETRY)

It can take either two POINT_2D or two GEOMETRY values. In our case, we are using passing the result of the ST_Point function which returns a GEOMETRY value.

Now, if you load the page, you should see the nearest meteorites to the White House.

NameDistanceMassYear
Lorton27 km329.7 g2010
Nanjemoy55 km7500 g1825
Emmitsburg94 km450 g1854
St. Mary’s County99 km24.3 g1919
Mount Joy100 km384000 g1887
Shrewsbury102 km12000 g1907
Sharps122 km1265 g1921
Richmond164 km1800 g1828
Landes186 km69800 g1930
Staunton191 km43500 g1869

Pretty cool, right? Now let’s make this a bit more interesting.

Creating dynamic pages

Okay, to make this a little more interesting, let’s create a page per each state and then list all the meteorites that were found in that state. This would’ve been very easy if the data had a column for states, but it doesn’t. So we’ll have to do a bit of preprocessing on the data to get the state for each meteorite.

Thankfully, this is also very easy to do with DuckDB. What we need to do is to find the geo boundaries for each state, turn those into a polygon, and then run a cross join between the meteorite data and the state boundaries data and use the ST_Within or ST_Intersects function to check if a meteorite is within a state.

Once we have that, then we can use Astro to generate a page per state and render the meteorites that fall within that state.

I promise, it’s not as complicated as it sounds. Just bare with me for a bit.

I’m going to download the geo boundaries for each state and then load them into another table in our database. Similar to what we did with the meteorite data.

import { Database } from 'duckdb-async';

let _db: Database | null = null;
export const getDatabase = async () => {
  if (_db) return _db;
  _db = await Database.create(':memory:');

  await _db.exec(`
    INSTALL spatial;
    LOAD spatial;
  `);

  await _db.exec(`
    CREATE TABLE boundaries AS
    SELECT * FROM './data/us-state-boundaries.json';
  `); 

  await _db.exec(`
    CREATE TABLE meteorites AS
    SELECT * FROM './data/meteorites.json';
  `);

  return _db;
};

Next, let’s create pages/states/[...state].astro and then we’ll loop through all the states and render a page for each one.

---
import Layout from '../../layouts/Layout.astro';
import { getDatabase } from '../../utils/db';

export const prerender = true;

export async function getStaticPaths() {
  const db = await getDatabase();
  const result = await db.all(`select name, stusab from boundaries`);
  return result.map((r) => ({
    params: { state: r.stusab },
    props: { name: r.name }
  }));
}

const { state } = Astro.params;
const { name } = Astro.props;
---

<Layout title={`Meteorites in ${name} (${state})`}>
  <h1>{name} ({state})</h1>
</Layout>

Let’s break down what’s happening here. First, I’m exporting prerender as true which will make Astro prerender this page at build time. Next, I’m exporting a getStaticPaths function which will return an array of objects with the params and props for each state. The params will be the state code (e.g. NY) and the props will be the state name (e.g. New York).

As you notice, I am using the new boundaries table to get the state code and name for each state.

Now if you head to the page for New York, http://localhost:4321/states/NY, you should see something like this.

New York (NY)

Okay, not too bad. Now let’s add some meteorites to the page.

---
import Layout from '../../layouts/Layout.astro';
import { getDatabase } from '../../utils/db';

export const prerender = true;

export async function getStaticPaths() {
  const db = await getDatabase();
  const result = await db.all(`select name, stusab from boundaries`);
  return result.map((r) => ({
    params: { state: r.stusab },
    props: { name: r.name }
  }));
}

const { state } = Astro.params;
const { name } = Astro.props;

const db = await getDatabase();
const result = await db.all(
  `
  SELECT m.*
  FROM meteorites m
  CROSS JOIN boundaries b
  WHERE b.stusab = ?
  AND m.reclat != '' AND m.reclong != ''
  AND ST_Within(
    ST_Point(CAST(m.reclong AS FLOAT), CAST(m.reclat AS FLOAT)), 
    ST_GeomFromGeoJSON(b.st_asgeojson.geometry)
  )
`,
  state
);
---

<Layout title={`Meteorites in ${name} (${state})`}>
  <h1>{name} ({state})</h1>
  <table class="w-full">
    <thead>
      <tr>
        <th>Name</th>
        <th>Mass (g)</th>
        <th>Year</th>
      </tr>
    </thead>
    <tbody>
      {
        result.map((meteorite) => (
          <tr>
            <td>{meteorite.name}</td>
            <td>{meteorite.mass}</td>
            <td>{meteorite.year}</td>
          </tr>
        ))
      }
    </tbody>
  </table>
</Layout>

Let’s break down this SQL query step by step. First, we select all the columns from the meteorites table and then join it with the boundaries table. Then we filter the meteorites to only include those that are within the boundaries of the specified state.

To do this, we first need to convert the GeoJSON boundary into a geometry. This is done using the ST_GeomFromGeoJSON function. This function can take either a GeoJSON string or a JSON object.

GEOMETRY ST_GeomFromGeoJSON (col0 VARCHAR)
GEOMETRY ST_GeomFromGeoJSON (col0 JSON)

An example of a GeoJSON polygon is the following:

{
  "type": "Polygon",
  "coordinates": [
    [
      [1, 2],
      [3, 4],
      [5, 6]
    ]
  ]
}

Next, we need to create a point geometry from the meteorite’s longitude and latitude. This is done using the ST_Point function which we used earlier.

Finally, we need to check if the meteorite’s point falls within the state’s boundary. This is done using the ST_Within function. Here is the signature of the ST_Within function:

BOOLEAN ST_Within (col0 POINT_2D, col1 POLYGON_2D)
BOOLEAN ST_Within (col0 GEOMETRY, col1 GEOMETRY)

So in plain English: “Find all meteorites that have valid coordinates and fell within the boundaries of the specified state.”

Now, if you head to the page for New York, http://localhost:4321/states/NY, you should see something like this.

StateNameMass (g)Year
New York (NY)Bethlehem13.91859
New York (NY)Peekskill125701992
New York (NY)Schenectady283.31968
New York (NY)Yorktown (New York)2501869
New York (NY)Burlington680001819
New York (NY)Cambria163001818
New York (NY)Lasher Creek639.51948
New York (NY)Mount Morris (New York)12.51897
New York (NY)Seneca Falls40001850
New York (NY)South Byron60001915
New York (NY)Tomhannock Creek15001863

This is pretty cool, right? We have now created a page per state and rendered all the meteorites that fell within that state by running some magic SQL.

Finally, let’s create an index page at pages/states/index.astro that lists all the states and links to the page for each state.

---
import Layout from '../../layouts/Layout.astro';
import { getDatabase } from '../../utils/db';

export const prerender = true;

const db = await getDatabase();
const result = await db.all(`select name, stusab from boundaries`);
---

<Layout title="States">
  <h1>States</h1>
  <ul>
    {
      result.map((r) => (
        <li>
          <a href={`/states/${r.stusab}`}>{r.name}</a>
        </li>
      ))
    }
  </ul>
</Layout>

Now, if you head to http://localhost:4321/states/, you should see a list of all the states and links to the page for each state. Of course, we can take this even further and add a count of meteorites for each state, but I’ll leave that as an exercise for you.

Conclusion

This was a fun experiment and I think it shows just how easy it is to turn random data sources into structured websites with DuckDB and Astro. Of course, this is still a very basic example and even with this simple example, you can go a lot further by adding more details to each page, comparing the data between states, and creating charts, maps, and other visualizations for different metrics.

Happy hacking!

more posts

back