The current plan is to combine Gaussian Splatting with CLIP feature fields so a robot has a persistent, language-queryable memory of its environment: capture a space once, then ask it things like "where is the red mug" and get back 3D coordinates a planner can actually use.
This assumes a static scene; handling dynamic scenes would be cool, but building a full SLAM-style system like SplaTAM or MonoGS is out of scope at the moment.
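The core query step can be sketched in a few lines. This is a minimal illustration, not the actual implementation: the Gaussian centers and per-Gaussian CLIP features here are random placeholders, and in a real pipeline the query vector would come from a CLIP text encoder rather than a random draw.

```python
import numpy as np

# Hypothetical scene: each Gaussian in the splat carries a CLIP-space
# feature vector. Random stand-ins here; a real system would distill
# these from CLIP image features during training.
rng = np.random.default_rng(0)
num_gaussians, feat_dim = 1000, 512
gaussian_means = rng.uniform(-5, 5, size=(num_gaussians, 3))  # 3D centers
gaussian_feats = rng.normal(size=(num_gaussians, feat_dim))   # stand-in CLIP features

def query_scene(text_embedding, top_k=5):
    """Return 3D centers of the Gaussians most similar to a text query."""
    feats = gaussian_feats / np.linalg.norm(gaussian_feats, axis=1, keepdims=True)
    q = text_embedding / np.linalg.norm(text_embedding)
    sims = feats @ q                       # cosine similarity per Gaussian
    idx = np.argsort(sims)[::-1][:top_k]   # best matches first
    return gaussian_means[idx], sims[idx]

# Stand-in for a CLIP text embedding of "where is the red mug".
coords, scores = query_scene(rng.normal(size=feat_dim))
```

A planner would then consume `coords` directly, or cluster the top matches into a single target location.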
I'm currently working through a few things before getting deeper into development:
In the meantime, check out some of my other projects below: