The current plan is to combine Gaussian Splatting with CLIP feature fields so a robot has a persistent, language-queryable memory of its environment: capture a space once, then ask it things like "where is the red mug" and get back 3D coordinates a planner can actually use.
This assumes a static scene; handling dynamic scenes would be cool, but building a full SLAM-style system like SplaTAM or MonoGS is out of scope at the moment.
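The core query step can be sketched in a few lines. This is a minimal illustration, not the actual implementation: the Gaussian centers and per-Gaussian CLIP features here are random placeholders, and in a real pipeline the query vector would come from a CLIP text encoder rather than a random draw.

```python
import numpy as np

# Hypothetical scene: each Gaussian in the splat carries a CLIP-space
# feature vector. Random stand-ins here; a real system would distill
# these from CLIP image features during training.
rng = np.random.default_rng(0)
num_gaussians, feat_dim = 1000, 512
gaussian_means = rng.uniform(-5, 5, size=(num_gaussians, 3))  # 3D centers
gaussian_feats = rng.normal(size=(num_gaussians, feat_dim))   # stand-in CLIP features

def query_scene(text_embedding, top_k=5):
    """Return 3D centers of the Gaussians most similar to a text query."""
    feats = gaussian_feats / np.linalg.norm(gaussian_feats, axis=1, keepdims=True)
    q = text_embedding / np.linalg.norm(text_embedding)
    sims = feats @ q                       # cosine similarity per Gaussian
    idx = np.argsort(sims)[::-1][:top_k]   # best matches first
    return gaussian_means[idx], sims[idx]

# Stand-in for a CLIP text embedding of "where is the red mug".
coords, scores = query_scene(rng.normal(size=feat_dim))
```

A planner would then consume `coords` directly, or cluster the top matches into a single target location.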
I'm currently working through a few things before getting deeper into development:
In the meantime, check out some of my other projects below: