Multimodal vision foundation
Vision that stays grounded in the physical world.
VizPal builds a multimodal vision + realtime dialogue foundation for companions that can see, reason, and coordinate in messy real‑world environments. Hearthy is our first application: an AI kitchen companion.
Multimodal perception
State‑aware vision
From ingredient states to heat checkpoints, the model reasons about what is happening now, not just what "should happen" in a recipe.
Realtime dialogue
Hands‑busy collaboration
Optimized for noisy, interruption‑heavy environments: a low‑friction voice loop with step decomposition and recovery prompts.
Tool orchestration
Actionable agents
Structured tool calls with latency budgeting and fallback strategies, designed for on‑device capture plus cloud reasoning.
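Latency budgeting with a fallback can be sketched as a timed tool call: if the cloud round trip exceeds its budget, the companion degrades gracefully to an on‑device response. The function and tool names below are illustrative, not part of the VizPal API.

```python
import asyncio

async def call_tool(tool, args, budget_s, fallback):
    """Run a tool call under a latency budget; fall back on timeout or error."""
    try:
        return await asyncio.wait_for(tool(args), timeout=budget_s)
    except (asyncio.TimeoutError, RuntimeError):
        return fallback(args)

# Hypothetical tools for illustration only.
async def cloud_vision_check(args):
    await asyncio.sleep(2.0)  # simulate a slow cloud round trip
    return {"doneness": "medium"}

def on_device_fallback(args):
    # Degrade to a clarifying question instead of blocking the voice loop.
    return {"doneness": "unknown", "prompt": "Can you describe the color?"}

result = asyncio.run(
    call_tool(cloud_vision_check, {}, budget_s=0.5, fallback=on_device_fallback)
)
```

Here the 0.5 s budget expires before the simulated 2 s cloud call returns, so the fallback answer is used instead.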
Reference architecture
A pragmatic stack for realtime, multimodal assistants: voice loop first, vision checks when it matters, memory for personalization, and operational observability.
Core loop
- Realtime audio streaming (ASR) → LLM orchestration → low‑latency TTS playback
- Optional vision checks on high‑risk checkpoints (doneness, safety, portion)
- Action‑first responses with guardrails and graceful fallbacks
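The core loop above can be sketched in a few lines: a transcribed utterance is planned into a step, a vision checkpoint is inserted only for high‑risk steps, and the result feeds speech playback. All function names here are stand‑ins for the real ASR/LLM/vision components, not the VizPal API.

```python
# High-risk checkpoints that warrant an optional vision check.
HIGH_RISK = ("doneness", "safety", "portion")

def plan_step(utterance: str) -> str:
    # Stand-in for LLM orchestration.
    return f"check doneness of {utterance}"

def verify_with_camera(step: str) -> str:
    # Stand-in for a vision checkpoint on the current step.
    return step + " (visually confirmed)"

def needs_vision_check(step: str) -> bool:
    return any(keyword in step for keyword in HIGH_RISK)

def respond(utterance: str) -> str:
    step = plan_step(utterance)
    if needs_vision_check(step):
        step = verify_with_camera(step)
    return step  # in the full loop, this text would drive TTS playback

reply = respond("the steak")
```

The guardrail shape is the point: vision runs only when a step touches a high‑risk checkpoint, keeping the common path low‑latency.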
Business intent
- Appliance integrations: embedded companions for oven/fridge/cooktops
- Consumer products: first‑party apps powered by VizPal
- Partner APIs: controlled access for multimodal workflows
Contact
For partnerships, pilots, or investor conversations, reach out via email. For our first product, see Hearthy.