The MediaFind blog

Building private, on-device media search

How we turn a folder of audio and video into a searchable library, using best-in-class open models that run entirely on your Mac. No cloud, no API keys, no telemetry.

🎙️
Transcription

How MediaFind transcribes your media entirely on-device with Whisper

From ffmpeg decode to word-level timestamps: the speech-to-text pipeline that never sends a byte to the cloud.

Read the deep dive →
🔍
Search

Search by meaning: embeddings, CLIP and a local vector index

Why “a rocket blasting off” finds the right clip even when nobody said those words: semantic text, visual, and OCR search combined.

Read the deep dive →
🗣️
People & privacy

Who said it, who's in it: diarization & face recognition, privately

Speaker diarization and an opt-in face library that label your media without anything ever leaving the machine.

Read the deep dive →