Gemma 4 · macOS Architecture

Local AI infrastructure
on Apple Silicon

Gemma 4 via Ollama with remote API, browser UI, Screen Share & SSH. Click any node to explore setup details.

Model gemma4:12b
Ollama API :11434
Unified RAM 9.6 GB
Animated — primary data flow
Solid — secondary flow
Dashed — remote access path
Remote devices
API client
:11434
Python · Node · curl
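A minimal remote API call, sketched with curl. The LAN address is a placeholder; it assumes Ollama is listening on `0.0.0.0:11434` as configured below.

```shell
# Query the Ollama API from a remote client on the LAN.
OLLAMA_HOST_ADDR="192.168.1.50"   # placeholder — replace with the Mac's LAN IP

curl "http://${OLLAMA_HOST_ADDR}:11434/api/generate" \
  -d '{
    "model": "gemma4:12b",
    "prompt": "Summarize unified memory in one sentence.",
    "stream": false
  }'
```

The same endpoint works from Python or Node with any HTTP client; only the base URL changes.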
Browser
:3000
Open WebUI
Screen Share
:5900
VNC / Finder
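From another Mac, Screen Sharing can be opened via the `vnc://` URL scheme; the hostname below is a placeholder Bonjour name.

```shell
# Launch the macOS Screen Sharing client against the host (port 5900).
# "user" and "studio.local" are placeholders.
open "vnc://user@studio.local"
```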
SSH terminal
:22
Remote Login
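A sketch of the SSH path, including a local port forward so a remote machine can reach the API as if it were local. User and hostname are placeholders.

```shell
# SSH into the host and tunnel the Ollama port back to this machine.
ssh -L 11434:localhost:11434 user@studio.local

# While the tunnel is up, the remote machine can use:
#   curl http://localhost:11434/api/tags
```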
Network — LAN (192.168.x.x) or internet via an ngrok or SSH tunnel
macOS host · Apple Silicon
macOS services
Firewall
Allow :11434 :5900 :22
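The macOS application firewall is per-app rather than per-port, so "allowing" these ports means allowing the binaries that listen on them. A sketch using `socketfilterfw`; the Ollama binary path may differ per install.

```shell
# Allow the listening binaries through the macOS application firewall.
FW=/usr/libexec/ApplicationFirewall/socketfilterfw
sudo "$FW" --add /usr/local/bin/ollama        # path is install-dependent
sudo "$FW" --unblockapp /usr/local/bin/ollama
sudo "$FW" --setglobalstate on
```

Screen Sharing (5900) and Remote Login (22) are system services and are permitted automatically when enabled in System Settings.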
Open WebUI
Docker · port 3000
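Open WebUI can be started with roughly the invocation from its own docs: map host port 3000 to the container's 8080 and point it at the host's Ollama API.

```shell
# Run Open WebUI in Docker, reachable at http://localhost:3000.
docker run -d \
  -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
```

`host.docker.internal` lets the container reach the Ollama server running on the macOS host.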
Screen Sharing
System Settings → Sharing
Remote Login
SSH daemon · port 22
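Remote Login can also be toggled from the command line instead of System Settings → Sharing:

```shell
# Enable the SSH daemon (Remote Login) and confirm its state.
sudo systemsetup -setremotelogin on
sudo systemsetup -getremotelogin
```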
Ollama
OLLAMA_HOST
0.0.0.0:11434 via LaunchAgent
Keep-alive
OLLAMA_KEEP_ALIVE=-1
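Both environment variables can be set for the LaunchAgent-managed Ollama process with `launchctl setenv` (restart Ollama afterwards for them to take effect):

```shell
# Expose the API on all interfaces and keep the model resident in RAM.
launchctl setenv OLLAMA_HOST "0.0.0.0:11434"
launchctl setenv OLLAMA_KEEP_ALIVE "-1"   # -1 = never unload the model
```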
REST API
OpenAI-compatible /v1
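The `/v1` endpoint accepts standard OpenAI chat-completions payloads, so existing OpenAI client code can be repointed at it. A curl sketch against the local server:

```shell
# OpenAI-compatible chat completion against the local Ollama server.
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma4:12b",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```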
ngrok tunnel
Public HTTPS endpoint
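A sketch of publishing the local API over HTTPS with ngrok; the host-header rewrite is needed because Ollama validates the incoming Host header.

```shell
# Expose port 11434 on a public HTTPS URL (printed by ngrok on start).
ngrok http 11434 --host-header="localhost:11434"
```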
llama.cpp / MLX inference engine
Apple Silicon Metal · unified memory · automatic GPU offload
Gemma 4 model weights
e4b · 12b · 27b — GGUF 4-bit · 128K–256K context
Local storage
~/.ollama/models/blobs — SHA256-named GGUF files on disk