⚡ Local Inference Layer
BlindSpot × LLM
Your data. Your hardware. Your models.
Every BlindSpot engine generates structured data. The LLM layer reads that data and tells you what it means — pre-session briefs, pattern interpretation, convergent analysis. Runs on your GPU. Zero cloud. Zero API costs.
BlindSpot · llm · ⚡ LOCAL INFERENCE
● ollama running · gpu: detected · models: 3 loaded
Runtime
Ollama
local · no cloud
Inference Speed
~25
tokens/sec · GPU
Models Loaded
3
primary · secondary · compact
API Cost
$0
forever · your hardware
Privacy
100%
nothing leaves your machine
📊
→Engine Data
Fact tables, dimensions
📝
→Prompt Template
Governed schema, rules
🧠
→Local LLM
GPU-accelerated inference
📋
→Intelligence Brief
Flags, recommendations
🎯
Action
You decide. Model interprets.
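The pipeline above can be sketched in a few lines. The function name and CSV fields here are illustrative, not BlindSpot's actual schema — the point is the division of labor: the engine emits the numbers, the template wraps them, and the model only reads.

```python
# Minimal sketch of the engine-data -> prompt step. build_brief_prompt and
# the trade fields (date, strategy, pnl) are placeholder names, not the
# real BlindSpot schema.

def build_brief_prompt(trades: list[dict]) -> str:
    """Render engine fact rows into a governed prompt for the local model."""
    header = (
        "You are an analyst. Interpret the numbers below; do not recompute them.\n"
        "Return exactly three flags, one per line, prefixed 'Flag N:'.\n\n"
    )
    rows = "\n".join(
        f"{t['date']},{t['strategy']},{t['pnl']:+.2f}" for t in trades
    )
    return header + "date,strategy,pnl\n" + rows

prompt = build_brief_prompt([
    {"date": "2024-05-02", "strategy": "momentum", "pnl": -412.50},
    {"date": "2024-05-03", "strategy": "mean_reversion", "pnl": 180.00},
])
print(prompt)
```

The model never sees raw engine internals — only a flat, pre-computed table and explicit output rules.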
Model · Roster
Primary · Parsing + Briefs
Llama 3.1 8B
Meta · Q4 quantization · Instruct-tuned
VRAM
~5 GB
Speed
~25 t/s
Download
4.7 GB
Context
128K
structured output · CSV parsing · brief generation
Secondary · Convergent Analysis
Mistral 7B
Mistral AI · Q4 quantization · Instruct v0.3
VRAM
~4.5 GB
Speed
~30 t/s
Download
4.1 GB
Context
32K
second opinion · reasoning · comparison
Compact · Fast Parsing
Phi-3.5 Mini
Microsoft · 3.8B parameters · Long context
VRAM
~2.5 GB
Speed
~45 t/s
Download
2.2 GB
Context
128K
fast extraction · long documents · lightweight
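With the roster's approximate VRAM figures, model selection reduces to a filter. The tags follow Ollama's naming; the gigabyte numbers are the rough estimates from the cards above, not measured values:

```python
# Illustrative picker over the roster's approximate VRAM requirements.
# Tags are Ollama-style names; sizes are the ~estimates quoted above.
MODELS = [
    ("llama3.1:8b", 5.0),  # primary: parsing + briefs
    ("mistral:7b", 4.5),   # secondary: convergent analysis
    ("phi3.5", 2.5),       # compact: fast parsing
]

def pick_models(free_vram_gb: float) -> list[str]:
    """Return the roster models that fit in the available VRAM."""
    return [name for name, need in MODELS if need <= free_vram_gb]

print(pick_models(6.0))  # -> ['llama3.1:8b', 'mistral:7b', 'phi3.5']
print(pick_models(3.0))  # -> ['phi3.5']
```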
Convergent · Analysis Demo
DEMO · Prompt: "Analyze this week’s trading performance and flag blind spots"
Llama 3.1 8B
Flag 1: Win rate dropped from 68% to 52% on Thursday-Friday trades.
Flag 2: Avg loss size increased 40% this week. Stop discipline may be slipping.
Flag 3: 3 of 4 losses were momentum plays. Mean reversion outperformed.
VS
Mistral 7B
Flag 1: Late-week performance degradation. Thu-Fri win rate 50% vs Mon-Wed 71%.
Flag 2: Loss magnitude expanding — risk/reward ratio inverted on 2 trades.
Flag 3: Sector concentration: 80% of trades in tech. Diversification blind spot.
✓
Convergence: Both flag late-week decay and loss magnitude expansion. High confidence signals. Mistral uniquely catches sector concentration — a blind spot the primary model missed.
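A toy sketch of that convergence check, assuming flags arrive as plain strings and agreement is measured by shared content words. The tokenization and overlap threshold are illustrative; the real comparison logic is not specified here.

```python
# Toy convergence check: two models "agree" on a flag when their wording
# shares enough content words. Tokenizer and threshold are illustrative.

def _words(flag: str) -> set[str]:
    return {w.strip(".,:%").lower() for w in flag.split() if len(w) > 3}

def convergent_flags(a: list[str], b: list[str], overlap: int = 2) -> list[tuple[str, str]]:
    """Pair up flags from two models that share >= `overlap` content words."""
    return [(fa, fb) for fa in a for fb in b if len(_words(fa) & _words(fb)) >= overlap]

primary = ["Win rate dropped on Thursday-Friday trades",
           "Avg loss size increased 40% this week"]
secondary = ["Late-week win rate degradation on Thursday-Friday",
             "Loss magnitude expanding this week on two trades"]
print(convergent_flags(primary, secondary))  # two convergent pairs
```

Pairs that both models surface are high-confidence signals; unmatched flags (like Mistral's sector concentration) go to you as judgment calls.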
Hardware · Compatibility Matrix
| GPU | VRAM | 8B | 7B | 3.8B | Dual Model | Speed |
|---|---|---|---|---|---|---|
| RTX 4090 | 24 GB | Full | Full | Full | Both in VRAM | ~50+ t/s |
| RTX 4070 Ti | 12 GB | Full | Full | Full | Swap required | ~35 t/s |
| RTX 3070 | 8 GB | Full | Full | Full | Swap required | ~25 t/s |
| RTX 3060 | 12 GB | Full | Full | Full | Swap required | ~20 t/s |
| GTX 1660 | 6 GB | Tight | Full | Full | No | ~12 t/s |
| CPU Only | — | 8GB+ RAM | 8GB+ RAM | 4GB+ RAM | Slow | ~3-5 t/s |
| Apple M1/M2/M3 | Shared | Full | Full | Full | Unified memory | ~20-40 t/s |
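A rough classifier matching the matrix's dual-model column. The ~3 GB per-model headroom for KV cache and runtime overhead is an assumption chosen to reproduce the table, not a measured figure:

```python
def dual_model_mode(vram_gb: float, primary_gb: float = 5.0,
                    secondary_gb: float = 4.5, headroom_gb: float = 3.0) -> str:
    """Classify a card: both models resident, swap between them, or single only.

    headroom_gb is an ASSUMED per-model allowance for KV cache and runtime
    overhead -- illustrative, not benchmarked.
    """
    if vram_gb >= primary_gb + secondary_gb + 2 * headroom_gb:
        return "both in VRAM"
    if vram_gb >= max(primary_gb, secondary_gb) + headroom_gb:
        return "swap required"
    return "single model only"

print(dual_model_mode(24))  # RTX 4090  -> both in VRAM
print(dual_model_mode(12))  # RTX 4070 Ti / 3060 -> swap required
print(dual_model_mode(6))   # GTX 1660 -> single model only
```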
Setup · Readiness
✓
Install Ollama runtime
~2 min
✓
Verify GPU detection
ollama serve logs
✓
Pull primary model (8B)
~5 min download
Pull secondary model (7B)
~4 min download
Run smoke test
structured output
Create prompt templates
parse + brief
First real inference run
engine data → brief
Convergent analysis run
dual-model compare
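The smoke-test step might look like this against Ollama's /api/generate endpoint (a real Ollama API; the model tag and prompt here are placeholders). The network call is defined but not invoked — only the payload is printed:

```python
# Smoke-test sketch: ask the local Ollama server for strict JSON output.
# Endpoint and payload shape follow Ollama's /api/generate API; the model
# tag and prompt are placeholders.
import json
import urllib.request

def smoke_payload(model: str = "llama3.1:8b") -> dict:
    return {
        "model": model,
        "prompt": 'Return {"status": "ok"} and nothing else.',
        "format": "json",  # Ollama constrains the reply to valid JSON
        "stream": False,
    }

def run_smoke_test(url: str = "http://localhost:11434/api/generate") -> bool:
    """POST the payload and verify the reply parses as JSON."""
    req = urllib.request.Request(
        url,
        data=json.dumps(smoke_payload()).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    json.loads(body["response"])  # raises if the model broke the contract
    return True

print(json.dumps(smoke_payload(), indent=2))
```

Pass criterion: the reply parses as JSON. If it doesn't, the prompt template — not the model — is the first thing to fix.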
Architecture · Principles
→
The LLM interprets. It doesn't compute.
Numbers come from the engine. The model reads them.
→
Convergent > singular.
Two models agreeing = signal. Divergence = judgment call.
→
Prompt templates are the contract.
Structure > intelligence.
→
Local means sovereign.
Privacy isn't a feature. It's the architecture.
→
Models are swappable.
The engine doesn't care which model reads the data.
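The last two principles in code: a fixed output schema, validated before anything reaches you, regardless of which model produced the reply. Field names are illustrative, not BlindSpot's actual contract:

```python
# "Templates are the contract" + "models are swappable": one schema check
# applied to every model's output. Field names are illustrative.
import json

BRIEF_SCHEMA = {"flags": list, "confidence": str}

def validate_brief(raw: str) -> dict:
    """Parse a model reply and enforce the contract, model-agnostic."""
    brief = json.loads(raw)
    for field, typ in BRIEF_SCHEMA.items():
        if not isinstance(brief.get(field), typ):
            raise ValueError(f"contract violation: {field}")
    return brief

ok = validate_brief('{"flags": ["late-week decay"], "confidence": "high"}')
print(ok["flags"])  # -> ['late-week decay']
```

Swapping Llama for Mistral (or anything else Ollama serves) changes nothing downstream: the brief either satisfies the contract or it is rejected.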
BlindSpot · Local LLM Integration v1.0
ollama · gpu-accelerated · zero cloud · governed prompts