⚡ Local Inference Layer

BlindSpot × LLM

Your data. Your hardware. Your models.

Every BlindSpot engine generates structured data. The LLM layer reads that data and tells you what it means — pre-session briefs, pattern interpretation, convergent analysis. Runs on your GPU. Zero cloud. Zero API costs.

Runtime: Ollama · local, no cloud
Inference Speed: ~25 tokens/sec · GPU
Models Loaded: 3 · primary, secondary, compact
API Cost: $0 · forever, on your hardware
Privacy: 100% · nothing leaves your machine
📊 Engine Data (fact tables, dimensions) → 📝 Prompt Template (governed schema, rules) → 🧠 Local LLM (GPU-accelerated inference) → 📋 Intelligence Brief (flags, recommendations) → 🎯 Action (you decide; the model interprets)
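The pipeline above can be sketched as a single request-builder. A minimal sketch in Python: the endpoint and payload fields (`model`, `prompt`, `stream`, `format`) follow Ollama's documented `/api/generate` API; the template text, model tag, and engine rows are illustrative assumptions, not BlindSpot's actual schema.

```python
import json

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default port

# Illustrative governed template: the model interprets numbers,
# it never recomputes them.
BRIEF_TEMPLATE = """You are a trading-performance analyst.
Interpret the numbers below. Do not recompute them.
Return JSON: {{"flags": [...], "recommendations": [...]}}

Engine data:
{engine_data}"""

def build_request(engine_rows: list[dict], model: str = "llama3.1:8b") -> dict:
    """Render engine output into the governed prompt and wrap it
    in an Ollama generate-request payload (built, not yet sent)."""
    prompt = BRIEF_TEMPLATE.format(engine_data=json.dumps(engine_rows, indent=2))
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,   # one complete response, not token chunks
        "format": "json",  # ask Ollama to constrain output to JSON
    }

payload = build_request([{"day": "Thu", "win_rate": 0.52}])
```

Sending it is one `urllib.request` POST to `OLLAMA_URL`; nothing in the payload ever leaves the machine.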
Model · Roster
Primary · Parsing + Briefs
Llama 3.1 8B (Meta · Q4 quantization · Instruct-tuned)
VRAM ~5 GB · Speed ~25 t/s · Download 4.7 GB · Context 128K
Tags: structured output · CSV parsing · brief generation
Secondary · Convergent Analysis
Mistral 7B (Mistral AI · Q4 quantization · Instruct v0.3)
VRAM ~4.5 GB · Speed ~30 t/s · Download 4.1 GB · Context 32K
Tags: second opinion · reasoning · comparison
Compact · Fast Parsing
Phi-3.5 Mini (Microsoft · 3.8B parameters · Long context)
VRAM ~2.5 GB · Speed ~45 t/s · Download 2.2 GB · Context 128K
Tags: fast extraction · long documents · lightweight
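The roster above can be kept as a machine-readable config with a simple selection rule. A sketch assuming Ollama-style model tags (`llama3.1:8b`, `mistral:7b`, `phi3.5`) and the approximate VRAM figures quoted:

```python
# Roster as config: VRAM figures are the approximate numbers quoted above.
ROSTER = {
    "llama3.1:8b": {"role": "primary",   "vram_gb": 5.0, "context": 128_000},
    "mistral:7b":  {"role": "secondary", "vram_gb": 4.5, "context": 32_000},
    "phi3.5":      {"role": "compact",   "vram_gb": 2.5, "context": 128_000},
}

def pick_model(free_vram_gb: float) -> str:
    """Pick the largest-footprint model that fits the available VRAM."""
    fitting = {m: v for m, v in ROSTER.items() if v["vram_gb"] <= free_vram_gb}
    if not fitting:
        raise RuntimeError("No model fits in VRAM; fall back to CPU inference")
    return max(fitting, key=lambda m: fitting[m]["vram_gb"])
```

This is what "models are swappable" means in practice: the selection is data-driven, and the engine never sees which tag was chosen.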
Convergent · Analysis Demo
Prompt: "Analyze this week’s trading performance and flag blind spots"
Llama 3.1 8B
Flag 1: Win rate dropped from 68% to 52% on Thursday-Friday trades.
Flag 2: Avg loss size increased 40% this week. Stop discipline may be slipping.
Flag 3: 3 of 4 losses were momentum plays. Mean reversion outperformed.
VS
Mistral 7B
Flag 1: Late-week performance degradation. Thu-Fri win rate 50% vs Mon-Wed 71%.
Flag 2: Loss magnitude expanding — risk/reward ratio inverted on 2 trades.
Flag 3: Sector concentration: 80% of trades in tech. Diversification blind spot.
Convergence: both models flag late-week decay and expanding loss magnitude, making these high-confidence signals. Mistral uniquely catches sector concentration, a blind spot the primary model missed.
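The convergence step can be approximated mechanically: flags that hit the same key terms in both models' output are promoted to high confidence, while single-model hits stay as judgment calls. A toy sketch, where keyword matching stands in for whatever semantic comparison the real pipeline uses:

```python
def converge(flags_a: list[str], flags_b: list[str],
             keywords: list[str]) -> dict[str, list[str]]:
    """Split signals into those both models raised vs. one model only."""
    def hits(flags: list[str]) -> set[str]:
        return {k for k in keywords if any(k in f.lower() for f in flags)}

    a, b = hits(flags_a), hits(flags_b)
    both = a & b
    return {
        "high_confidence": sorted(both),       # both models agree
        "unique_signals": sorted((a | b) - both),  # judgment calls
    }

# The demo flags above, abbreviated:
llama_flags = [
    "Win rate dropped from 68% to 52% on Thursday-Friday trades.",
    "Avg loss size increased 40% this week.",
]
mistral_flags = [
    "Thu-Fri win rate 50% vs Mon-Wed 71%.",
    "Loss magnitude expanding.",
    "Sector concentration: 80% of trades in tech.",
]
result = converge(llama_flags, mistral_flags, ["loss", "win rate", "sector"])
```

With these inputs, loss behavior and win rate converge while sector concentration surfaces as Mistral's unique signal, matching the demo's conclusion.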
Hardware · Compatibility Matrix
| GPU | VRAM | 8B | 7B | 3.8B | Dual Model | Speed |
|---|---|---|---|---|---|---|
| RTX 4090 | 24 GB | Full | Full | Full | Both in VRAM | ~50+ t/s |
| RTX 4070 Ti | 12 GB | Full | Full | Full | Swap required | ~35 t/s |
| RTX 3070 | 8 GB | Full | Full | Full | Swap required | ~25 t/s |
| RTX 3060 | 12 GB | Full | Full | Full | Swap required | ~20 t/s |
| GTX 1660 | 6 GB | Tight | Full | Full | No | ~12 t/s |
| CPU Only | system RAM | 8 GB+ RAM | 8 GB+ RAM | 4 GB+ RAM | Slow | ~3-5 t/s |
| Apple M1/M2/M3 | Shared | Full | Full | Full | Unified memory | ~20-40 t/s |
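The Dual Model column reduces to a footprint check. A sketch using the roster's quoted sizes (~5 GB primary + ~4.5 GB secondary) plus an assumed ~3 GB of headroom for KV cache and display overhead, a figure tuned here so the function reproduces the rows above:

```python
def dual_model_fit(vram_gb: float, primary_gb: float = 5.0,
                   secondary_gb: float = 4.5, headroom_gb: float = 3.0) -> str:
    """Classify a GPU: both models resident, swap required, or no dual-model."""
    if vram_gb >= primary_gb + secondary_gb + headroom_gb:
        return "both in VRAM"
    if vram_gb >= max(primary_gb, secondary_gb) + headroom_gb:
        return "swap required"
    return "no"
```

For example, a 24 GB RTX 4090 holds both models resident, a 12 GB or 8 GB card swaps between them, and a 6 GB GTX 1660 runs one model at a time only.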
Setup · Readiness
1. Install Ollama runtime (~2 min)
2. Verify GPU detection (ollama list)
3. Pull primary model, 8B (~5 min download)
4. Pull secondary model, 7B (~4 min download)
5. Run smoke test (structured output)
6. Create prompt templates (parse + brief)
7. First real inference run (engine data → brief)
8. Convergent analysis run (dual-model compare)
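The first checklist items can be preflighted from Python. A minimal sketch: `11434` is Ollama's default server port; the check only reports booleans and never pulls or runs anything.

```python
import shutil
import socket

def readiness() -> dict[str, bool]:
    """Cheap preflight: is the ollama binary on PATH, and is its
    server answering on the default local port?"""
    server_up = False
    try:
        with socket.create_connection(("localhost", 11434), timeout=0.5):
            server_up = True
    except OSError:
        pass  # nothing listening, or connection refused
    return {
        "ollama_installed": shutil.which("ollama") is not None,
        "server_listening": server_up,
    }

status = readiness()
```

Both values must be True before the smoke test (step 5) is worth running.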
Architecture · Principles
The LLM interprets. It doesn't compute. Numbers come from the engine; the model reads them.
Convergent > singular. Two models agreeing = signal. Divergence = judgment call.
Prompt templates are the contract. Structure > intelligence.
Local means sovereign. Privacy isn't a feature; it's the architecture.
Models are swappable. The engine doesn't care which model reads the data.
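"Prompt templates are the contract" cuts both ways: the output side is enforced too. A sketch of a validator, assuming a hypothetical two-key brief schema (`flags`, `recommendations`); any model output that can't honor the contract is rejected rather than trusted:

```python
import json

# Hypothetical governed schema for a brief (illustrative, not BlindSpot's).
REQUIRED_KEYS = {"flags", "recommendations"}

def validate_brief(raw: str) -> dict:
    """Enforce the output contract: valid JSON with the agreed keys,
    regardless of which model produced it."""
    brief = json.loads(raw)  # raises ValueError on non-JSON output
    missing = REQUIRED_KEYS - brief.keys()
    if missing:
        raise ValueError(f"Model broke the contract; missing: {sorted(missing)}")
    return brief
```

Because the contract lives outside the model, swapping Llama for Mistral (or anything else) changes nothing downstream.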
BlindSpot · Local LLM Integration v1.0
ollama · gpu-accelerated · zero cloud · governed prompts