
Local LLM Setup Guide

Hardware: Gaming rig · 32GB RAM · RTX 3070 (8GB VRAM) · Windows
Purpose: Campaign transcript parsing + Intelligence Brief generation
Runtime: Ollama (GPU-accelerated, local, no cloud, no API costs)
Phase 1
Install Ollama
1 Download Ollama
  1. Go to https://ollama.com/download/windows
  2. Click Download for Windows
  3. Run the installer — standard .exe, just click through
  4. Ollama installs as a background service and adds itself to your system tray
2 Verify Installation

Open PowerShell or Command Prompt (not as admin — just normal):

ollama --version

You should see something like ollama version 0.x.x. If you get "not recognized," restart your terminal or reboot.

3 Verify GPU Detection
ollama list

This should return an empty list (no models yet). The important thing is it runs without errors. Ollama auto-detects your 3070 and will use it for inference.

To confirm GPU is being used, after you pull your first model and run it, check Task Manager → Performance → GPU. You should see VRAM utilization spike when the model loads.

Phase 2
Pull Your Models
4 Pull Llama 3.1 8B (Primary Model)
ollama pull llama3.1:8b

Downloads ~4.7GB. 4-bit quantized by default — fits your 8GB VRAM cleanly. Wait for it to finish.

5 Pull Mistral 7B (Secondary — for convergent analysis later)
ollama pull mistral:7b

Downloads ~4.1GB. You don't need this immediately, but having it ready means you can run two-model comparison whenever you want.

6 Verify Models Are Installed
ollama list

You should see both models listed with their sizes.


Phase 3
Test It
7 Quick Smoke Test
ollama run llama3.1:8b "What is a d20?"

You should get a fast, coherent response about twenty-sided dice. If this works, your GPU inference is running. With a prompt argument like this, ollama prints the response and exits; if you end up in an interactive session, type /bye to leave it.

8 Test Structured Output (This Is What Matters)
ollama run llama3.1:8b

This opens an interactive session. Paste this:

Extract the following rolls into CSV format with columns:
pc, roll_type, die, result, modifier, total, target_ac, hit_flag

Session transcript excerpt:
Feliciano attacks Enforcer A with his longsword. He rolls a 14,
plus 4 to hit, total 18 against AC 14. That's a hit. He rolls
damage: 1d8+2, gets a 6, total 8 slashing damage.
Hillie fires Eldritch Blast at the Archer. She rolls a 7, plus 4
to hit, total 11 against AC 12. Miss — the blast scorches the scaffolding.
Feliciano makes a Constitution saving throw against poison. He rolls
a 3, plus 2, total 5 against DC 13. Fail.

Output CSV only, no explanation.

You should get clean CSV output. If you do, your parsing pipeline works. Type /bye to exit.
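Before trusting the model's CSV, it's worth sanity-checking it mechanically. A minimal Python sketch, assuming the 8-column schema from the smoke test above (the sample csv_text matches what a correct parse of the excerpt would look like):

```python
import csv
import io

# Sample output in the 8-column schema used by the smoke test above.
csv_text = """pc,roll_type,die,result,modifier,total,target_ac,hit_flag
Feliciano,attack,d20,14,4,18,14,1
Hillie,attack,d20,7,4,11,12,0
Feliciano,save,d20,3,2,5,13,0
"""

def check_rolls(text):
    """Return a list of problems found in the model's CSV output."""
    problems = []
    for i, row in enumerate(csv.DictReader(io.StringIO(text)), start=1):
        result, mod, total = int(row["result"]), int(row["modifier"]), int(row["total"])
        if result + mod != total:
            problems.append(f"row {i}: {result}+{mod} != {total}")
        if row["hit_flag"] not in ("1", "0", "-1"):
            problems.append(f"row {i}: bad hit_flag {row['hit_flag']}")
    return problems

print(check_rolls(csv_text))  # prints [] — the arithmetic checks out
```

An empty list means every row's total matches result + modifier; anything else is a row to compare against the transcript by hand.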


Phase 4
Set Up Your Prompts
9 Create a Project Folder
mkdir C:\BlindSpot
mkdir C:\BlindSpot\prompts
mkdir C:\BlindSpot\data
mkdir C:\BlindSpot\briefs
10 Create the Parsing Prompt

Create a file at C:\BlindSpot\prompts\parse_rolls.txt with this content:

SYSTEM: You are a D&D session transcript parser. Your job is to extract
structured data from session transcripts. Output ONLY CSV data, no
explanation, no markdown formatting, no commentary.

SCHEMA — ROLL LOG:
pc,roll_type,die,result,modifier,total,target,hit_flag,session,scene

FIELD DEFINITIONS:
- pc: Character name (e.g., Feliciano, Hillie)
- roll_type: attack, save, check, damage, initiative
- die: d20, d8, d10, d6, d4, d12, d100
- result: Natural die result (before modifiers)
- modifier: Bonus/penalty applied
- total: result + modifier
- target: AC or DC the roll was against (0 if unknown)
- hit_flag: 1 = success/hit, 0 = fail/miss, -1 = unknown
- session: Session number (e.g., S01)
- scene: Scene number within session (e.g., 1, 2, 3)

RULES:
- Extract EVERY roll mentioned in the transcript
- If a roll result is not explicitly stated, skip it
- If a modifier is not stated, use 0
- If a target AC/DC is not stated, use 0
- Natural 1 is always a miss (hit_flag = 0)
- Natural 20 is always a hit (hit_flag = 1)
- Damage rolls: hit_flag = -1 (not applicable)
- Do not invent or infer rolls that aren't in the text
- Output the CSV header row first, then data rows

USER: Here is the transcript for session [SESSION_NUMBER]:

[PASTE TRANSCRIPT HERE]
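Rather than pasting the transcript into the template by hand each session, the two placeholders can be filled with a couple of lines of Python. A sketch — the function name is hypothetical, but the placeholder strings match the template above exactly:

```python
def fill_template(template, session_number, transcript):
    """Substitute the two placeholders used by parse_rolls.txt."""
    return (template
            .replace("[SESSION_NUMBER]", session_number)
            .replace("[PASTE TRANSCRIPT HERE]", transcript))

# Illustrative usage with files (paths are examples, not required names):
# from pathlib import Path
# prompt = fill_template(Path(r"C:\BlindSpot\prompts\parse_rolls.txt").read_text(),
#                        "S01",
#                        Path(r"C:\BlindSpot\data\session01_transcript.txt").read_text())
# Path(r"C:\BlindSpot\data\session01_parse_input.txt").write_text(prompt)
```

The written file is then ready to pipe into ollama in Phase 5.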
11 Create the Intelligence Brief Prompt

Create a file at C:\BlindSpot\prompts\campaign_brief.txt with this content:

SYSTEM: You are a Campaign Intelligence Analyst for a D&D 5e campaign.
You read structured campaign metrics and produce a concise pre-session
briefing for the DM. Your tone is direct, analytical, and actionable —
like an audit findings report, not a creative writing piece.

OUTPUT FORMAT:
1. CAMPAIGN HEALTH SCORE (0-100, with 1-line justification)
2. TOP 3 PREP PRIORITIES (numbered, 2-3 sentences each)
3. FLAGS (bullet list of active concerns with data citations)
4. RECOMMENDATIONS (2-3 specific, actionable suggestions)

RULES:
- Cite specific numbers from the data (don't generalize)
- If spotlight split exceeds 60/40, flag it
- If any quest has been active 4+ sessions with no progress, flag it
- If any PC's d20 average is below 9.0 over 30+ rolls, flag it
- If any faction clock is at 75%+ capacity, flag it
- If resource usage (spell slots, abilities) is below 50%, flag it
- Keep the entire brief under 400 words
- Do not invent data points not provided in the input
- End with "Next session focus:" and a single sentence

USER: Here are the campaign metrics for pre-session [SESSION_NUMBER] prep:

CAMPAIGN: [CAMPAIGN NAME]
SESSIONS PLAYED: [N]
TOTAL ROLLS: [N]

PC STATS:
[paste PC performance summary]

QUEST STATUS:
[paste quest tracker summary]

FACTION STATE:
[paste faction heat/favor summary]

RESOURCE USAGE:
[paste resource burn rates]

DICE SUMMARY:
[paste dice averages and distribution notes]

SPOTLIGHT:
[paste actions-per-PC breakdown]
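The flag thresholds in the rules above are deterministic, so you can pre-compute them in Python and let the LLM focus on interpretation rather than arithmetic. A sketch under the assumption that your exported metrics look roughly like the dict below (the field names are illustrative, not a required format):

```python
def compute_flags(metrics):
    """Apply the brief prompt's flag thresholds to exported campaign metrics."""
    flags = []
    # Spotlight: flag if the most active PC exceeds 60% of total actions.
    actions = metrics["actions_per_pc"]            # e.g. {"Feliciano": 70, "Hillie": 30}
    top_share = max(actions.values()) / sum(actions.values())
    if top_share > 0.60:
        flags.append(f"spotlight split {top_share:.0%} exceeds 60/40")
    # Dice: flag any PC averaging below 9.0 on d20s over 30+ rolls.
    for pc, (avg, n) in metrics["d20_avg"].items():   # e.g. {"Hillie": (8.4, 42)}
        if n >= 30 and avg < 9.0:
            flags.append(f"{pc} d20 average {avg} over {n} rolls")
    # Factions: flag any clock at 75%+ capacity (fill is 0.0-1.0).
    for faction, fill in metrics["faction_clocks"].items():
        if fill >= 0.75:
            flags.append(f"{faction} clock at {fill:.0%}")
    return flags
```

Paste the returned flags into the FLAGS section of the input, and the model only has to phrase them, not derive them.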

Phase 5
Run Your First Parse
12 After Your First Session
  1. Copy the session transcript (or the key portions with rolls) into a text file
  2. Open PowerShell
  3. Run:
# Option A: Interactive (paste the transcript in)
ollama run llama3.1:8b

# Then paste your parse_rolls prompt with the transcript

# Option B: Pipe a file (if you saved the full prompt + transcript)
Get-Content C:\BlindSpot\data\session01_parse_input.txt | ollama run llama3.1:8b
  4. Copy the CSV output into C:\BlindSpot\data\session01_rolls.csv
  5. Import that CSV into your Excel engine's fact table
13 Generate Your First Brief

Once you have 2-3 sessions of data in the engine and dashboard metrics:

  1. Export the dashboard summary values as text
  2. Paste them into the campaign_brief prompt template
  3. Run:
Get-Content C:\BlindSpot\data\session03_brief_input.txt | ollama run llama3.1:8b > C:\BlindSpot\briefs\session04_brief.txt

(PowerShell doesn't support the < input-redirection operator, so pipe the file in with Get-Content.)
  4. Open the brief. Read it. Prep your session.

Phase 6
Convergent Analysis (Optional, Later)
14 Run the Same Brief Through Mistral
Get-Content C:\BlindSpot\data\session03_brief_input.txt | ollama run mistral:7b > C:\BlindSpot\briefs\session04_brief_mistral.txt
15 Compare

Open both briefs side by side. Where they agree = high confidence. Where they diverge = DM judgment call. You don't need automation for this yet — eyeball comparison is fine for the first few sessions.
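If you later want more than an eyeball comparison, Python's difflib can split the two briefs into agreeing and diverging lines. A crude sketch — it treats only exact line matches as agreement, which is a rough proxy for free text:

```python
import difflib

def compare_briefs(brief_a, brief_b):
    """Split two briefs' lines into (agree, diverge) using a line-level diff."""
    a = [line for line in brief_a.splitlines() if line.strip()]
    b = [line for line in brief_b.splitlines() if line.strip()]
    agree, diverge = [], []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, a, b).get_opcodes():
        if tag == "equal":
            agree.extend(a[i1:i2])
        else:
            diverge.extend(a[i1:i2] + b[j1:j2])
    return agree, diverge
```

Lines in `agree` are high confidence; lines in `diverge` are the DM judgment calls.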


Reference
Troubleshooting
"ollama not recognized"
Restart your terminal. If still broken, check that C:\Users\[you]\AppData\Local\Programs\Ollama is in your PATH.
Model runs slow (< 5 tokens/sec)
Check Task Manager → GPU. If VRAM isn't being used, Ollama may be falling back to CPU. Run ollama ps while a model is loaded to see where it's running. Restart the Ollama service from the system tray.
Out of memory errors
Close other GPU-intensive apps (games, browser with hardware acceleration). Your 3070 has 8GB — the 8B model needs ~5GB of free VRAM, which leaves only ~3GB for everything else.
Response seems wrong or hallucinated
For parsing: spot-check 10 rolls against the transcript. If accuracy is below 90%, your prompt needs tightening — add more examples to the system prompt.
For briefs: remember the LLM is interpreting, not computing. The numbers come from your Excel engine. If the numbers are right and the interpretation is off, adjust the brief prompt thresholds.
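The 10-roll spot-check can itself be a tiny script: hand-transcribe ten rolls as a reference CSV and measure what fraction the model recovered. A sketch, assuming both inputs use the parse_rolls schema (the function name and key fields are illustrative):

```python
import csv
import io

def spot_check_accuracy(parsed_csv, reference_csv,
                        keys=("pc", "die", "result", "total")):
    """Fraction of hand-checked reference rows found (on the key fields) in the parsed CSV."""
    parsed = [tuple(r[k] for k in keys)
              for r in csv.DictReader(io.StringIO(parsed_csv))]
    reference = [tuple(r[k] for k in keys)
                 for r in csv.DictReader(io.StringIO(reference_csv))]
    hits = sum(1 for row in reference if row in parsed)
    return hits / len(reference)
```

Anything below 0.9 on your reference sample is the signal to tighten the parsing prompt.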
Want to try a different model?
ollama pull phi3:medium — Phi-3 Medium 14B, partially offloads to CPU, slower but more capable
ollama pull gemma2:9b — Google's Gemma 2 9B, good at structured tasks
ollama pull llama3.2:3b — smaller, faster, less capable, good for simple parsing

Result
What You Have When This Is Done
Total setup time: ~30 minutes (mostly waiting for model downloads)