How Preheat Works
This document explains the operational principles of preheat: how it monitors, learns, predicts, and preloads applications.
High-Level Overview
Preheat operates in a continuous cycle:
┌────────────────────────────────────────────────────────────────┐
│                    PREHEAT OPERATION CYCLE                     │
│                                                                │
│  Every 20 seconds (configurable):                              │
│                                                                │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐  │
│  │   SCAN   │ -> │  LEARN   │ -> │ PREDICT  │ -> │ PRELOAD  │  │
│  │          │    │          │    │          │    │          │  │
│  │  Check   │    │  Update  │    │  Score   │    │   Read   │  │
│  │  /proc   │    │  Markov  │    │   each   │    │  files   │  │
│  │   for    │    │  chain   │    │  app's   │    │   into   │  │
│  │ running  │    │  model   │    │  launch  │    │  memory  │  │
│  │   apps   │    │          │    │  prob.   │    │  cache   │  │
│  └──────────┘    └──────────┘    └──────────┘    └──────────┘  │
│       │                                               │        │
│       └───────────────────────────────────────────────┘        │
│                            (repeat)                            │
└────────────────────────────────────────────────────────────────┘
The Four Phases
Phase 1: Scan
Preheat monitors running processes by scanning the /proc filesystem:
/proc/
├── 1234/                        # Process with PID 1234
│   ├── exe -> /usr/bin/firefox  # Executable path
│   └── maps                     # Memory mappings (libraries)
├── 5678/
│   ├── exe -> /usr/bin/code
│   └── maps
└── ...
What it collects:
- Which executables are currently running
- What shared libraries each process uses
- File paths and sizes
Filtering rules:
- Ignores system daemons (paths starting with /usr/sbin/)
- Focuses on user applications (paths starting with /usr/)
- Respects configured include/exclude patterns
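The scan phase can be sketched as follows. This is a minimal Python illustration that assumes the standard Linux /proc layout and applies only the two prefix rules listed above. The names is_candidate, mapped_files and scan_running_apps are hypothetical, and the configurable include/exclude patterns are not modelled.

```python
import os

# Hypothetical prefix rules mirroring the filtering list above.
EXCLUDED_PREFIXES = ("/usr/sbin/",)  # system daemons
INCLUDED_PREFIXES = ("/usr/",)       # user applications


def is_candidate(exe_path: str) -> bool:
    """Apply the exclude rule first, then the include rule."""
    if exe_path.startswith(EXCLUDED_PREFIXES):
        return False
    return exe_path.startswith(INCLUDED_PREFIXES)


def mapped_files(maps_text: str) -> set[str]:
    """File paths from /proc/[pid]/maps text (skips anonymous, [heap], [stack], ...)."""
    files = set()
    for line in maps_text.splitlines():
        # Columns: address perms offset dev inode pathname
        parts = line.split(maxsplit=5)
        if len(parts) == 6 and parts[5].startswith("/"):
            files.add(parts[5])
    return files


def scan_running_apps(proc_root: str = "/proc") -> dict[str, set[str]]:
    """Map each candidate executable to the union of files its processes map."""
    apps: dict[str, set[str]] = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            exe = os.readlink(os.path.join(proc_root, entry, "exe"))
            with open(os.path.join(proc_root, entry, "maps")) as f:
                maps_text = f.read()
        except OSError:
            continue  # process exited, or it belongs to another user
        if is_candidate(exe):
            apps.setdefault(exe, set()).update(mapped_files(maps_text))
    return apps
```

Because the exclude rule runs first, /usr/sbin/ binaries are dropped even though they also match the broader /usr/ prefix.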
Phase 2: Learn
The daemon builds a statistical model of application co-occurrence:
Markov Chain Model:
A Markov chain tracks which applications tend to run together or in sequence:
Example: User often opens Firefox, then VS Code, then Terminal
State transitions recorded:
Firefox --> VS Code (probability: 0.6)
Firefox --> Terminal (probability: 0.3)
Firefox --> Other (probability: 0.1)
VS Code --> Terminal (probability: 0.7)
VS Code --> Firefox (probability: 0.2)
...
Learning over time:
- Each scan updates transition probabilities
- Recent observations have more weight
- The model adapts as habits change
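The sketch below illustrates "recent observations have more weight" with a deliberately simplified app-to-app transition table. It is not preheat's actual data structure, which this document does not show. The class name and the decay factor are assumptions: each new observation from an app shrinks that app's older counts, so probabilities drift toward current habits.

```python
from collections import defaultdict


class TransitionModel:
    """Hypothetical app-to-app transition table with exponential forgetting."""

    def __init__(self, decay: float = 0.9) -> None:
        # Assumed forgetting factor applied to an app's old counts on every update.
        self.decay = decay
        self.counts = defaultdict(lambda: defaultdict(float))

    def observe(self, prev_app: str, next_app: str) -> None:
        """Record that next_app was launched after prev_app."""
        row = self.counts[prev_app]
        for target in row:
            row[target] *= self.decay  # older transitions lose weight
        row[next_app] += 1.0

    def probability(self, prev_app: str, next_app: str) -> float:
        """Estimated P(next_app | prev_app); 0.0 for an app never seen before."""
        row = self.counts.get(prev_app)
        if not row:
            return 0.0
        return row.get(next_app, 0.0) / sum(row.values())
```

For example, with decay=0.5, observing Firefox → VS Code and then Firefox → Terminal gives Terminal twice the probability of VS Code (2/3 versus 1/3), because the older transition has been halved.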
Weighted Launch Counting (v1.0.0+):
Preheat weights each launch by how long the application ran and whether the user started it, so that short-lived and automated processes count for less:
Launch Weight Formula:
weight = base × log(1 + duration/divisor) × user_mult × short_penalty
Where:
base = 1.0
divisor = 60 seconds (configurable)
user_mult = 2.0 for user-initiated, 1.0 for automated (configurable)
short_penalty = 0.3 if duration < 5 seconds, 1.0 otherwise
Progressive Weight Accumulation:
Application launched:
  t=0s   → raw_launches++            (immediate count)
  t=20s  → weighted_launches += 0.8  (first cycle)
  t=40s  → weighted_launches += 0.6  (second cycle)
  t=60s  → weighted_launches += 0.5  (third cycle)
  ...    → continues until exit
- Immediate counting: raw_launches increments when the process starts (not on exit)
- Incremental weighting: Weight accumulates every scan cycle while running
- Short-lived penalty: Processes <5 seconds get only 30% weight (crashes, transient processes)
Example comparison:
| Scenario | Duration | User? | Weight Formula | Final Weight |
|---|---|---|---|---|
| Grep (automated) | 0.1s | No | 1.0 × log(1.0017) × 1.0 × 0.3 | ~0.0005 |
| Failed launch (crash) | 2s | Yes | 1.0 × log(1.03) × 2.0 × 0.3 | ~0.02 |
| Terminal session | 600s | Yes | 1.0 × log(11) × 2.0 × 1.0 | ~4.8 |
| Firefox (long session) | 7200s | Yes | 1.0 × log(121) × 2.0 × 1.0 | ~9.6 |
This prevents crashes and trivial processes from inflating prediction scores.
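The launch-weight formula translates directly into code. This sketch uses the parameter values from the "Where:" list and assumes log is the natural logarithm, since 2 × ln(11) ≈ 4.8 matches the terminal row. The function and constant names are hypothetical.

```python
import math

# Parameter values from the formula's "Where:" list; names are hypothetical.
BASE = 1.0
DIVISOR = 60.0          # seconds (configurable)
USER_MULT = 2.0         # user-initiated launches (configurable)
SHORT_THRESHOLD = 5.0   # seconds
SHORT_PENALTY = 0.3


def launch_weight(duration: float, user_initiated: bool) -> float:
    """weight = base × ln(1 + duration/divisor) × user_mult × short_penalty."""
    user_mult = USER_MULT if user_initiated else 1.0
    penalty = SHORT_PENALTY if duration < SHORT_THRESHOLD else 1.0
    return BASE * math.log1p(duration / DIVISOR) * user_mult * penalty


for label, seconds, user in [("grep", 0.1, False), ("crash", 2, True),
                             ("terminal", 600, True), ("firefox", 7200, True)]:
    print(f"{label:8} {launch_weight(seconds, user):.4f}")
```

The terminal and Firefox scenarios come out at about 4.8 and 9.6. Because 121 = 11², a two-hour session counts exactly twice as much as a ten-minute one. That is the diminishing-returns effect the logarithm provides.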
Phase 3: Predict
Using the learned model, preheat calculates which applications are most likely to be launched:
Scoring factors:
| Factor | Weight | Description |
|---|---|---|
| Markov probability | High | Based on currently running apps |
| Correlation coefficient | Medium | Statistical co-occurrence strength |
| Recency | Medium | Recently used apps score higher |
| Frequency | Low | Overall launch count |
Example prediction:
Currently running: Firefox, Terminal
Predictions:
1. VS Code (score: 0.82) - Often follows Firefox
2. File Manager (score: 0.65) - Frequently co-occurs
3. Spotify (score: 0.41) - Sometimes used together
4. LibreOffice (score: 0.23) - Rare combination
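The factor table gives only relative weights, so any concrete scoring function involves some guesswork. As a minimal sketch, assume the four factors are already normalised to 0..1 and combined as a weighted sum. The numeric weights (0.40/0.25/0.25/0.10) and the names Signals, score and rank are invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical numbers standing in for the table's High / Medium / Medium / Low.
WEIGHTS = {"markov": 0.40, "correlation": 0.25, "recency": 0.25, "frequency": 0.10}


@dataclass
class Signals:
    markov: float       # P(launch | currently running apps), 0..1
    correlation: float  # co-occurrence strength, clamped to 0..1
    recency: float      # 1.0 = used just now, decaying toward 0
    frequency: float    # launch count normalised to 0..1


def score(s: Signals) -> float:
    """Weighted sum of the four factors; higher means more likely to be launched."""
    return (WEIGHTS["markov"] * s.markov
            + WEIGHTS["correlation"] * s.correlation
            + WEIGHTS["recency"] * s.recency
            + WEIGHTS["frequency"] * s.frequency)


def rank(candidates: dict[str, Signals]) -> list[tuple[str, float]]:
    """Return (app, score) pairs, best first."""
    return sorted(((app, score(sig)) for app, sig in candidates.items()),
                  key=lambda pair: pair[1], reverse=True)
```

A weighted sum is only one way to combine the factors. It keeps each factor's influence easy to tune independently.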
Phase 4: Preload
High-scoring applications are preloaded into the disk cache:
Application: VS Code
Files to preload:
/usr/share/code/code (main binary, 120 MB)
/usr/lib/x86_64-linux-gnu/libnode.so
/usr/lib/x86_64-linux-gnu/libv8.so
... (shared libraries)
Preloading mechanism:
- Check available memory (respect memory limits)
- Get list of files for predicted applications
- Sort files for optimal I/O (by disk block for HDDs)
- Call the readahead(2) system call on each file
- Kernel reads file data into disk cache
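The first two steps (respect the memory budget, collect files for predicted applications) can be sketched as a greedy selection. This is a minimal illustration under stated assumptions. Apps are visited from highest score down, a library shared by several apps is scheduled only once, and a file that does not fit is skipped rather than ending the pass. The function name and input shape are hypothetical.

```python
def select_files(predicted: list[tuple[str, float, dict[str, int]]], budget: int) -> list[str]:
    """Pick files to preload, most likely apps first, without exceeding `budget` bytes.

    predicted: (app, score, {path: size_in_bytes}) tuples.
    """
    chosen: dict[str, int] = {}
    used = 0
    for _app, _score, files in sorted(predicted, key=lambda p: p[1], reverse=True):
        for path, size in files.items():
            if path in chosen:
                continue  # shared library already scheduled by a higher-scoring app
            if used + size > budget:
                continue  # assumption: skip what does not fit, keep trying the rest
            chosen[path] = size
            used += size
    return list(chosen)
```

The returned list keeps the score order. The sorting step then reorders it for the disk.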
The readahead(2) System Call
The core of preloading is the Linux readahead() system call:
readahead(fd, offset, count);
What it does:
- Initiates asynchronous reading of file data
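- Data goes into the kernel's page cache
- Subsequent reads of that data are serviced from memory
- No data is copied to userspace, so overhead is minimal
Why it's efficient:
- Mostly non-blocking: the reads are scheduled in the background, although the call can block while it reads the filesystem metadata needed to locate the data
- Uses the kernel's existing caching infrastructure
- No special privileges beyond file read access
- Works on regular files in any filesystem that uses the page cache (other file types are rejected with EINVAL)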
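Python's standard library does not expose readahead() directly, so this sketch binds it from the C library with ctypes. It assumes Linux and a C library that exports readahead() (glibc does). preload_file is a hypothetical helper, not preheat's code.

```python
import ctypes
import ctypes.util
import os

# Linux-only: bind ssize_t readahead(int fd, off64_t offset, size_t count).
_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
_libc.readahead.argtypes = [ctypes.c_int, ctypes.c_int64, ctypes.c_size_t]
_libc.readahead.restype = ctypes.c_ssize_t


def preload_file(path: str) -> bool:
    """Schedule the whole file into the page cache; True if the kernel accepted it."""
    try:
        fd = os.open(path, os.O_RDONLY)  # read access is the only privilege needed
    except OSError:
        return False
    try:
        size = os.fstat(fd).st_size
        # Returns 0 on success; no file data is copied into this process.
        if _libc.readahead(fd, 0, size) != 0:
            err = ctypes.get_errno()
            print(f"readahead({path}) failed: {os.strerror(err)}")
            return False
        return True
    finally:
        os.close(fd)
```

Calling preload_file on a directory or pipe fails with EINVAL, which matches the regular-files restriction noted above.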
Memory Management
Preheat carefully manages how much memory it uses for preloading:
Memory Budget Calculation
Available for preloading = max(0, Total × memtotal% + Free × memfree%) + Cached × memcached%
With defaults (memtotal=-10, memfree=50, memcached=0):
- Uses 50% of free memory
- Subtracts 10% of total as safety margin
- Result: Conservative preloading that doesn't starve the system
Example
System: 8 GB total RAM, 3 GB free, 2 GB cached
Calculation:
Total contribution:  8192 MB × (-10%) = -819 MB
Free contribution:   3072 MB × (50%)  = 1536 MB
Cached contribution: 2048 MB × (0%)   = 0 MB
Available = max(0, -819 + 1536) + 0 = 717 MB
Preheat will preload up to 717 MB of application data.
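The budget formula with the documented defaults fits in a few lines. parse_meminfo and budget_from_proc are hypothetical helpers. Mapping Total, Free and Cached onto the MemTotal, MemFree and Cached fields of /proc/meminfo is an assumption. The result uses whatever unit the inputs use: kB when read from /proc/meminfo, MB in the worked example.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo lines such as 'MemFree:  3145728 kB' into {name: kB}."""
    info: dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[name.strip()] = int(fields[0])
    return info


def preload_budget(total: float, free: float, cached: float,
                   memtotal: int = -10, memfree: int = 50, memcached: int = 0) -> float:
    """Available = max(0, Total×memtotal% + Free×memfree%) + Cached×memcached%."""
    return max(0.0, total * memtotal / 100 + free * memfree / 100) + cached * memcached / 100


def budget_from_proc(path: str = "/proc/meminfo") -> float:
    """Budget in kB from live values (field mapping is an assumption)."""
    with open(path) as f:
        m = parse_meminfo(f.read())
    return preload_budget(m["MemTotal"], m["MemFree"], m["Cached"])
```

preload_budget(8192, 3072, 2048) returns about 716.8, which the example above rounds to 717 MB. When free memory is low, the max(0, ...) clamp drives the budget to zero instead of negative.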
Memory Pressure Response
When system memory becomes scarce:
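- The preloading budget shrinks automatically, because available memory is checked again before each preload
- The kernel may evict data that preheat has already cached
- Preheat preloads less, and stops preloading entirely once the budget reaches zero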
I/O Optimization
Sorting Strategies
How files are ordered for preloading affects disk efficiency:
| Strategy | Best For | Description |
|---|---|---|
| 0 - None | SSD, Flash | No sorting (random access is fast) |
| 1 - Path | Network FS | Group by directory path |
| 2 - Inode | Most FS | Sort by inode number |
| 3 - Block | HDD | Sort by physical disk block |
Block sorting (default, strategy 3):
Before sorting:          After sorting:
File A (block 500)       File C (block 100)
File C (block 100)       File B (block 450)
File B (block 450)       File A (block 500)
Result: Disk head moves in one direction, minimizing seeks
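The four strategies can be sketched as one ordering function. The enum values follow the table. Physical block numbers are not exposed by os.stat (on Linux they come from an ioctl such as FIEMAP), so this sketch takes a hypothetical first_block(path) lookup as a parameter instead of implementing it.

```python
import os
from enum import IntEnum
from typing import Callable, Iterable, Optional


class SortStrategy(IntEnum):
    NONE = 0   # SSD / flash: keep the given order
    PATH = 1   # network FS: group by directory path
    INODE = 2  # most filesystems: sort by inode number
    BLOCK = 3  # HDD: sort by physical disk block


def sort_files(paths: Iterable[str], strategy: SortStrategy,
               first_block: Optional[Callable[[str], int]] = None) -> list[str]:
    """Order files for preloading according to the chosen strategy."""
    paths = list(paths)
    if strategy == SortStrategy.NONE:
        return paths
    if strategy == SortStrategy.PATH:
        return sorted(paths)  # lexicographic order keeps each directory together
    if strategy == SortStrategy.INODE:
        def inode_key(p: str) -> tuple[int, int]:
            st = os.stat(p)
            return (st.st_dev, st.st_ino)  # group by device, then inode
        return sorted(paths, key=inode_key)
    if first_block is None:
        raise ValueError("BLOCK sorting needs a first_block(path) lookup")
    return sorted(paths, key=first_block)
```

With blocks {A: 500, C: 100, B: 450}, BLOCK sorting yields C, B, A, so the head sweeps in one direction.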
Parallel Readahead
Multiple files can be read simultaneously using worker processes:
Main daemon
│
├── Worker 1: Reading /usr/bin/firefox
├── Worker 2: Reading /usr/lib/libgtk.so
├── Worker 3: Reading /usr/share/code/code
└── ... (up to 30 by default)
This utilizes disk queue depth and parallelism for faster preloading.
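The fan-out can be sketched with a thread pool. Two substitutions keep the sketch within the standard library. It uses threads rather than the worker processes described above. It also uses os.posix_fadvise with POSIX_FADV_WILLNEED, which asks the Linux kernel to read the range into the page cache, instead of readahead(2). The function names are hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def _willneed(path: str) -> bool:
    """Hint the kernel to cache the whole file (length 0 means 'to end of file')."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError:
        return False
    try:
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
        return True
    except OSError:
        return False
    finally:
        os.close(fd)


def preload_parallel(paths: list[str], max_workers: int = 30) -> int:
    """Issue hints from up to max_workers threads; return how many files succeeded."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sum(pool.map(_willneed, paths))
```

Threads are enough here because each worker spends its time inside a system call rather than running Python code.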
Timing and Scheduling
The Cycle Timer
Time:   0s        20s       40s       60s    ...
        │         │         │         │
        ▼         ▼         ▼         ▼
        Scan      Scan      Scan      Scan
        Learn     Learn     Learn     Learn
        Predict   Predict   Predict   Predict
        Preload   Preload   Preload   Preload
Each cycle (default: 20 seconds):
- Completes within a few hundred milliseconds
- Most time is spent waiting for I/O
- CPU usage during scan: typically <1%
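The cycle timer can be sketched as a loop that runs the four phases back to back, then sleeps for the rest of the period. run_cycles is a hypothetical name. The choice to start immediately after an overrun, rather than bursting to catch up, is an assumption.

```python
import time
from typing import Callable, Optional, Sequence


def run_cycles(phases: Sequence[Callable[[], None]], period: float = 20.0,
               cycles: Optional[int] = None) -> int:
    """Run the phases once per `period` seconds; return cycles completed.

    cycles=None loops forever, as a daemon would.
    """
    done = 0
    next_start = time.monotonic()
    while cycles is None or done < cycles:
        for phase in phases:  # e.g. scan, learn, predict, preload
            phase()
        done += 1
        next_start += period
        delay = next_start - time.monotonic()
        if delay > 0:
            time.sleep(delay)  # idle for the remainder of the period
        else:
            next_start = time.monotonic()  # overran: start now, do not burst to catch up
    return done
```

time.monotonic is used so that wall-clock changes (NTP, suspend adjustments) do not shorten or stretch a cycle.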
Nice Level
Preheat runs with an elevated nice level (default: 15):
- Lower priority than interactive applications
- Yields CPU to foreground tasks
- I/O priority is also reduced
State Persistence
What's Saved
The state file contains:
- All tracked applications and their file mappings
- Markov chain transition probabilities
- Launch counts and timestamps
- Correlation coefficients
Save Triggers
- Autosave timer: Every hour by default
- Graceful shutdown: On SIGTERM
- Manual save: Via preheat-ctl save or SIGUSR2
State File Location
/usr/local/var/lib/preheat/preheat.state # Binary format
On startup:
- If state file exists: Load and continue learning
- If missing: Start fresh (first hour has limited predictions)
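The save triggers and the start-fresh fallback can be sketched as follows. JSON stands in for preheat's binary format, and all function names are hypothetical. The write goes to a temporary file in the same directory and is renamed over the old one, so an interrupted save cannot leave a truncated state file.

```python
import json
import os
import signal
import tempfile


def save_state(path: str, state: dict) -> None:
    """Write the state atomically so a crash mid-save never leaves a truncated file."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".preheat-state-")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)  # atomic rename within one filesystem
    except BaseException:
        os.unlink(tmp_path)
        raise


def load_state(path: str) -> dict:
    """Return the saved model, or an empty one when no state file exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}  # first run: start fresh


def install_save_handlers(path: str, get_state) -> None:
    """SIGUSR2: save and keep running. SIGTERM: save, then exit cleanly."""
    def on_usr2(signum, frame):
        save_state(path, get_state())

    def on_term(signum, frame):
        save_state(path, get_state())
        raise SystemExit(0)

    signal.signal(signal.SIGUSR2, on_usr2)
    signal.signal(signal.SIGTERM, on_term)
```

An hourly autosave could call save_state from the main loop whenever 3600 seconds have passed since the last save.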
Interaction with Linux Subsystems
/proc Filesystem
Preheat reads:
/proc/[pid]/exe # Symlink to executable
/proc/[pid]/maps # Memory mappings
/proc/[pid]/stat # Process statistics
/proc/meminfo # System memory status
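/proc/[pid]/stat is easy to misparse. Its second field is the command name in parentheses, and that name may itself contain spaces or ')'. This sketch, with a hypothetical parse_stat helper, splits on the last ") " and uses proc(5) field numbering.

```python
def parse_stat(text: str) -> dict:
    """Pick a few fields out of /proc/[pid]/stat.

    Field 2 (comm) is wrapped in parentheses and may contain spaces or ')',
    so the remaining fields are located from the LAST ') ' in the line.
    """
    pid_str, _, rest = text.partition(" (")
    comm, _, tail = rest.rpartition(") ")
    fields = tail.split()  # fields[0] is field 3 of proc(5); field N is fields[N - 3]
    return {
        "pid": int(pid_str),
        "comm": comm,
        "state": fields[0],            # field 3
        "ppid": int(fields[1]),        # field 4
        "starttime": int(fields[19]),  # field 22, clock ticks since boot
    }
```

A naive text.split() would shift every later field by one for a process named "Web Content", for example.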
Page Cache (Disk Cache)
Kernel Page Cache
┌───────────────────────────────────────┐
│ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐   │
│ │ Page │ │ Page │ │ Page │ │ Page │...│
│ │(file)│ │(file)│ │(anon)│ │(file)│   │
│ └──────┘ └──────┘ └──────┘ └──────┘   │
│                                       │
│ Preheat adds file pages via readahead │
│ Kernel manages eviction automatically │
└───────────────────────────────────────┘
systemd Integration
systemd
│
├── Starts preheat at boot
├── Restarts on failure
├── Manages PID file
└── Handles signals (reload, stop)
What Happens When You Launch an App
Without Preheat
1. User clicks Firefox
2. Shell calls exec("/usr/bin/firefox")
3. Kernel checks page cache → MISS
4. Kernel reads firefox binary from disk → SLOW
5. Kernel loads shared libraries from disk → SLOW
6. Firefox initializes → Application ready
With Preheat (Predicted)
[Earlier: Preheat predicted Firefox and preloaded it]
1. User clicks Firefox
2. Shell calls exec("/usr/bin/firefox")
3. Kernel checks page cache → HIT!
4. Firefox binary already in memory → FAST
5. Shared libraries already cached → FAST
6. Firefox initializes → Application ready
Session-Aware Boot Preloading
Preheat detects when you log in and enters an aggressive preload mode:
Login detected (/run/user/$UID exists)
                  │
                  ▼
┌─────────────────────────────────────┐
│  3-MINUTE BOOT WINDOW               │
│                                     │
│  • Top 5 most-used apps boosted     │
│  • Immediately scheduled for        │
│    preloading (lnprob = -15.0)      │
│  • Only if ≥20% memory available    │
│                                     │
└─────────────────────────────────────┘
                  │
                  ▼
Normal prediction resumes
This ensures your daily applications are warm in cache the moment you need them.
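The boot window logic can be sketched using the values in the diagram: 3 minutes, top 5 apps, lnprob = -15.0, and at least 20% memory available. The function names are hypothetical. The sketch also assumes that a lower lnprob means an app is preloaded sooner, so the boost only ever lowers a value.

```python
import os

# Values from the diagram above; names are hypothetical.
BOOT_WINDOW_SECONDS = 3 * 60
BOOST_TOP_N = 5
BOOST_LNPROB = -15.0
MIN_AVAILABLE_FRACTION = 0.20


def session_started(uid: int) -> bool:
    """Login heuristic from the diagram: the user's runtime directory exists."""
    return os.path.isdir(f"/run/user/{uid}")


def apply_boot_boost(lnprob: dict[str, float], launch_counts: dict[str, float],
                     login_time: float, now: float,
                     mem_available: int, mem_total: int) -> dict[str, float]:
    """Return a copy of lnprob with the most-used apps pushed to the front."""
    if now - login_time > BOOT_WINDOW_SECONDS:
        return dict(lnprob)  # window over: normal prediction
    if mem_available < MIN_AVAILABLE_FRACTION * mem_total:
        return dict(lnprob)  # under 20% available: skip the aggressive boost
    top = sorted(launch_counts, key=launch_counts.get, reverse=True)[:BOOST_TOP_N]
    boosted = dict(lnprob)
    for app in top:
        boosted[app] = min(boosted.get(app, 0.0), BOOST_LNPROB)
    return boosted
```

Returning a copy keeps the learned values untouched, so normal prediction can resume unchanged once the window closes.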
Smart First-Run Seeding
On first startup, preheat doesn't start with an empty model. It scans your system for usage hints:
| Source | What it finds | Filtering |
|---|---|---|
| XDG Recently-Used | Files you've opened recently | GUI apps only |
| Desktop file timestamps | Apps with recent .desktop access | Excludes shell wrappers |
| Shell history | Commands from bash/zsh history | Only apps with .desktop files |
| Browser profiles | Installed Firefox, Chrome, etc. | None |
| System defaults | DE-specific apps (GNOME, KDE) | None |
Filtering (v1.0+):
- Shell history only seeds apps that have a .desktop file (excludes grep, ls, etc.)
- Desktop seeding skips launcher scripts like Kali's exec-in-shell wrapper
- Prevents shell wrappers from dominating the "top apps" list
This provides immediate preloading benefit from day one, with no waiting for the learning period.
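The shell-history rule can be sketched as follows. The sketch reads bash and zsh history text and keeps only commands that also have a .desktop launcher. The desktop_execs set is an assumed input, for example the executable names from the Exec= lines of installed .desktop files, and the function names are hypothetical.

```python
import os


def history_commands(text: str) -> list[str]:
    """Command names from bash or zsh history text, in order of appearance."""
    names = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(": ") and ";" in line:
            # zsh EXTENDED_HISTORY format: ': <start>:<elapsed>;<command>'
            line = line.split(";", 1)[1].strip()
        if not line or line.startswith("#"):
            continue  # blank line, or a bash '#<epoch>' timestamp line
        names.append(os.path.basename(line.split()[0]))
    return names


def seed_from_history(text: str, desktop_execs: set[str]) -> dict[str, int]:
    """Count only commands that also have a .desktop launcher (drops grep, ls, ...)."""
    counts: dict[str, int] = {}
    for name in history_commands(text):
        if name in desktop_execs:
            counts[name] = counts.get(name, 0) + 1
    return counts
```

Taking the basename means "firefox" and "/usr/bin/firefox --new-window" count toward the same app.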
Summary
| Phase | What Happens | Linux Interface |
|---|---|---|
| Scan | Find running processes | /proc filesystem |
| Learn | Update Markov model | In-memory data structures |
| Predict | Score applications | Probability calculations |
| Preload | Read files into cache | readahead(2) syscall |
Preheat's effectiveness comes from:
- Accurate prediction based on learned patterns
- Efficient I/O through sorting and parallelism
- Careful memory management
- Non-intrusive background operation