🔥 Preheat

Predict. Preload. Perform.

Adaptive readahead daemon for Linux


How Preheat Works

This document explains the operational principles of preheat: how it monitors, learns, predicts, and preloads applications.


High-Level Overview

Preheat operates in a continuous cycle:

┌──────────────────────────────────────────────────────────────────┐
│                     PREHEAT OPERATION CYCLE                      │
│                                                                  │
│    Every 20 seconds (configurable):                              │
│                                                                  │
│    ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐  │
│    │  SCAN    │ -> │  LEARN   │ -> │ PREDICT  │ -> │ PRELOAD  │  │
│    │          │    │          │    │          │    │          │  │
│    │ Check    │    │ Update   │    │ Score    │    │ Read     │  │
│    │ /proc    │    │ Markov   │    │ each     │    │ files    │  │
│    │ for      │    │ chain    │    │ app's    │    │ into     │  │
│    │ running  │    │ model    │    │ launch   │    │ memory   │  │
│    │ apps     │    │          │    │ prob.    │    │ cache    │  │
│    └──────────┘    └──────────┘    └──────────┘    └──────────┘  │
│         │                                               │        │
│         └───────────────────────────────────────────────┘        │
│                          (repeat)                                │
└──────────────────────────────────────────────────────────────────┘

The Four Phases

Phase 1: Scan

Preheat monitors running processes by scanning the /proc filesystem:

/proc/
├── 1234/                    # Process with PID 1234
│   ├── exe -> /usr/bin/firefox    # Executable path
│   └── maps                 # Memory mappings (libraries)
├── 5678/
│   ├── exe -> /usr/bin/code
│   └── maps
└── ...
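The scan step can be sketched in C as two small helpers: one resolves /proc/<pid>/exe with readlink(2), the other pulls the file path out of a single /proc/<pid>/maps line. The function names read_exe_path and maps_line_path are hypothetical, and this is not claimed to be preheat's own parser; it only assumes the standard maps line layout (address range, permissions, offset, device, inode, path).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Resolve /proc/<pid>/exe to the executable's path.
 * Returns the path length, or -1 if the link cannot be read
 * (the process exited, or it belongs to another user). */
ssize_t read_exe_path(pid_t pid, char *buf, size_t bufsz)
{
    assert(bufsz > 0);
    char link[64];
    snprintf(link, sizeof link, "/proc/%d/exe", (int)pid);
    ssize_t n = readlink(link, buf, bufsz - 1);  /* readlink does not NUL-terminate */
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return n;
}

/* Copy the file path out of one /proc/<pid>/maps line such as
 * "7f3a1c000000-7f3a1c1c5000 r-xp 00000000 08:01 131 /usr/lib/libc.so.6".
 * Returns 1 for a file-backed mapping, 0 for anonymous mappings,
 * pseudo-entries like "[heap]" (inode 0) or unparsable lines. */
int maps_line_path(const char *line, char *path, size_t pathsz)
{
    unsigned long start, end, offset, inode;
    char perms[5], dev[16];
    int consumed = 0;

    if (sscanf(line, "%lx-%lx %4s %lx %15s %lu %n",
               &start, &end, perms, &offset, dev, &inode, &consumed) < 6)
        return 0;
    const char *p = line + consumed;
    if (inode == 0 || *p != '/')
        return 0;
    size_t len = strcspn(p, "\n");
    if (len >= pathsz)
        return 0;
    memcpy(path, p, len);
    path[len] = '\0';
    return 1;
}
```

A full scan would iterate over the numeric directories in /proc with opendir(3)/readdir(3) and feed every maps line through maps_line_path(), collecting the executable plus the shared libraries it has mapped.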

What it collects:

Filtering rules:

Phase 2: Learn

The daemon builds a statistical model of application co-occurrence:

Markov Chain Model:

A Markov chain tracks which applications tend to run together or in sequence:

Example: User often opens Firefox, then VS Code, then Terminal

State transitions recorded:
  Firefox  -->  VS Code    (probability: 0.6)
  Firefox  -->  Terminal   (probability: 0.3)
  Firefox  -->  Other      (probability: 0.1)
  
  VS Code  -->  Terminal   (probability: 0.7)
  VS Code  -->  Firefox    (probability: 0.2)
  ...
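The transition table above can be reproduced with a row-normalized count matrix. This is a simplified first-order illustration using a hypothetical four-app enum; the document does not specify how preheat stores its chain internally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical app identifiers for the example above. */
enum app { FIREFOX, VSCODE, TERMINAL, OTHER, NAPPS };

/* counts[i][j]: how often app j was the next app launched after app i. */
void record_transition(unsigned counts[NAPPS][NAPPS], enum app from, enum app to)
{
    assert(from < NAPPS && to < NAPPS);
    counts[from][to]++;
}

/* Normalize each row so probs[i][j] = P(next = j | current = i).
 * Rows with no observations stay all-zero. */
void transition_probs(unsigned counts[NAPPS][NAPPS], double probs[NAPPS][NAPPS])
{
    for (size_t i = 0; i < NAPPS; i++) {
        unsigned long row_total = 0;
        for (size_t j = 0; j < NAPPS; j++)
            row_total += counts[i][j];
        for (size_t j = 0; j < NAPPS; j++)
            probs[i][j] = row_total ? (double)counts[i][j] / (double)row_total : 0.0;
    }
}
```

Recording six Firefox → VS Code, three Firefox → Terminal and one Firefox → Other transition yields the 0.6 / 0.3 / 0.1 Firefox row shown above.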

Learning over time:

Weighted Launch Counting (v1.0.0+):

Preheat weights each launch by how long the application ran and whether a user started it, so that short-lived or automated processes count for less:

Launch Weight Formula:
  weight = base × log(1 + duration/divisor) × user_mult × short_penalty

Where:
  base = 1.0
  divisor = 60 seconds (configurable)
  user_mult = 2.0 for user-initiated, 1.0 for automated (configurable)
  short_penalty = 0.3 if duration < 5 seconds, 1.0 otherwise
  log = natural logarithm; duration is in seconds

Progressive Weight Accumulation:

Application launched:
  t=0s    → raw_launches++ (immediate count)
  t=20s   → weighted_launches += 0.8 (first cycle)
  t=40s   → weighted_launches += 0.6 (second cycle)
  t=60s   → weighted_launches += 0.5 (third cycle)
  ...     → continues until exit

Example comparison:

Scenario                Duration  User?  Weight Formula                   Final Weight
Grep (automated)        0.1s      No     1.0 × log(1.0017) × 1.0 × 0.3    ~0.0005
Failed launch (crash)   2s        Yes    1.0 × log(1.033) × 2.0 × 0.3     ~0.02
Terminal session        600s      Yes    1.0 × log(11) × 2.0 × 1.0        ~4.8
Firefox (long session)  7200s     Yes    1.0 × log(121) × 2.0 × 1.0       ~9.6

This prevents crashes and trivial processes from inflating prediction scores.

Phase 3: Predict

Using the learned model, preheat calculates which applications are most likely to be launched:

Scoring factors:

Factor                   Weight  Description
Markov probability       High    Based on currently running apps
Correlation coefficient  Medium  Statistical co-occurrence strength
Recency                  Medium  Recently used apps score higher
Frequency                Low     Overall launch count

Example prediction:

Currently running: Firefox, Terminal

Predictions:
  1. VS Code        (score: 0.82)  - Often follows Firefox
  2. File Manager   (score: 0.65)  - Frequently co-occurs
  3. Spotify        (score: 0.41)  - Sometimes used together
  4. LibreOffice    (score: 0.23)  - Rare combination
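One way to combine the four factors is a weighted sum followed by a sort. The numeric weights below (0.5, 0.2, 0.2, 0.1) are hypothetical stand-ins for High/Medium/Medium/Low, and normalizing every factor to the range 0 to 1 is our assumption; the document does not give preheat's actual scoring formula.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical numeric weights standing in for High/Medium/Medium/Low. */
#define W_MARKOV  0.5
#define W_CORR    0.2
#define W_RECENCY 0.2
#define W_FREQ    0.1

struct candidate {
    const char *name;
    double markov;   /* P(launch | currently running apps), 0..1 */
    double corr;     /* co-occurrence correlation, clamped to 0..1 */
    double recency;  /* 1.0 = used just now, decays toward 0 */
    double freq;     /* launch count normalized to 0..1 */
    double score;
};

static int by_score_desc(const void *a, const void *b)
{
    double sa = ((const struct candidate *)a)->score;
    double sb = ((const struct candidate *)b)->score;
    return (sa < sb) - (sa > sb);   /* higher score sorts first */
}

/* Score every candidate, then sort so the most likely launch comes first. */
void rank_candidates(struct candidate *c, size_t n)
{
    assert(c != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        c[i].score = W_MARKOV * c[i].markov + W_CORR * c[i].corr
                   + W_RECENCY * c[i].recency + W_FREQ * c[i].freq;
    qsort(c, n, sizeof *c, by_score_desc);
}
```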

Phase 4: Preload

High-scoring applications are preloaded into the disk cache:

Application: VS Code
Files to preload:
  /usr/share/code/code           (main binary, 120 MB)
  /usr/lib/x86_64-linux-gnu/libnode.so
  /usr/lib/x86_64-linux-gnu/libv8.so
  ... (shared libraries)

Preloading mechanism:

  1. Check available memory (respect memory limits)
  2. Get list of files for predicted applications
  3. Sort files for optimal I/O (by disk block for HDDs)
  4. Call readahead(2) system call on each file
  5. Kernel reads file data into disk cache
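Step 4 can be sketched as follows: open the file, find its size with fstat(2), and request readahead(2) for the whole file. preload_file is a hypothetical helper name; readahead() is Linux-specific and is only declared when _GNU_SOURCE is defined.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Ask the kernel to read an entire file into the page cache.
 * Returns 0 on success, -1 if the file cannot be opened, is not a
 * regular file, or readahead() fails. */
int preload_file(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;

    struct stat st;
    int rc = -1;
    if (fstat(fd, &st) == 0 && S_ISREG(st.st_mode))
        rc = readahead(fd, 0, (size_t)st.st_size) == 0 ? 0 : -1;

    close(fd);   /* cached pages remain (until evicted) after the fd is closed */
    return rc;
}
```

A caller would loop over the sorted file list from step 3 and stop once the memory budget from step 1 is used up.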

The readahead(2) System Call

The core of preloading is the Linux readahead() system call:

ssize_t readahead(int fd, off_t offset, size_t count);   /* <fcntl.h>, needs _GNU_SOURCE; returns 0 on success */

What it does:

Why it’s efficient:


Memory Management

Preheat carefully manages how much memory it uses for preloading:

Memory Budget Calculation

Available for preloading = max(0, Total × memtotal% + Free × memfree%)
                          + Cached × memcached%

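With the defaults (memtotal=-10, memfree=50, memcached=0), this reduces to max(0, 50% of free memory - 10% of total memory); cached memory does not count toward the budget.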

Example

System: 8 GB total RAM, 3 GB free, 2 GB cached

Calculation:
  Total contribution:  8192 MB × (-10%) = -819 MB
  Free contribution:   3072 MB × (50%)  = 1536 MB
  Cached contribution: 2048 MB × (0%)   =    0 MB

  Available = max(0, -819 + 1536) + 0 = 717 MB

Preheat will preload up to 717 MB of application data.
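The budget formula is plain integer arithmetic. This sketch (hypothetical names) uses the same MB units as the example and truncates percentages the way C integer division does, which is why 8192 × -10% becomes -819.

```c
#include <assert.h>

/* Percentages from the formula above; defaults are memtotal=-10,
 * memfree=50, memcached=0. */
struct mem_policy { long memtotal_pct, memfree_pct, memcached_pct; };

/* All sizes in the same unit (MB here); returns the preload budget. */
long preload_budget(long total, long free_mem, long cached, struct mem_policy p)
{
    assert(total >= 0 && free_mem >= 0 && cached >= 0);
    long avail = total * p.memtotal_pct / 100 + free_mem * p.memfree_pct / 100;
    if (avail < 0)
        avail = 0;                                  /* the max(0, ...) clamp */
    return avail + cached * p.memcached_pct / 100;  /* cached term is added after clamping */
}
```

With the defaults, preload_budget(8192, 3072, 2048, (struct mem_policy){ -10, 50, 0 }) returns 717, as in the example.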

Memory Pressure Response

When system memory becomes scarce:


I/O Optimization

Sorting Strategies

How files are ordered for preloading affects disk efficiency:

Strategy   Best For    Description
0 - None   SSD, Flash  No sorting (random access is fast)
1 - Path   Network FS  Group by directory path
2 - Inode  Most FS     Sort by inode number
3 - Block  HDD         Sort by physical disk block

Block sorting (default, strategy 3):

Before sorting:        After sorting:
  File A (block 500)     File C (block 100)
  File C (block 100)     File B (block 450)
  File B (block 450)     File A (block 500)
  
Result: Disk head moves in one direction, minimizing seeks
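Block sorting itself is an ordinary qsort(3) over (path, block) pairs. How the block number is obtained is outside this sketch; on Linux the FIBMAP or FIEMAP ioctls are the usual sources, and FIBMAP requires CAP_SYS_RAWIO. The struct pfile type is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct pfile {
    const char *path;
    unsigned long block;   /* first physical block of the file */
};

static int by_block(const void *a, const void *b)
{
    unsigned long x = ((const struct pfile *)a)->block;
    unsigned long y = ((const struct pfile *)b)->block;
    return (x > y) - (x < y);   /* ascending; avoids overflow from x - y */
}

/* Order files so the disk head sweeps in one direction. */
void sort_by_block(struct pfile *files, size_t n)
{
    assert(files != NULL || n == 0);
    qsort(files, n, sizeof *files, by_block);
}
```

Sorting the three files from the example yields C (100), B (450), A (500).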

Parallel Readahead

Multiple files can be read simultaneously using worker processes:

Main daemon
    │
    ├── Worker 1: Reading /usr/bin/firefox
    ├── Worker 2: Reading /usr/lib/libgtk.so
    ├── Worker 3: Reading /usr/share/code/code
    └── ... (up to 30 by default)

This utilizes disk queue depth and parallelism for faster preloading.
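A fork-based version of this fan-out might look like the sketch below. The worker limit mirrors the document's default of 30 but is passed in as a parameter; the helper names are hypothetical, and each child reports success through its exit status.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child process: read one file ahead, then report success via exit status. */
static void preload_worker(const char *path)
{
    struct stat st;
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    int ok = fd >= 0 && fstat(fd, &st) == 0 && S_ISREG(st.st_mode)
             && readahead(fd, 0, (size_t)st.st_size) == 0;
    _exit(ok ? 0 : 1);   /* _exit: skip the parent's atexit handlers and stdio flush */
}

/* A reaped child counts as a success if it exited with status 0. */
static size_t exited_ok(int status)
{
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

/* Preload n files with at most max_workers children running at once.
 * Returns the number of files that were read ahead successfully. */
size_t parallel_preload(const char *const *paths, size_t n, size_t max_workers)
{
    assert(max_workers > 0);
    size_t ok = 0, running = 0;
    int status;

    for (size_t i = 0; i < n; i++) {
        if (running == max_workers && wait(&status) > 0) {   /* free a slot */
            running--;
            ok += exited_ok(status);
        }
        pid_t pid = fork();
        if (pid == 0)
            preload_worker(paths[i]);   /* never returns */
        else if (pid > 0)
            running++;
        /* pid < 0: fork failed; this sketch simply skips the file */
    }
    while (running > 0 && wait(&status) > 0) {   /* reap the remaining children */
        running--;
        ok += exited_ok(status);
    }
    return ok;
}
```

wait(2) reaps any child of the calling process, so a real daemon would track the PIDs it created (for example with waitpid) if it also runs other children.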


Timing and Scheduling

The Cycle Timer

Time:   0s      20s     40s     60s     ...
        │       │       │       │
        ▼       ▼       ▼       ▼
      Scan    Scan    Scan    Scan
      Learn   Learn   Learn   Learn
      Predict Predict Predict Predict
      Preload Preload Preload Preload

Each cycle (default: 20 seconds):

  1. Completes within a few hundred milliseconds
  2. Most time is spent waiting for I/O
  3. CPU usage during scan: typically <1%
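A minimal sketch of the cycle loop, assuming the phases are plain callbacks (the phase_fn type and the counting stub phases are hypothetical). It sleeps on absolute CLOCK_MONOTONIC deadlines with clock_nanosleep(2), so time spent inside a cycle does not push later cycles back; that is a design choice for the sketch, not something the document specifies.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

typedef void (*phase_fn)(void);

/* Stub phases standing in for scan/learn/predict/preload; each counts calls. */
static unsigned long phase_calls;
static void scan(void)    { phase_calls++; }
static void learn(void)   { phase_calls++; }
static void predict(void) { phase_calls++; }
static void preload(void) { phase_calls++; }
static const phase_fn stub_phases[4] = { scan, learn, predict, preload };

/* Run `cycles` rounds of the four phases, one round every period_ms
 * (cycles < 0 runs forever). Deadlines are absolute CLOCK_MONOTONIC times,
 * so time spent inside the phases does not delay later rounds. */
void run_cycles(const phase_fn phases[4], long period_ms, long cycles)
{
    assert(period_ms > 0);
    struct timespec next;
    clock_gettime(CLOCK_MONOTONIC, &next);

    for (long c = 0; cycles < 0 || c < cycles; c++) {
        for (int i = 0; i < 4; i++)
            phases[i]();

        next.tv_sec += period_ms / 1000;
        next.tv_nsec += (period_ms % 1000) * 1000000L;
        if (next.tv_nsec >= 1000000000L) {
            next.tv_sec++;
            next.tv_nsec -= 1000000000L;
        }
        /* clock_nanosleep returns an error number; retry if a signal interrupts */
        while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) == EINTR)
            ;
    }
}
```

With the default period, a daemon would call run_cycles(phases, 20000, -1). If a round overruns its slot, the next round starts immediately rather than being skipped.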

Nice Level

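Preheat runs with a raised nice value (default: 15), which lowers its CPU scheduling priority so that foreground programs get most of the CPU time when there is contention.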
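Applying the nice value takes one setpriority(2) call. The sketch below (hypothetical helper names) also shows the errno handling that getpriority(2) requires, because -1 is a valid nice value.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <sys/resource.h>

/* Set this process's nice value (-20..19). Raising it, i.e. lowering
 * priority, needs no privileges; lowering it again needs CAP_SYS_NICE
 * or a sufficient RLIMIT_NICE. Returns 0 on success, -1 on error. */
int set_nice(int value)
{
    assert(value >= -20 && value <= 19);
    return setpriority(PRIO_PROCESS, 0, value);
}

/* getpriority() can legitimately return -1, so errno must be cleared
 * first to tell a real error apart from a nice value of -1. */
int get_nice(int *out)
{
    errno = 0;
    int v = getpriority(PRIO_PROCESS, 0);
    if (v == -1 && errno != 0)
        return -1;
    *out = v;
    return 0;
}
```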


State Persistence

What’s Saved

The state file contains:

Save Triggers

  1. Autosave timer: Every hour by default
  2. Graceful shutdown: On SIGTERM
  3. Manual save: Via preheat-ctl save or SIGUSR2
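The three triggers can share one code path: signal handlers only set flags, and the main loop decides when to write the file. This is a common daemon pattern, sketched here with hypothetical names; the hourly default would be passed in as autosave_sec, and the save routine itself is left out.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <time.h>

/* Handlers only set flags; saving happens in the main loop, because
 * stdio and malloc are not async-signal-safe. */
static volatile sig_atomic_t save_requested;
static volatile sig_atomic_t exit_requested;

static void on_signal(int sig)
{
    if (sig == SIGUSR2)
        save_requested = 1;      /* manual save */
    else if (sig == SIGTERM)
        exit_requested = 1;      /* save, then shut down gracefully */
}

int install_save_handlers(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;    /* restart interrupted system calls */
    if (sigaction(SIGUSR2, &sa, NULL) != 0 || sigaction(SIGTERM, &sa, NULL) != 0)
        return -1;
    return 0;
}

/* Called once per cycle: should the state file be written now? */
int should_save(time_t now, time_t last_save, time_t autosave_sec)
{
    if (save_requested || exit_requested)
        return 1;
    return now - last_save >= autosave_sec;
}
```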

State File Location

/usr/local/var/lib/preheat/preheat.state    # Binary format

On startup:


Interaction with Linux Subsystems

/proc Filesystem

Preheat reads:
  /proc/[pid]/exe      # Symlink to executable
  /proc/[pid]/maps     # Memory mappings
  /proc/[pid]/stat     # Process statistics
  /proc/meminfo        # System memory status
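Reading the three values the memory budget needs from /proc/meminfo takes a few sscanf(3) calls. The struct and function names are hypothetical; the field names (MemTotal, MemFree, Cached) and kB units are as the kernel reports them.

```c
#include <assert.h>
#include <stdio.h>

struct meminfo { long total_kb, free_kb, cached_kb; };

/* Parse one /proc/meminfo line. Returns 1, 2 or 4 for MemTotal, MemFree
 * or Cached, and 0 for any other field (SwapCached does not match
 * "Cached:" because sscanf literals must match from the first character). */
static int parse_meminfo_line(const char *line, struct meminfo *m)
{
    long v;
    if (sscanf(line, "MemTotal: %ld kB", &v) == 1) { m->total_kb = v;  return 1; }
    if (sscanf(line, "MemFree: %ld kB", &v) == 1)  { m->free_kb = v;   return 2; }
    if (sscanf(line, "Cached: %ld kB", &v) == 1)   { m->cached_kb = v; return 4; }
    return 0;
}

/* Read all three fields; returns 0 if each was found, -1 otherwise. */
int read_meminfo(const char *path, struct meminfo *m)
{
    assert(path != NULL && m != NULL);
    FILE *fp = fopen(path, "r");
    if (!fp)
        return -1;
    char line[256];
    int found = 0;
    while (fgets(line, sizeof line, fp))
        found |= parse_meminfo_line(line, m);
    fclose(fp);
    return found == 7 ? 0 : -1;
}
```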

Page Cache (Disk Cache)

Kernel Page Cache
┌───────────────────────────────────────┐
│ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐   │
│ │ Page │ │ Page │ │ Page │ │ Page │...│
│ │(file)│ │(file)│ │(anon)│ │(file)│   │
│ └──────┘ └──────┘ └──────┘ └──────┘   │
│                                       │
│ Preheat adds file pages via readahead │
│ Kernel manages eviction automatically │
└───────────────────────────────────────┘
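To check whether preloading actually populated the page cache, a program can map a file and ask mincore(2) which of its pages are resident; tools such as vmtouch use the same technique. The sketch below is not described as part of preheat, and the helper name cached_pages is hypothetical. On recent kernels mincore() reveals page-cache state only for files the caller owns or could open for writing; for other files it reports every page as resident.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Count how many of a file's pages are in the page cache.
 * Stores the file's total page count in *total and returns the number
 * of resident pages, or -1 on error. */
long cached_pages(const char *path, long *total)
{
    assert(path != NULL && total != NULL);
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (len + page - 1) / page;

    /* Mapping alone reads nothing; it only gives mincore an address range. */
    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);   /* the mapping keeps its own reference to the file */
    if (map == MAP_FAILED)
        return -1;

    unsigned char *vec = malloc(npages);
    long resident = -1;
    if (vec != NULL && mincore(map, len, vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < npages; i++)
            resident += vec[i] & 1;   /* bit 0 set: page is resident */
    }
    free(vec);
    munmap(map, len);
    *total = (long)npages;
    return resident;
}
```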

systemd Integration

systemd
    │
    ├── Starts preheat at boot
    ├── Restarts on failure
    ├── Manages PID file
    └── Handles signals (reload, stop)

What Happens When You Launch an App

Without Preheat

1. User clicks Firefox
2. Shell calls exec("/usr/bin/firefox")
3. Kernel checks page cache → MISS
4. Kernel reads firefox binary from disk → SLOW
5. Kernel loads shared libraries from disk → SLOW
6. Firefox initializes → Application ready

With Preheat (Predicted)

[Earlier: Preheat predicted Firefox and preloaded it]

1. User clicks Firefox
2. Shell calls exec("/usr/bin/firefox")
3. Kernel checks page cache → HIT!
4. Firefox binary already in memory → FAST
5. Shared libraries already cached → FAST
6. Firefox initializes → Application ready

Session-Aware Boot Preloading

Preheat detects when you log in and enters an aggressive preload mode:

Login detected (/run/user/$UID exists)
                 │
                 ▼
┌─────────────────────────────────────┐
│      3-MINUTE BOOT WINDOW           │
│                                     │
│  • Top 5 most-used apps boosted     │
│  • Immediately scheduled for        │
│    preloading (lnprob = -15.0)      │
│  • Only if ≥20% memory available    │
│                                     │
└─────────────────────────────────────┘
                 │
                 ▼
       Normal prediction resumes

This ensures your daily applications are warm in cache the moment you need them.
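The boot-window rules in the diagram can be sketched as follows. The 180-second window, top-5 limit, 20% memory threshold and lnprob value of -15.0 come from the diagram; the struct layout, the mem_avail_frac parameter, and the assumption that a more negative lnprob means "preload sooner" are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

#define BOOT_WINDOW_SEC   180     /* 3-minute window */
#define BOOT_TOP_N        5
#define BOOT_MIN_MEM_FRAC 0.20    /* at least 20% of memory available */
#define BOOT_LNPROB       (-15.0)

struct app_stat {
    const char *name;
    unsigned long launches;
    double lnprob;   /* assumption: more negative = preloaded sooner */
};

/* Login check from the diagram: the user's runtime directory exists. */
int user_logged_in(uid_t uid)
{
    char dir[64];
    struct stat st;
    snprintf(dir, sizeof dir, "/run/user/%u", (unsigned)uid);
    return stat(dir, &st) == 0 && S_ISDIR(st.st_mode);
}

static int by_launches_desc(const void *a, const void *b)
{
    unsigned long x = ((const struct app_stat *)a)->launches;
    unsigned long y = ((const struct app_stat *)b)->launches;
    return (x < y) - (x > y);   /* most-launched first */
}

/* Within the boot window and with enough free memory, sort apps by launch
 * count and pin the top N to BOOT_LNPROB. Returns how many were boosted. */
size_t boot_boost(struct app_stat *apps, size_t n, time_t login, time_t now,
                  double mem_avail_frac)
{
    assert(apps != NULL || n == 0);
    if (now - login > BOOT_WINDOW_SEC || mem_avail_frac < BOOT_MIN_MEM_FRAC)
        return 0;
    qsort(apps, n, sizeof *apps, by_launches_desc);
    size_t k = n < BOOT_TOP_N ? n : BOOT_TOP_N;
    for (size_t i = 0; i < k; i++)
        apps[i].lnprob = BOOT_LNPROB;
    return k;
}
```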


Smart First-Run Seeding

On first startup, preheat doesn’t start with an empty model. It scans your system for usage hints:

Source                    What it finds                      Filtering
XDG Recently-Used         Files you’ve opened recently       GUI apps only
Desktop file timestamps   Apps with recent .desktop access   Excludes shell wrappers
Shell history             Commands from bash/zsh history     Only apps with .desktop files
Browser profiles          Installed Firefox, Chrome, etc.    None
System defaults           DE-specific apps (GNOME, KDE)      None

Filtering (v1.0+):

This provides a preloading benefit from day one, with no need to wait for the learning period.
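The shell-history source might be handled like this: take the first word of each history line, strip any directory, and keep it only if it names an application that has a .desktop file. The helpers and the plain list of desktop app names are hypothetical stand-ins; the zsh branch assumes the EXTENDED_HISTORY line format ": <start>:<elapsed>;<command>".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Extract the command name (first word, directory stripped) from one shell
 * history line. zsh EXTENDED_HISTORY lines look like ": 1700000000:0;cmd",
 * so that prefix is skipped. Returns 1 on success, 0 if nothing usable. */
int history_command(const char *line, char *out, size_t outsz)
{
    assert(line != NULL && out != NULL && outsz > 0);
    if (line[0] == ':') {
        const char *semi = strchr(line, ';');
        if (semi == NULL)
            return 0;
        line = semi + 1;
    }
    line += strspn(line, " \t");
    size_t len = strcspn(line, " \t\n;|&");
    const char *base = line;
    for (size_t i = 0; i < len; i++)
        if (line[i] == '/')
            base = line + i + 1;   /* keep only the part after the last '/' */
    size_t blen = (size_t)(line + len - base);
    if (blen == 0 || blen >= outsz)
        return 0;
    memcpy(out, base, blen);
    out[blen] = '\0';
    return 1;
}

/* Keep a command only if an installed .desktop application has that name. */
int is_desktop_app(const char *cmd, const char *const *desktop_apps, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(cmd, desktop_apps[i]) == 0)
            return 1;
    return 0;
}
```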


Summary

Phase    What Happens            Linux Interface
Scan     Find running processes  /proc filesystem
Learn    Update Markov model     In-memory data structures
Predict  Score applications      Probability calculations
Preload  Read files into cache   readahead(2) syscall

Preheat’s effectiveness comes from:

  1. Accurate prediction based on learned patterns
  2. Efficient I/O through sorting and parallelism
  3. Careful memory management
  4. Non-intrusive background operation
