Flagship Project

Truth Preservation Network

A civilizational project to digitize, verify, and preserve historical documents, building the verified data foundation that AI desperately needs.

History Is Disappearing

Physical documents deteriorate. Libraries are underfunded. AI models are trained on unverified data and hallucinate as a result. TPN exists to fix all three simultaneously.

📚

Documents Deteriorating

Newspapers, manuscripts, and books from the 19th and 20th centuries are decaying faster than institutions can preserve them.

🏛️

Libraries Underfunded

Public libraries lack the resources to digitize their collections at scale. Community participation is the only viable solution.

🤖

AI Needs Verified Data

Current AI models hallucinate in part because they are trained on unverified sources. Grounded, provenance-backed data is the remedy.

The TPN Pipeline

A physical page enters one end. A verified, searchable, hashed digital record exits the other.

📱
Scan
Volunteer photographs document with phone
→
🔤
OCR
Tesseract extracts text at 94%+ confidence
→
🧹
Clean
Python normalizes and corrects OCR output
→
🔐
Hash
SHA-256 fingerprint proves content integrity
→
🔍
Search
Semantic API serves verified archive
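
To make the flow above concrete, here is a minimal sketch of how the stages might chain together. The names `ArchiveRecord` and `run_pipeline` are hypothetical, the OCR and cleaning stages are passed in as plain functions, and hashing the cleaned text (rather than the scan image) is an assumption, not a confirmed detail of TPN.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ArchiveRecord:
    """Hypothetical output of one pipeline run."""
    text: str    # cleaned OCR text
    sha256: str  # hex digest of the cleaned text, UTF-8 encoded


def run_pipeline(
    scan: bytes,
    ocr: Callable[[bytes], str],
    clean: Callable[[str], str],
) -> ArchiveRecord:
    raw_text = ocr(scan)        # OCR stage: image bytes -> raw text
    cleaned = clean(raw_text)   # Clean stage: raw text -> normalized text
    # Hash stage (assumption: the fingerprint covers the cleaned text).
    digest = hashlib.sha256(cleaned.encode("utf-8")).hexdigest()
    return ArchiveRecord(text=cleaned, sha256=digest)


# Usage with stand-in stages; a real run would plug in Tesseract and the cleaner.
record = run_pipeline(b"fake-image", lambda _: "  Detroit, 1923  ", str.strip)
print(record.text, record.sha256[:12])
```

Passing the stages in as functions keeps each one testable on its own, which fits the "validate every component before moving forward" approach described in this document.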

Phase 1: Foundation Complete

Built in public. Every milestone documented. Every component validated before moving forward.

✅ Complete

OCR Engine

Tesseract-powered text extraction achieving 94%+ confidence on historical documents.
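
As an illustration of how a confidence figure like this could be measured, the sketch below averages Tesseract's per-word confidence scores using the `pytesseract` wrapper. How TPN actually computes its 94%+ figure is not stated here, so the averaging rule and the names `mean_word_confidence` and `ocr_page` are assumptions.

```python
from statistics import mean


def mean_word_confidence(ocr_data: dict) -> float:
    """Average confidence (0-100) over recognized words.

    Expects the dict returned by pytesseract.image_to_data with
    Output.DICT. Layout-only boxes carry conf -1 and empty text,
    so they are skipped.
    """
    scores = [
        float(conf)
        for text, conf in zip(ocr_data["text"], ocr_data["conf"])
        if str(text).strip() and float(conf) >= 0
    ]
    return mean(scores) if scores else 0.0


def ocr_page(image_path: str) -> tuple[str, float]:
    # Imported lazily so the scoring helper works without Tesseract installed.
    import pytesseract
    from PIL import Image

    data = pytesseract.image_to_data(
        Image.open(image_path), output_type=pytesseract.Output.DICT
    )
    words = [w for w in data["text"] if w.strip()]
    return " ".join(words), mean_word_confidence(data)
```

A page whose mean falls below a chosen threshold could be flagged for a rescan or manual review instead of entering the archive.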

✅ Complete

Cleaning Engine

Python pipeline normalizes OCR output, corrects errors, and standardizes formatting.
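
The exact cleaning rules are not listed here, so the following is only a sketch of the kind of normalization such a pipeline might perform, using the standard library. The function name `clean_ocr_text` and the specific rules are assumptions chosen for illustration.

```python
import re
import unicodedata


def clean_ocr_text(raw: str) -> str:
    # Fold compatibility characters, e.g. the "fi" ligature becomes "fi".
    text = unicodedata.normalize("NFKC", raw)
    # Rejoin words split by a hyphen at a line break ("edi-\ntion").
    # Caveat: this also merges genuine compounds that happen to wrap.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Drop spaces that sit directly next to a line break.
    text = re.sub(r" ?\n ?", "\n", text)
    # Allow at most one blank line between paragraphs.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


print(clean_ocr_text("The  \ufb01rst edi-\ntion   was\n\n\n\nprinted "))
```

Keeping every rule deterministic matters for the hashing step: the same raw OCR output should always clean to the same text.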

✅ Complete

SHA-256 Provenance

Every document receives a unique cryptographic fingerprint proving content integrity.
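
A minimal sketch of fingerprinting and later verification with Python's standard `hashlib`. The helper names `fingerprint` and `verify` are hypothetical, and whether TPN hashes the cleaned text, the scan image, or both is not specified here.

```python
import hashlib
import hmac


def fingerprint(content: bytes) -> str:
    """Return the SHA-256 digest of the content as 64 hex characters."""
    return hashlib.sha256(content).hexdigest()


def verify(content: bytes, expected_hex: str) -> bool:
    """Check that content still matches a previously recorded digest."""
    return hmac.compare_digest(fingerprint(content), expected_hex.lower())


original = "Detroit Free Press, 4 March 1923".encode("utf-8")
recorded = fingerprint(original)

print(verify(original, recorded))                     # True
print(verify(original.replace(b"4", b"5"), recorded)) # False: content changed
```

A matching digest shows the content is unchanged since the digest was recorded. It says nothing about whether the original scan was accurate, which is why the OCR and cleaning stages come first.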

✅ Complete

Semantic Embeddings

384-dimensional vectors enable natural language search across the entire archive.
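
To show how vector search works in principle, here is a small sketch that ranks documents by cosine similarity to a query vector. It uses 2-dimensional toy vectors for readability; in the archive each vector would have 384 dimensions and come from an embedding model, which is not named here. The names `cosine_similarity` and `search` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec: list[float], index: dict[str, list[float]], top_k: int = 3):
    """Return the top_k (doc_id, score) pairs, most similar first."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in index.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy index: real entries would be 384-dimensional embeddings.
index = {"doc-a": [1.0, 0.0], "doc-b": [0.0, 1.0], "doc-c": [1.0, 1.0]}
for doc_id, score in search([1.0, 0.1], index, top_k=2):
    print(doc_id, round(score, 3))
```

A linear scan like this is fine for a small archive; a larger one would likely need an approximate nearest-neighbor index.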

✅ Complete

Search API

FastAPI serving four endpoints: process, retrieve, search, and health check.
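
The sketch below shows one way four such endpoints could be laid out in FastAPI. The URL paths, the request shape, and the `InMemoryArchive` class are all assumptions made for illustration; the substring search is a placeholder for the semantic search the real API provides.

```python
import hashlib


class InMemoryArchive:
    """Toy stand-in for the real document store (hypothetical)."""

    def __init__(self):
        self._docs = {}

    def process(self, text: str) -> dict:
        doc_id = hashlib.sha256(text.encode("utf-8")).hexdigest()
        self._docs[doc_id] = text
        return {"id": doc_id}

    def retrieve(self, doc_id: str):
        return self._docs.get(doc_id)

    def search(self, query: str) -> list:
        # Substring match as a placeholder for semantic search.
        q = query.lower()
        return [d for d, text in self._docs.items() if q in text.lower()]


def create_app(archive: InMemoryArchive):
    # Imported here so the archive class can be used without FastAPI installed.
    from fastapi import Body, FastAPI, HTTPException

    app = FastAPI()

    @app.get("/health")
    def health():
        return {"status": "ok"}

    @app.post("/process")
    def process(text: str = Body(..., embed=True)):
        # Expects a JSON body of the form {"text": "..."}.
        return archive.process(text)

    @app.get("/documents/{doc_id}")
    def retrieve(doc_id: str):
        text = archive.retrieve(doc_id)
        if text is None:
            raise HTTPException(status_code=404, detail="Document not found")
        return {"id": doc_id, "text": text}

    @app.get("/search")
    def search(q: str):
        return {"results": archive.search(q)}

    return app
```

To try it, assign `app = create_app(InMemoryArchive())` in a module and serve that module with an ASGI server such as uvicorn.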

⬜ Week 3

Real World Validation

First real phone scan from a Detroit library. Pipeline tested against genuine historical documents.