A civilizational project to digitize, verify, and preserve historical documents โ building the verified data foundation that AI desperately needs.
The Problem
Physical documents deteriorate. Libraries are underfunded. AI models are trained on unverified data and hallucinate as a result. TPN exists to fix all three โ simultaneously.
Newspapers, manuscripts, and books from the 19th and 20th centuries are decaying faster than institutions can preserve them.
Public libraries lack the resources to digitize their collections at scale. Community participation is the only viable solution.
Current AI models hallucinate because they are trained on unverified sources. Grounded, provenance-backed data is the cure.
The Solution
A physical page enters one end. A verified, searchable, hashed digital record exits the other.
Current Status
Built in public. Every milestone documented. Every component validated before moving forward.
Tesseract-powered text extraction achieving 94%+ confidence on historical documents.
Python pipeline normalizes OCR output, corrects errors, and standardizes formatting.
Every document receives a unique cryptographic fingerprint proving content integrity.
384-dimensional vectors enable natural language search across the entire archive.
FastAPI serving four endpoints โ process, retrieve, search, and health check.
First real phone scan from a Detroit library. Pipeline tested against genuine historical documents.