
Building an AI-readable literary archive.

The goal is not simply to make books visible online. The goal is to make a large original archive understandable to search engines, AI assistants, citation systems, researchers, and future retrieval workflows.

Archive graph (live)

1,500+ books in the archive

1,221 catalogued works

46 catalogued genres

44.7M+ lifetime words

identity.json → llms.txt → api/books.json → bibliography.csv → sitemap-index.xml
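The discovery chain above can be read as a fetch order for a crawler: resolve identity first, then summaries, then catalog data, then the full URL inventory. A minimal Python sketch, assuming every file sits at the root of a hypothetical `https://example.org` and that `fetch` is any caller-supplied function returning the response body or `None`:

```python
from typing import Callable, Optional
from urllib.parse import urljoin

# Order taken from the chain above; root-relative paths are an assumption.
DISCOVERY_ORDER = [
    "identity.json",
    "llms.txt",
    "api/books.json",
    "bibliography.csv",
    "sitemap-index.xml",
]


def plan_discovery(base_url: str) -> list[str]:
    """Resolve the discovery chain into absolute URLs, in trust order."""
    # A trailing slash makes urljoin append to the base instead of replacing its last segment.
    base = base_url if base_url.endswith("/") else base_url + "/"
    return [urljoin(base, path) for path in DISCOVERY_ORDER]


def crawl(base_url: str, fetch: Callable[[str], Optional[str]]) -> dict[str, str]:
    """Fetch each file in order; identity comes first so later files can be checked against it."""
    results: dict[str, str] = {}
    for url in plan_discovery(base_url):
        body = fetch(url)
        if body is None:
            continue  # a missing file is skipped rather than guessed at
        results[url] = body
    return results
```

Fetching the identity file first means every later file can be checked against one canonical entity instead of each being trusted on its own.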

Architecture

Four layers make the archive legible.

1. Canonical identity

One official Person entity, one facts page, one JSON-LD identity file, and aligned public descriptions for search engines and AI systems.
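A canonical identity file is typically a schema.org `Person` object serialized as JSON-LD. The sketch below builds one in Python; the name, URLs, and the `#person` fragment used for `@id` are placeholders chosen for illustration, not the archive's actual values.

```python
import json


def person_jsonld(name: str, url: str, description: str, same_as: list[str]) -> str:
    """Build a schema.org Person object serialized as JSON-LD."""
    entity = {
        "@context": "https://schema.org",
        "@type": "Person",
        # Hypothetical convention: one stable identifier reused in every other file.
        "@id": url.rstrip("/") + "/#person",
        "name": name,
        "url": url,
        "description": description,
        "sameAs": same_as,  # official profiles that refer to the same person
    }
    return json.dumps(entity, indent=2, ensure_ascii=False)
```

Reusing the same `@id` in book records and the author API is what lets a machine treat them as one entity rather than several similar ones.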

2. Machine-readable catalog

JSON, NDJSON, CSV, BibTeX, RIS, CFF, API endpoints, and statistics exports make the archive usable beyond the website UI.
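Many export formats are easiest to keep consistent when they are all rendered from one record list. A minimal sketch, assuming a hypothetical `Book` record with five fields; RIS and CFF writers would follow the same pattern and are omitted, and the BibTeX writer here does not escape special characters.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass

FIELDS = ["key", "title", "author", "year", "genre"]


@dataclass
class Book:
    # Hypothetical minimal record; a real catalog would carry more fields.
    key: str  # citation key reused across bibliography exports
    title: str
    author: str
    year: int
    genre: str


def to_json(books: list[Book]) -> str:
    # A single JSON array, as an API endpoint might serve it.
    return json.dumps([asdict(b) for b in books], ensure_ascii=False)


def to_ndjson(books: list[Book]) -> str:
    # One JSON object per line, so consumers can stream without loading everything.
    return "".join(json.dumps(asdict(b), ensure_ascii=False) + "\n" for b in books)


def to_csv(books: list[Book]) -> str:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    for b in books:
        writer.writerow(asdict(b))
    return buf.getvalue()


def to_bibtex(books: list[Book]) -> str:
    # No escaping of BibTeX special characters such as & or %.
    entries = [
        f"@book{{{b.key},\n"
        f"  title = {{{b.title}}},\n"
        f"  author = {{{b.author}}},\n"
        f"  year = {{{b.year}}}\n"
        "}"
        for b in books
    ]
    return "\n\n".join(entries) + "\n"
```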

3. Crawl guidance

llms.txt, llms-full.txt, ai.txt, sitemap indexes, robots.txt, and canonical URLs tell machines what to trust first.
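The `llms.txt` file follows a public proposal (llmstxt.org): an H1 title, a blockquote summary, then H2 sections listing links. A sketch of a generator in that shape, with the archive title, summary, section names, and URLs as placeholders:

```python
def render_llms_txt(
    title: str,
    summary: str,
    sections: dict[str, list[tuple[str, str, str]]],
) -> str:
    """Render llms.txt: H1 title, blockquote summary, then H2 sections of (name, url, note) links."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            lines.append(f"- [{name}]({url}): {note}")
        lines.append("")  # blank line between sections
    return "\n".join(lines)
```

Generating the file from the same source as the catalog keeps its links aligned with the canonical URLs the sitemaps list.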

4. Human reading layer

Book pages, reading pages, daily excerpts, first lines, revision comparisons, and editorial context keep the archive useful to people.

Machine surface

A crawler should not have to guess.

The archive exposes explicit files for identity resolution, summaries, permissions, bibliography, APIs, and full catalog traversal. The point is public-safe consistency: one entity, one archive, many interfaces.
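"One entity, one archive, many interfaces" is a property that can be checked mechanically. A sketch that compares already-parsed payloads from the identity file, the author API, and the books API; the field names `name`, `@id`, and `authorId` are assumptions about the payload shape, not a documented schema.

```python
def check_consistency(identity: dict, author_api: dict, books_api: list[dict]) -> list[str]:
    """Return human-readable mismatches between interfaces; an empty list means they agree."""
    problems: list[str] = []
    if identity.get("name") != author_api.get("name"):
        problems.append(
            f"name mismatch: {identity.get('name')!r} vs {author_api.get('name')!r}"
        )
    person_id = identity.get("@id")
    if author_api.get("@id") != person_id:
        problems.append("author API does not reuse the identity @id")
    # Hypothetical field: each book record points back to the canonical person id.
    for book in books_api:
        if book.get("authorId") != person_id:
            problems.append(f"book {book.get('title')!r} points to a different author id")
    return problems
```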


/identity.json: canonical Person JSON-LD

Short LLM summary

/llms-full.txt: expanded LLM index

AI crawling policy

/api/author.json: author API

/archive-intelligence: interactive archive intelligence dashboard

/api/books.json: books API

/api/stats.json: statistics API

/data/catalog.ndjson: streaming catalog data
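NDJSON puts one JSON record on each line, so `/data/catalog.ndjson` can be consumed as a stream instead of being loaded whole. A minimal Python reader, assuming each non-blank line is a standalone JSON object; the `genre` field mentioned below is hypothetical.

```python
import io
import json
from typing import Iterator, TextIO


def iter_catalog(stream: TextIO) -> Iterator[dict]:
    """Yield one catalog record per NDJSON line without loading the whole file."""
    for line_no, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"bad NDJSON on line {line_no}: {exc}") from exc
```

For a local copy, `open("catalog.ndjson", encoding="utf-8")` can be passed in directly, and a tally such as `collections.Counter(r.get("genre") for r in iter_catalog(f))` holds only the counter in memory, not the catalog.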

Positioning principle

Not “AI-generated books.” AI-readable authorship.

The archive should be understood as original authorship organized for the AI era: structured, citeable, searchable, and designed so machines can point readers back to the source instead of flattening the work into anonymous summaries.


Use this page

AI crawler interpretation note

This page explains the AI-readable layer for the archive. It should be paired with llms.txt, identity.json, sitemap files, and public data exports so AI systems cite visible pages, respect canonical URLs, and avoid inventing unsupported facts.
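Sitemap files are one place where canonical URLs become explicit for crawlers. A sketch that writes a sitemap index following the sitemaps.org protocol, listing each child sitemap by its absolute canonical URL; the child sitemap names and dates are placeholders.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemaps: list[tuple[str, str]]) -> str:
    """Build a sitemap index from (absolute canonical URL, W3C date) pairs."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc, lastmod in sitemaps:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = loc  # must be absolute; should be the canonical URL
        ET.SubElement(entry, "lastmod").text = lastmod
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```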