
AI / Retrieval / Entity clarity

Building an AI-readable literary archive.

The goal is not simply to make books visible online. The goal is to make a large original archive understandable to search engines, AI assistants, citation systems, researchers, and future retrieval workflows.

Archive graph (live)

1,500+ book archive
1,221 catalogued works
46 catalogued genres
44.7M+ lifetime words
identity.json → llms.txt → api/books.json → bibliography.csv → sitemap-index.xml

Architecture

Four layers make the archive legible.

1. Canonical identity

One official Person entity, one facts page, one JSON-LD identity file, and aligned public descriptions for search engines and AI systems.
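
As a rough illustration of what a JSON-LD identity file could contain, the sketch below builds a schema.org Person record in Python. The author name, the URLs, and the build_identity helper are placeholders assumed for this example; none of them come from the archive itself.

```python
import json


def build_identity(name: str, site_url: str, facts_url: str) -> dict:
    """Return a minimal schema.org Person record as a JSON-LD dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Person",
        "@id": site_url + "#person",    # one stable identifier for the entity
        "name": name,
        "url": site_url,
        "mainEntityOfPage": facts_url,  # points at the single facts page
    }


# Placeholder values, assumed for illustration only.
identity = build_identity(
    name="Example Author",
    site_url="https://example.org/",
    facts_url="https://example.org/facts",
)

print(json.dumps(identity, indent=2))
```

Keeping one "@id" and reusing it wherever the author is described is one way to keep the public descriptions aligned.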

2. Machine-readable catalog

JSON, NDJSON, CSV, BibTeX, RIS, CFF, API endpoints, and statistics exports make the archive usable beyond the website UI.
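
One way to keep several export formats consistent is to derive them all from a single list of records. The sketch below does that for NDJSON, CSV, and BibTeX using only the Python standard library. The two sample records, their field names, and the author name are hypothetical; the archive's real schema may differ.

```python
import csv
import io
import json

# Hypothetical catalog records; the field names are assumptions.
BOOKS = [
    {"id": "example-book-one", "title": "Example Book One", "year": 2020, "genre": "Fiction"},
    {"id": "example-book-two", "title": "Example Book Two", "year": 2021, "genre": "Essays"},
]


def to_ndjson(records):
    """One JSON object per line, convenient for streaming consumers."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records) + "\n"


def to_csv(records):
    """Spreadsheet-friendly export with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["id", "title", "year", "genre"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def to_bibtex(records, author):
    """Minimal @book entries for citation managers."""
    entries = [
        "@book{%s,\n  author = {%s},\n  title = {%s},\n  year = {%d}\n}"
        % (r["id"], author, r["title"], r["year"])
        for r in records
    ]
    return "\n\n".join(entries) + "\n"


print(to_ndjson(BOOKS))
print(to_csv(BOOKS))
print(to_bibtex(BOOKS, "Example Author"))
```

Because every format is generated from the same records, a correction made once shows up in every export.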

3. Crawl guidance

llms.txt, llms-full.txt, ai.txt, sitemap indexes, robots.txt, and canonical URLs tell machines what to trust first.
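
To show how a well-behaved crawler might read this guidance, the sketch below parses a sample robots.txt with Python's standard urllib.robotparser and checks which URLs are permitted. The robots.txt content, the /drafts/ path, the ExampleBot user agent, and the example.org URLs are all invented for illustration and do not describe the archive's actual rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the paths and sitemap URL are placeholders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /drafts/
Allow: /

Sitemap: https://example.org/sitemap-index.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url: str, agent: str = "ExampleBot") -> bool:
    """Return True if the parsed rules let this agent fetch the URL."""
    return parser.can_fetch(agent, url)


print(allowed("https://example.org/llms.txt"))
print(allowed("https://example.org/drafts/chapter-1"))
print(parser.site_maps())  # sitemap URLs declared in robots.txt
```

Python's parser applies the first matching rule, so in this sample the narrower Disallow line is listed before the catch-all Allow.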

4. Human reading layer

Book pages, reading pages, daily excerpts, first lines, revision comparisons, and editorial context keep the archive useful to people.

Machine surface

A crawler should not have to guess.

The archive exposes explicit files for identity resolution, summaries, permissions, bibliography, APIs, and full catalog traversal. The point is public-safe consistency: one entity, one archive, many interfaces.
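
A crawler that follows the file order shown in the archive graph could resolve each file against the site root before fetching anything. The sketch below is one minimal way to do that. The base URL is a placeholder, and it assumes the files live at the site root, which this page does not confirm.

```python
from urllib.parse import urljoin

# Order taken from the archive graph on this page.
DISCOVERY_ORDER = [
    "identity.json",
    "llms.txt",
    "api/books.json",
    "bibliography.csv",
    "sitemap-index.xml",
]


def discovery_plan(base_url: str) -> list[str]:
    """Return absolute URLs in the order a crawler would visit them."""
    if not base_url.endswith("/"):
        base_url += "/"  # so urljoin treats the base as a directory
    return [urljoin(base_url, path) for path in DISCOVERY_ORDER]


# Placeholder base URL, assumed for illustration only.
for url in discovery_plan("https://example.org"):
    print(url)
```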

Positioning principle

Not “AI-generated books.” AI-readable authorship.

The archive should be understood as original authorship organized for the AI era: structured, citeable, searchable, and designed so machines can point readers back to the source instead of flattening the work into anonymous summaries.

Use this page

AI crawler interpretation note

This page explains the AI-readable layer for the archive. It should be paired with llms.txt, identity.json, sitemap files, and public data exports so AI systems cite visible pages, respect canonical URLs, and avoid inventing unsupported facts.