Why PRL?
Private Research Librarian (PRL) keeps your reading notes, PDFs, and personal documents searchable without shipping your corpus to a hosted service. Embedding and vector search stay local. Only final answer synthesis calls an external LLM, and it receives just the passages retrieved for your question. You can swap that LLM for a local model if you prefer.
Key features
- Document ingestion for PDF, Markdown, HTML, and plain text.
- Local LanceDB vector storage; embeddings are computed once and reused across queries.
- Semantic search with MMR diversity filtering for better coverage.
- Natural language questions with grounded answers and citations.
- Privacy-first workflow that keeps documents on your machine.
RAG (retrieval-augmented generation) pipeline
PRL chunks documents with overlap, embeds them with sentence-transformers, and stores the vectors in LanceDB. At query time it embeds your question, runs a cosine similarity search, re-ranks the results with MMR (Maximal Marginal Relevance) for diversity, and sends the top-k chunks to Gemini 3 Flash for a cited answer.
- Chunk and embed documents once, store locally.
- Embed queries and retrieve the most relevant passages.
- Synthesize a grounded response with citations.
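The retrieval step can be illustrated with a small, dependency-free sketch of cosine similarity plus MMR re-ranking. It assumes the candidate vectors have already been fetched from the vector store; the function names and the default lambda_ of 0.5 are hypothetical, not PRL's actual API or settings. At each step the sketch picks the candidate that best balances relevance to the query against similarity to the chunks already chosen.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr(query: Sequence[float], candidates: List[Sequence[float]],
        k: int, lambda_: float = 0.5) -> List[int]:
    """Return indices of up to k candidates chosen by Maximal Marginal Relevance.

    Each step picks the candidate maximising
        lambda_ * sim(query, c) - (1 - lambda_) * max(sim(c, s) for s already selected)
    so a chunk that is relevant but redundant with earlier picks is penalised.
    (Hypothetical helper; lambda_ = 0.5 is an assumed default.)
    """
    relevance = [cosine(query, c) for c in candidates]
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i: int) -> float:
            # Highest similarity to any chunk already selected (0.0 on the first pick).
            redundancy = max((cosine(candidates[i], candidates[j]) for j in selected),
                             default=0.0)
            return lambda_ * relevance[i] - (1 - lambda_) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    query = [1.0, 0.2]
    # Chunks 0 and 1 point in almost the same direction (near-duplicates).
    chunks = [[1.0, 0.0], [1.0, 0.01], [0.6, 0.8]]
    print(mmr(query, chunks, k=2))  # [1, 2]: the near-duplicate chunk 0 is skipped
```

With lambda_=1.0 the redundancy penalty disappears and the result is plain top-k by relevance ([1, 0] for the same inputs); lowering lambda_ trades some relevance for broader coverage.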
Architecture overview
Documents -> Ingestion -> Chunking -> Embedding -> Vector Store
User Query -> Query Embedding -> Similarity Search -> Top-K Chunks
Retrieved Context -> LLM Synthesis -> Answer + Citations
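The last stage of the diagram, turning retrieved context into an answer with citations, can be sketched as prompt assembly: each top-k chunk is numbered so the model can cite it as [n], and the cited numbers are mapped back to source documents afterwards. This is a hypothetical illustration. RetrievedChunk, build_prompt, cited_sources, and the prompt wording are assumptions rather than PRL's code, and the actual Gemini call is left out.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class RetrievedChunk:
    # Hypothetical record; PRL's real chunk schema may differ.
    source: str       # path of the indexed document
    chunk_index: int  # position of the chunk within that document
    text: str


def build_prompt(question: str, chunks: List[RetrievedChunk]) -> str:
    """Number each retrieved chunk so the LLM can cite it as [1], [2], ..."""
    blocks = [
        f"[{n}] ({chunk.source}, chunk {chunk.chunk_index})\n{chunk.text}"
        for n, chunk in enumerate(chunks, start=1)
    ]
    context = "\n\n".join(blocks)
    return (
        "Answer the question using only the numbered sources below. "
        "Cite sources inline as [n]. If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def cited_sources(answer: str, chunks: List[RetrievedChunk]) -> List[str]:
    """Return the source path of each distinct chunk cited as [n], in first-cited order.

    Numbers outside 1..len(chunks) are ignored, since the model may hallucinate them.
    """
    seen: List[int] = []
    for match in re.findall(r"\[(\d+)\]", answer):
        n = int(match)
        if 1 <= n <= len(chunks) and n not in seen:
            seen.append(n)
    return [chunks[n - 1].source for n in seen]
```

Keeping the numbering in the prompt and the mapping in code means the citations shown to the user always point at chunks that were actually retrieved, even if the model invents an out-of-range marker.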
Core components
- Ingestion: PDF (PyMuPDF), Markdown (frontmatter), HTML (BeautifulSoup), plain text.
- Chunking: semantic splitting into 500-token chunks with a 50-token overlap.
- Embedding: sentence-transformers all-MiniLM-L6-v2 (384 dimensions).
- Vector store: LanceDB (embedded, no server required).
- Retrieval: cosine similarity plus MMR diversity filtering.
- Synthesis: Google Gemini 3 Flash for answer generation.
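The chunking parameters above (500 tokens, 50-token overlap) can be illustrated with a fixed-size sliding window. This is a simplified sketch: it splits on whitespace as a stand-in for the embedding model's tokenizer and ignores sentence or section boundaries, so it shows only the overlap arithmetic, not PRL's semantic splitting. The function name chunk_tokens is hypothetical.

```python
from typing import List


def chunk_tokens(tokens: List[str], size: int = 500, overlap: int = 50) -> List[List[str]]:
    """Split tokens into windows of `size`, each sharing `overlap` tokens with the previous one.

    Hypothetical helper: fixed-size windows, not semantic splitting.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    step = size - overlap  # 450 with the defaults
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    # Whitespace tokens stand in for real tokenizer tokens (an assumption).
    text = " ".join(f"w{i}" for i in range(1000))
    windows = chunk_tokens(text.split())
    print([len(w) for w in windows])  # [500, 500, 100]
```

With the defaults each window starts 450 tokens after the previous one, so a 1,000-token document yields windows starting at tokens 0, 450, and 900, and the last 50 tokens of each full window reappear at the start of the next. The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk.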
Install and configure
Create a virtual environment and install the package:
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
Set your Google API key for answer synthesis:
export GOOGLE_API_KEY=your_api_key_here
CLI usage
Index documents
prl index document.pdf
prl index ~/Documents/research/
prl index ~/Documents/research/ --recursive
prl index document.pdf --force
Query your library
prl query "What papers have I read about attention mechanisms?"
prl query "Summarize my notes on transformers" --show-sources
prl query "What are the key findings?" --json
Interactive chat
prl chat
Other commands
prl status
prl status --detailed
prl search "neural networks"
prl remove document.pdf
prl remove --all
prl config --show
prl models
License
PRL is released under the MIT License.