You can build a chatbot on your own text data without calling any hosted API by combining retrieval with a local LLM. The standard approach is RAG (Retrieval-Augmented Generation):
1. Split your text into chunks and embed each chunk with a sentence-embedding model like `all-mpnet-base-v2` (via `sentence-transformers`).
2. Store the embeddings in FAISS or Chroma so you can retrieve the most relevant chunks quickly.
3. On each query, embed the question, find the top-matching chunks, and feed them into a local instruction-tuned model (e.g. Falcon, Mistral, or LLaMA via Hugging Face `transformers`); a retrieval sketch follows this list.
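Here is a minimal retrieval sketch, assuming `sentence-transformers` and `faiss` are installed; the chunk contents, the normalization choice, and `k=3` are placeholder assumptions to adapt to your data:

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

# 1. Embed your text chunks (placeholder chunks; use your own splitting logic).
chunks = ["First chunk of your document...", "Second chunk..."]
embedder = SentenceTransformer("all-mpnet-base-v2")
embeddings = embedder.encode(chunks, normalize_embeddings=True)

# 2. Store the vectors in a FAISS index.
#    Inner product on normalized vectors is equivalent to cosine similarity.
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(np.asarray(embeddings, dtype="float32"))

# 3. At query time, embed the question and pull the top-matching chunks.
query = "What does the document say about X?"
query_vec = embedder.encode([query], normalize_embeddings=True)
scores, ids = index.search(np.asarray(query_vec, dtype="float32"), k=3)
top_chunks = [chunks[i] for i in ids[0]]
```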
You then pass the retrieved chunks into the prompt of a local `transformers` text-generation pipeline to produce the final answer. If your data includes images, caption them with BLIP (or embed them with CLIP) so the image paths become searchable alongside the text.
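A sketch of the generation step, reusing `top_chunks` and `query` from the retrieval sketch above. The model name and prompt template here are assumptions; swap in whatever instruct model fits your hardware:

```python
from transformers import pipeline

# Assumed model; any local chat/instruct model works the same way.
generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.2",
    device_map="auto",
)

context = "\n\n".join(top_chunks)
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
)
result = generator(prompt, max_new_tokens=256, do_sample=False)
print(result[0]["generated_text"])
```

For the image side, one option is the `image-to-text` pipeline with a BLIP checkpoint; the file path below is hypothetical. Index each generated caption like any other text chunk, keyed by its image path:

```python
from transformers import pipeline

captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
caption = captioner("photos/diagram.png")[0]["generated_text"]  # hypothetical path
# Embed and index `caption` like a text chunk, storing the image path as metadata.
```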