# Home [![Documentation](https://img.shields.io/badge/docs-github%20pages-blue)](https://gperdrizet.github.io/llms-demo/) ![Python](https://img.shields.io/badge/Python-3.11-3776AB?logo=python&logoColor=white) ![LangChain](https://img.shields.io/badge/LangChain-0.3-1C3C3C?logo=langchain&logoColor=white) ![HuggingFace](https://img.shields.io/badge/HuggingFace-Transformers-FFD21E?logo=huggingface&logoColor=black) ![Gradio](https://img.shields.io/badge/Gradio-UI-FF7C00?logo=gradio&logoColor=white) ![Ollama](https://img.shields.io/badge/Ollama-local%20LLM-black?logo=ollama&logoColor=white) ![llama.cpp](https://img.shields.io/badge/llama.cpp-GGUF-green) ![PostgreSQL](https://img.shields.io/badge/pgvector-PostgreSQL-4169E1?logo=postgresql&logoColor=white) ![ChromaDB](https://img.shields.io/badge/ChromaDB-vector%20store-E8572A) Example code demonstrating local LLM inference with various backends and libraries. ```{toctree} :maxdepth: 2 :caption: Contents quickstart slides demos activities inference_servers libraries models systemd-deployment ``` ## Overview This repository contains chatbot demos and hands-on activities for learning prompt engineering and local LLM deployment. ### Demos **Chatbots** (`demos/chatbots/`) - **HuggingFace chatbot**: Direct model loading with Transformers — no inference server needed - **Ollama chatbot**: Terminal chatbot using LangChain + a local Ollama server - **llama.cpp chatbot**: Terminal chatbot using the OpenAI-compatible llama.cpp API - **Gradio chatbot**: Web UI with switchable Ollama / llama.cpp backends and customizable system prompt **LangChain patterns** (`demos/langchain_patterns/`) - **LangChain demo**: Prompt templates, output parsers, LCEL chains, and few-shot learning - **ReAct agent**: LangChain agent with custom tools and multi-step reasoning (two versions: framework and manual) **RAG system** (`demos/rag_system/`) - **RAG demo**: Ingest Wikipedia articles into a pgvector knowledge base and query them with a grounded LLM ### Infrastructure - **Inference servers**: [Ollama](inference_servers.md) (lightweight), [llama.cpp](inference_servers.md) (high-performance MoE) - **Libraries**: [Transformers](libraries.md), [LangChain](libraries.md), [Gradio](libraries.md) - **Models**: [GPT-OSS-120B](models.md) (120B MoE), [GPT-OSS-20B](models.md) (21B), [Qwen3.5-35B-A3B](models.md) (35B MoE with vision) — any GGUF-compatible model can be added by downloading the weights and pointing the server at them ### Get started See the [Quickstart](quickstart.md) guide for installation and setup, then explore the [Demos](demos.md) to learn about different inference approaches. --- ## Links - [GitHub repository](https://github.com/gperdrizet/llms-demo) - [Docker image](https://hub.docker.com/repository/docker/gperdrizet/llms-gpu/general)