Mav Chatbot

• Developed in 24 hours during UTA’s “Model Mash: LLM Arena” Datathon, with the goal of building a chatbot that answers real-world questions from university-specific data.

• Built a domain-specific chatbot for the University of Texas at Arlington using LLMs and Retrieval-Augmented Generation (RAG) to answer user queries based on live university data.

• Designed a hybrid retrieval system that combines the Google Custom Search API with asynchronous scraping (aiohttp + BeautifulSoup) to retrieve page content quickly and in depth.

• Created a Python-based web crawler that starts from hardcoded root URLs and discovers detailed academic and administrative content across UTA’s domain.

• Used ChromaDB as an in-memory vector store to pre-index critical UTA pages, keeping query latency low while avoiding SQLite-related deployment errors.

• Integrated the Llama 3 70B model, served through Groq, for fast, high-quality natural language responses, managed through a structured LangChain RAG pipeline.

• Applied SentenceTransformers for embedding generation and used LangChain’s text splitter to chunk scraped content into semantically coherent sections.

• Built a user-friendly interface using Streamlit with session persistence, real-time feedback, and a custom chat layout for a smooth conversational experience.

• Deployed the complete system on Streamlit Cloud using GitHub integration and secure environment variables for continuous delivery and testing.
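
The chunk, embed, and retrieve flow described in the bullets above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: it swaps in a toy bag-of-words embedding in place of SentenceTransformers and a plain Python list in place of ChromaDB so that it runs on the standard library alone. The names chunk_text, embed, InMemoryIndex, and build_prompt, and the example.edu URLs used with them, are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows (a simple stand-in
    for a library text splitter)."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks


def embed(text):
    """Toy embedding: lowercase word counts. A real pipeline would call a
    sentence-embedding model here instead."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryIndex:
    """Hypothetical in-memory store: holds (url, chunk, vector) entries
    and ranks them against a query by cosine similarity."""

    def __init__(self):
        self.entries = []

    def add(self, url, text):
        # Chunk the page, embed each chunk, and keep everything in memory.
        for chunk in chunk_text(text):
            self.entries.append((url, chunk, embed(chunk)))

    def query(self, question, k=3):
        # Score every stored chunk and return the top k as (score, url, chunk).
        q = embed(question)
        scored = [(cosine(q, vec), url, chunk) for url, chunk, vec in self.entries]
        return sorted(scored, key=lambda hit: hit[0], reverse=True)[:k]


def build_prompt(question, hits):
    """Assemble retrieved chunks into a context block for the LLM call."""
    context = "\n\n".join(f"[{url}]\n{chunk}" for _, url, chunk in hits)
    return (
        "Answer using only the context below.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

In this sketch, pages would be indexed once at startup with index.add(url, text). Each user question is then passed to index.query(...), and the resulting hits go to build_prompt(...) before the model call. Keeping the index in process memory mirrors the pre-indexing approach described above, at the cost of rebuilding it on every restart.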