Mav Chatbot
• Developed in 24 hours during UTA’s “Model Mash: LLM Arena” Datathon as a chatbot that answers real-world questions using university-specific data.
• Built a domain-specific chatbot for the University of Texas at Arlington using LLMs and Retrieval-Augmented Generation (RAG) to answer user queries based on live university data.
• Designed a hybrid retrieval system combining the Google Custom Search API with asynchronous scraping (aiohttp + BeautifulSoup) for efficient and deep content retrieval.
• Created a Python-based web crawler that starts from hardcoded root URLs and discovers detailed academic and administrative content across UTA’s domain.
• Used ChromaDB as an in-memory vector store to pre-index critical UTA pages, keeping query responses fast while avoiding SQLite-based deployment errors.
• Integrated the Llama 3 70B model hosted on Groq for fast, high-quality natural language responses, managed through a structured LangChain RAG pipeline.
• Applied SentenceTransformers for embedding generation and used LangChain’s document splitter for chunking scraped content into semantically relevant sections.
• Built a user-friendly interface using Streamlit with session persistence, real-time feedback, and a custom chat layout for a smooth conversational experience.
• Deployed the complete system on Streamlit Cloud using GitHub integration and secure environment variables for continuous delivery and testing.
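
The crawler described above (hardcoded root URLs, asynchronous fetching, link discovery restricted to UTA's domain) can be illustrated with a minimal sketch. This is not the project's actual code: it uses only the Python standard library, with html.parser standing in for BeautifulSoup and an injected fetch coroutine standing in for aiohttp. The names LinkExtractor, extract_links, crawl, fake_fetch, and the FAKE_SITE pages are all hypothetical and exist only for illustration.

```python
import asyncio
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags (stdlib stand-in for BeautifulSoup)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html, allowed_domain):
    """Return absolute, fragment-free links that stay inside allowed_domain."""
    parser = LinkExtractor()
    parser.feed(html)
    found = []
    for href in parser.links:
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        host = urlparse(absolute).netloc
        if host == allowed_domain or host.endswith("." + allowed_domain):
            found.append(absolute)
    return found


async def crawl(root_urls, fetch, allowed_domain, max_pages=50):
    """Breadth-first crawl that fetches each frontier level concurrently.

    `fetch` is any coroutine taking a URL and returning HTML text; in a real
    deployment it would wrap an HTTP client (assumption, not shown here).
    """
    seen = set(root_urls)
    frontier = list(root_urls)
    pages = {}
    while frontier and len(pages) < max_pages:
        batch = frontier[: max_pages - len(pages)]
        bodies = await asyncio.gather(*(fetch(url) for url in batch))
        frontier = []
        for url, html in zip(batch, bodies):
            pages[url] = html
            for link in extract_links(url, html, allowed_domain):
                if link not in seen:
                    seen.add(link)
                    frontier.append(link)
    return pages


# Made-up pages so the sketch runs without network access.
FAKE_SITE = {
    "https://www.uta.edu/": '<a href="/admissions">A</a> <a href="https://example.com/">X</a>',
    "https://www.uta.edu/admissions": '<a href="/">Home</a> <a href="/admissions/apply#top">Apply</a>',
    "https://www.uta.edu/admissions/apply": "<p>Apply here</p>",
}


async def fake_fetch(url):
    return FAKE_SITE.get(url, "")


pages = asyncio.run(crawl(["https://www.uta.edu/"], fake_fetch, "uta.edu"))
print(sorted(pages))
```

Passing the fetch function in as a parameter keeps the traversal logic testable offline; off-domain links and URL fragments are dropped before a page is queued, so the same page is not fetched twice.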
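
The retrieval side of the pipeline (chunk scraped text, embed each chunk, keep the vectors in memory, fetch the closest chunks for a question, and hand them to the LLM as context) can be sketched in the same spirit. This is a simplified stand-in, not the project's implementation: word-count vectors replace SentenceTransformers embeddings, a plain Python list replaces ChromaDB, and fixed-size character windows replace LangChain's document splitter. All function names, the prompt wording, and the sample sentences are hypothetical placeholders, not real UTA data.

```python
import math
import re
from collections import Counter


def split_text(text, chunk_size=200, overlap=40):
    """Split text into fixed-size character chunks that overlap."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once a chunk has reached the end of the text.
    stop = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, stop, step)]


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a sentence embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryIndex:
    """Keeps (chunk, vector) pairs in a list; nothing is written to disk."""

    def __init__(self):
        self.entries = []

    def add(self, chunks):
        for chunk in chunks:
            self.entries.append((chunk, embed(chunk)))

    def query(self, question, k=2):
        q = embed(question)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [chunk for chunk, _vector in ranked[:k]]


def build_prompt(question, context_chunks):
    """Assemble the text that would be sent to the LLM (wording is illustrative)."""
    context = "\n---\n".join(context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Placeholder documents, invented for the example.
docs = [
    "The library is open until midnight on weekdays.",
    "Parking permits are sold online each semester.",
    "Tuition payment deadlines are posted by the bursar.",
]
index = InMemoryIndex()
index.add(docs)
question = "When is the library open?"
top = index.query(question, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Because the index lives entirely in process memory, it is rebuilt on each start and needs no database file, which mirrors the motivation given above for running the vector store in memory.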