Improving Performance of Local Chatbot with Caching

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Chatbots and the technology behind them are now used widely and in many different ways. The Retrieval Augmented Generation (RAG) framework has gained popularity because it links a large language model with a private dataset, which makes it possible to run AI locally and privately with up-to-date information and knowledge. In this report, we aim to improve the response time of a local private chatbot by using a cache. Our experimental results show that most of the time spent processing a query goes to generating the response. We therefore focus our efforts on the turnaround time of the generation step: when a query hits the cache, the stored response is returned to the user immediately and the generation step is skipped, which significantly improves response time. The cache is organized into categories that support efficient searching. For each user query, the query string, its embedding information, and its response are recorded and stored in the cache. Experimental results are presented, and the speedup of request-response turnaround time is addressed.
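
The caching idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class and function names (CacheEntry, CategorizedCache, answer), the use of cosine similarity, the 0.9 threshold, and the assumption that the caller already knows the query's category and embedding are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class CacheEntry:
    """One cached exchange: query string, its embedding, and the response."""
    query: str
    embedding: list
    response: str


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class CategorizedCache:
    """Hypothetical cache: entries are grouped by category so a lookup
    only scans the entries of one category."""
    threshold: float = 0.9  # assumed similarity cutoff for a cache hit
    entries: dict = field(default_factory=dict)

    def put(self, category, query, embedding, response):
        entry = CacheEntry(query, list(embedding), response)
        self.entries.setdefault(category, []).append(entry)

    def get(self, category, embedding):
        """Return the most similar cached response at or above the threshold,
        or None on a miss."""
        best, best_score = None, self.threshold
        for entry in self.entries.get(category, []):
            score = cosine_similarity(embedding, entry.embedding)
            if score >= best_score:
                best, best_score = entry, score
        return best.response if best is not None else None


def answer(cache, category, query, embedding, generate):
    """Return (response, was_cache_hit). The generation step, passed in as
    the callable `generate`, runs only on a miss."""
    cached = cache.get(category, embedding)
    if cached is not None:
        return cached, True
    response = generate(query)
    cache.put(category, query, embedding, response)
    return response, False
```

In this sketch, a repeated or closely similar query in the same category returns the stored response without calling generate, which is the step the abstract identifies as the most time-consuming. A real system would obtain the embedding from an embedding model and would likely use a vector index instead of a linear scan.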

Original language: English
Title of host publication: WMSCI 2024 - 28th World Multi-Conference on Systemics, Cybernetics and Informatics, Proceedings
Editors: Nagib C. Callaos, Elina Gaile-Sarkane, Natalja Lace, Belkis Sanchez, Michael Savoie
Publisher: International Institute of Informatics and Cybernetics
Pages: 68-71
Number of pages: 4
ISBN (Electronic): 9781950492794
State: Published - 2024
Event: 28th World Multi-Conference on Systemics, Cybernetics and Informatics, WMSCI 2024 - Virtual, Online
Duration: 10 Sep 2024 to 13 Sep 2024

Publication series

Name: Proceedings of World Multi-Conference on Systemics, Cybernetics and Informatics, WMSCI
Volume: 2024-September
ISSN (Print): 2771-0947

Conference

Conference: 28th World Multi-Conference on Systemics, Cybernetics and Informatics, WMSCI 2024
City: Virtual, Online
Period: 10/09/24 to 13/09/24

Keywords

  • Cache
  • Chatbot
  • Embeddings
  • LLM
  • RAG
  • Similarity Search
