Self-hosted large language models with Ollama

From the internet to large language models

The growth in internet users since 2000 has made a huge quantity of data available. Search engines appeared because it became hard to find the right information; they rely on powerful indexes, such as fuzzy search over a textual database and a score based on cross-references between websites.

The number of websites has grown past a billion, and search results feel degraded, caught between inactive websites (more than 80%), malicious content or behaviour, the fake-news business and social networks…

Alongside the exponential growth of the internet, computational power also increased; since 2010 it has made deep learning accessible to most researchers without much investment, using GPUs from the gaming industry.

The most active research fields are unsupervised and generative training methods such as CNNs, GANs and now Transformers, but these require far more resource investment.

Ollama for the run

ChatGPT showed what can be done with large language models (LLMs), taking away part of the user base of search engines and social networks. Other companies, late compared to OpenAI, released open models that are available and free to use locally. Even if inference is generally fast, available models weigh around 5GB (up to 70GB), and running them on a CPU feels slow.

Ollama is a server, written in Go, that binds to llama.cpp to run models on different hardware accelerators. The user can easily pull and run available LLMs from repositories like Hugging Face, and many web, desktop and terminal clients are available.
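
As a minimal sketch of what talking to the server looks like, assuming Ollama is running locally on its default port (11434) and a model has already been pulled with a name like llama3, a completion can be requested over its HTTP API:

    import json
    import urllib.request

    # Ask the local Ollama server for a single, non-streamed completion.
    payload = json.dumps({
        "model": "llama3",   # any model previously pulled into Ollama
        "prompt": "Why is the sky blue?",
        "stream": False,     # return one JSON object instead of a stream
    }).encode("utf-8")

    request = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

    with urllib.request.urlopen(request) as response:
        answer = json.loads(response.read())

    print(answer["response"])

With "stream" set to false the server replies with a single JSON object whose "response" field holds the generated text, which keeps a quick test like this simple.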

Usefulness of current models

At the time of this article, Llama 3 was the most advanced model available on Ollama.

Beginning with the good, the model has learned language and knows how to (a short example follows the list):

  • put one word after another
  • structure text within and between sentences
  • complete a sentence
  • find the best word for a blank
  • translate between languages
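
As an illustrative sketch under the same assumptions as before (local server, llama3 pulled; the prompt text is of course made up), the chat endpoint can exercise two of these strengths at once:

    import json
    import urllib.request

    # Exercise blank filling and translation, two of the strengths
    # listed above, through Ollama's chat endpoint.
    payload = json.dumps({
        "model": "llama3",
        "messages": [
            {"role": "user",
             "content": "Fill in the blank: 'Paris is the capital of ____.' "
                        "Then translate the completed sentence into German."},
        ],
        "stream": False,
    }).encode("utf-8")

    request = urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

    with urllib.request.urlopen(request) as response:
        print(json.loads(response.read())["message"]["content"])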

But users want more from these models: to replace search engines, as they are often presented, or to serve as coding helpers. The pitfall is a response that feels right because of all the qualities listed above.

For me, the problems are:

  • no references or cross-references, unlike search engines
  • updating information means retraining
  • outdated library versions in generated code
  • refusal to ever answer “I don’t know”

In the end, I could not make it essential for me. Some of the problems are not about the modelling itself but more about how it is used and presented.