It is NOT slow; it only appears to be slow.
The CLI starts printing output word by word immediately after you hit Enter. In contrast, LangChain collects the entire output first, which takes 15-20 seconds depending on the length of the response, and then prints it all at once... boom. Even subprocess.run() behaves the same way, because it waits for the process to finish before returning its output.
Workaround:
import os
os.system('ollama run llama3.2:1b "what is water short answer"')

Then run the script from the terminal: python main.py
Here, you can see the output almost immediately, as a stream.
You can also save the output to a text file so it can be used later in your Python script:
os.system('ollama run llama3.2:1b "what is water short answer" > output.txt')
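To read the saved answer back into Python, the redirect-then-read pattern can be sketched as below. A placeholder echo command stands in for the ollama invocation (swap it back in when the CLI is installed); the file name output.txt and the echoed text are only illustrative:

```python
import os

# Placeholder for:
#   ollama run llama3.2:1b "what is water short answer" > output.txt
os.system('echo "water is H2O" > output.txt')

# Read the saved answer back into the script
with open('output.txt') as f:
    answer = f.read()
print(answer)
```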
To append to the text file instead of overwriting it:
os.system('ollama run llama3.2:1b "what is water short answer" >> output.txt')

I have posted this answer on GitHub as well.