LLMs and Langchain — Manage issues in GitHub

Meiyappan Kannappa
5 min read · Oct 2, 2023

Generative AI is a branch of artificial intelligence focused on creating models that can generate new and realistic content; Generative Adversarial Networks (GANs) are one well-known family of such models.

GANs consist of two components: a generator that produces synthetic data, and a discriminator that distinguishes between real and generated data. Through an adversarial training process, GANs learn to generate outputs that closely resemble real data, enabling applications such as image synthesis, question answering, text generation, and video creation. Generative AI has revolutionized content creation and opened up new possibilities for creative applications.

What is an LLM?

LLM stands for Large Language Model, which is a type of generative AI model specifically designed for processing and generating human language. LLMs are trained on vast amounts of text data and can generate coherent and contextually relevant text based on given prompts or queries. They have applications in natural language processing, text generation, chatbots, and various other language-related tasks.

In this article we will dive into how to use an LLM and Langchain to read an issue from a GitHub repo, find an answer, and reply to the issue as a comment.

Before we go further, let us see

What is Langchain?

Langchain is a framework designed to simplify the development of applications powered by large language models (LLMs). It provides a set of tools and abstractions that enable developers to build applications that leverage the capabilities of LLMs, such as text generation, summarization, question answering, and more. Langchain abstracts the core building blocks of LLM applications, making it easier to integrate LLMs into data pipelines and build AI-driven language applications. It supports models from various providers, such as OpenAI and Hugging Face.

Langchain provides several modules for developing applications powered by language models. These modules are:

  • Model I/O: This module allows you to interface with language models.
  • Retrieval: This module enables you to interface with application-specific data.
  • Chains: The Chains module allows you to construct sequences of calls, whether to a language model or a different utility (a minimal sketch follows this list).
  • Agents: The Agents module involves a language model making decisions about which actions to take, based on high-level directives.
  • Memory: The Memory module allows you to persist application state between runs of a chain.
  • Callbacks: The Callbacks module enables you to log and stream intermediate steps of any chain.
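
To make these building blocks concrete, here is a minimal, hypothetical sketch that wires the Model I/O and Chains modules together. It uses Langchain's FakeListLLM, a stand-in model with canned responses, so it runs without any real model or API key; the prompt and the response are illustrative.

from langchain.chains import LLMChain
from langchain.llms.fake import FakeListLLM
from langchain.prompts import PromptTemplate

# A stand-in model that returns canned responses, handy for testing chains
llm = FakeListLLM(responses=["Red is a primary color."])

# Model I/O: a prompt template defines the input to the model
prompt = PromptTemplate(
    input_variables=["topic"],
    template="Answer in one sentence: what is {topic}?",
)

# Chains: compose the prompt and the model into one callable unit
chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run(topic="the color red"))  # -> "Red is a primary color."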

Langchain can be used in several areas of data science and analytics to support a variety of use cases.

Prompt engineering plays a critical role when developing apps using LLMs and Langchain. It is the practice of designing inputs for generative AI tools that produce optimal outputs.

Two different inputs about the color red can produce two different types of response, depending on the context set in the prompt, as the sketch below illustrates.
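
Here is a minimal sketch of such a pair of prompts. The prompt wording and the two contexts are illustrative assumptions, and it assumes the llama2 model is already being served locally by Ollama, which is set up later in this article.

from langchain.llms import Ollama

# Assumes a llama2 model served locally by Ollama (set up later in this article)
llm = Ollama(model="llama2")

# Example 1: a bare question with no context
prompt_1 = "What is red?"

# Example 2: context in the prompt steers the model toward a specific domain
prompt_2 = (
    "You are a financial analyst. On a stock market dashboard, "
    "what does red indicate?"
)

print(llm(prompt_1))  # typically a description of the color red
print(llm(prompt_2))  # typically an explanation of a fall in price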

Please note that the code in this article is written in Python and uses an LLM through Langchain.

Most of the time, when we discuss generative AI and LLMs, OpenAI comes to mind. However, to build applications using OpenAI you need an API key, and it comes at a cost. There are multiple other open-source LLMs, which can be found on the Hugging Face Open LLM Leaderboard:

  • LLaMA: LLaMA is an open-source LLM from Meta that supports an instruction-following approach: users can provide specific instructions to guide the model's behavior, and LLaMA models can be fine-tuned with different sets of instructions to suit specific use cases.
  • GPT4All: GPT4All is an open-source LLM ecosystem that consists of chatbots trained on a wide range of clean assistant data, including code, stories, and dialogue. It offers conversational capabilities and can be integrated into various applications.
  • GPT-3: GPT-3 is the successor to GPT-2 and is also developed by OpenAI. It is a highly advanced LLM with a massive number of parameters, enabling it to perform a wide range of natural language processing tasks, including text generation, translation, summarization, and more. Unlike the others on this list, however, GPT-3 is proprietary and only accessible through OpenAI's paid API.
  • BERT: BERT (Bidirectional Encoder Representations from Transformers) is an open-source language model developed by Google. It has been trained on a large amount of text data and is particularly effective for tasks such as text classification, named entity recognition, and question answering.
  • T5: T5 (Text-to-Text Transfer Transformer) is an open-source LLM developed by Google. It is a versatile model that can be fine-tuned for a wide range of natural language processing tasks, including text generation, translation, summarization, and more.

In this article we will use the open-source LLaMA model to answer the queries in GitHub issues, all from a local developer machine. To accomplish this we need Ollama (Ollama is an AI tool designed to help you run large language models locally). Download and install Ollama from https://ollama.ai/. Once done, run the command below from the command line:

ollama run llama2

The above will download and run the llama2 model locally, exposing it at http://localhost:11434/.
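
As a quick sanity check, you can query that local endpoint directly. The sketch below assumes Ollama's /api/generate endpoint, which streams one JSON object per line; adjust if your Ollama version differs.

import json

import requests

# Query the local Ollama server directly (endpoint and payload follow
# Ollama's documented /api/generate API)
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama2", "prompt": "Say hello in one sentence."},
    stream=True,
)
for line in resp.iter_lines():  # one JSON object per streamed line
    if line:
        print(json.loads(line).get("response", ""), end="", flush=True)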

It's time to read the GitHub issue and reply with a comment. We will write a piece of code in Python 3.11 with Langchain. It needs the following Python packages to be installed:

  • pygithub
  • langchain
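
They can be installed with pip:

pip install pygithub langchain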

Before writing the code we need to create an app in GitHub and gather the following details. Follow GitHub's documentation on creating a GitHub App, and ensure you give the app these permissions: Commit statuses (read-only), Contents (read and write), Issues (read and write), Metadata (read-only), Pull requests (read and write).

  • GITHUB_APP_ID: A six-digit number found in your app's general settings.
  • GITHUB_APP_PRIVATE_KEY: The location of your app's private key .pem file.
  • GITHUB_REPOSITORY: The name of the GitHub repository you want your bot to act upon. Must follow the format {username}/{repo-name}. Make sure the app has been added to this repository first!
  • GITHUB_BRANCH: The branch where the bot will make its commits. Defaults to 'master'.
  • GITHUB_BASE_BRANCH: The base branch of your repo, usually either 'main' or 'master'. This is where pull requests will base from. Defaults to 'master'.

For testing, I created an issue in the GitHub repo.

Our Langchain code should read the issue and reply with a comment containing an example.
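
A minimal sketch of such a script is shown below. It uses Langchain's GitHubAPIWrapper, which authenticates through the GitHub App environment variables described above; the credential values and the issue number 1 are placeholders.

import os

from langchain.chains import LLMChain
from langchain.llms import Ollama
from langchain.prompts import PromptTemplate
from langchain.utilities.github import GitHubAPIWrapper

# GitHub App credentials from the previous section (placeholder values)
os.environ["GITHUB_APP_ID"] = "123456"
os.environ["GITHUB_APP_PRIVATE_KEY"] = "path/to/your-app.private-key.pem"
os.environ["GITHUB_REPOSITORY"] = "{username}/{repo-name}"

# llama2 served locally by Ollama at http://localhost:11434/
llm = Ollama(model="llama2")

prompt = PromptTemplate(
    input_variables=["title", "body"],
    template=(
        "You are a helpful assistant answering GitHub issues.\n"
        "Issue title: {title}\n"
        "Issue description: {body}\n"
        "Write a short, helpful answer that includes an example."
    ),
)
chain = LLMChain(llm=llm, prompt=prompt)

# Read the issue; it comes back as a dict with 'title' and 'body' keys
github = GitHubAPIWrapper()
issue = github.get_issue(1)  # issue number 1 is an illustrative assumption

# Generate an answer and post it back as a comment;
# comment_on_issue expects "<issue number>\n\n<comment text>"
answer = chain.run(title=issue["title"], body=issue["body"])
print(github.comment_on_issue(f"1\n\n{answer}"))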

The code reads the issue from GitHub, feeds it to the llama model to get an answer for the issue, and writes the answer back as a comment. Running the code posts an answer on the GitHub issue.

Note: we have not used the LLM with the GitHub agent toolkit initialized as in the snippet below, because the randomness is too high and it does not work as expected with Ollama.
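
A sketch of that toolkit-based initialization, based on Langchain's GitHub toolkit API, would look roughly like this:

from langchain.agents import AgentType, initialize_agent
from langchain.agents.agent_toolkits.github.toolkit import GitHubToolkit
from langchain.llms import Ollama
from langchain.utilities.github import GitHubAPIWrapper

# The same GitHub App environment variables as before are assumed to be set
llm = Ollama(model="llama2")
toolkit = GitHubToolkit.from_github_api_wrapper(GitHubAPIWrapper())

# The agent decides for itself which GitHub tool to call at each step,
# which proved too unpredictable with a locally run llama2 model
agent = initialize_agent(
    toolkit.get_tools(),
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
)
agent.run("Read the open issues and reply to each with a helpful comment.")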

References

  1. https://python.langchain.com/docs/get_started/introduction

