Demystifying the evolution of specialized GPT models: InstructGPT and ChatGPT

Sigmoid
7 min read · Aug 18, 2023

In our previous blog, we discussed the base GPT models and their evolution. In this blog, we will look at the specialized versions of GPT (Generative Pre-trained Transformer) models. These versions are created from base models through task-specific fine-tuning. There are two major specialized versions of GPT models: InstructGPT and ChatGPT.

GPT-3 base models

Base models are large-scale language models pre-trained on massive amounts of text data using unsupervised learning techniques. They form the foundation of the GPT series, from which more advanced iterations are built. These models are based on the Transformer architecture, a deep neural network architecture designed to process sequential data such as text.

The base models are trained on a large and diverse corpus of text data, such as books, articles, and web pages, using an unsupervised objective called causal (autoregressive) language modeling: the model learns to predict the next word in a sentence based on the context provided by the preceding words. This unsupervised pre-training process allows the model to develop a deep understanding of language structure, syntax, and semantics.
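To make the objective concrete, here is a minimal sketch of next-token prediction, using the open GPT-2 model from Hugging Face's transformers library as a stand-in (GPT-3's training stack is not public; the model, text, and library choice are our illustration, not OpenAI's code):

```python
# Minimal sketch of the causal language-modeling objective, using the
# open GPT-2 model as a stand-in for a GPT-3 base model.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

text = "The quick brown fox jumps over the lazy"
inputs = tokenizer(text, return_tensors="pt")

# Passing labels=input_ids makes the model compute the next-token
# cross-entropy loss: each position predicts the following token.
outputs = model(**inputs, labels=inputs["input_ids"])
print(f"next-token loss: {outputs.loss.item():.3f}")

# Greedy prediction of the next word from the learned distribution.
next_id = int(outputs.logits[0, -1].argmax())
print("predicted next token:", tokenizer.decode([next_id]))
```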

Following pre-training, the base models can be fine-tuned for specific tasks, such as sentiment analysis, text completion, and question answering, using labeled data. Fine-tuning refines the model's skills, improving both specialization and accuracy.
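As a rough illustration of task-specific fine-tuning, the sketch below uses the legacy fine-tuning endpoint of the openai Python library (v0.x); the file name, data contents, and model choice are placeholders, not a prescribed recipe:

```python
# Hedged sketch: fine-tuning a GPT-3 base model on labeled examples via
# OpenAI's legacy fine-tuning endpoint (openai Python library v0.x).
# The file name and API key below are placeholders.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

# Training data is a JSONL file of {"prompt": ..., "completion": ...}
# pairs, e.g. movie reviews labeled with their sentiment.
uploaded = openai.File.create(
    file=open("sentiment_train.jsonl", "rb"),
    purpose="fine-tune",
)

job = openai.FineTune.create(
    training_file=uploaded["id"],
    model="davinci",  # a GPT-3 base model
)
print("fine-tune job started:", job["id"])
```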

InstructGPT

InstructGPT, an extension of OpenAI’s GPT-3 model, is a language model trained to follow instructions and complete a wide range of tasks. It is trained on extensive datasets of instructions and tasks, learning to interpret directives and execute them efficiently. The core purpose of InstructGPT is to automate repetitive tasks for businesses. For instance, users can issue simple prompts such as “compose a blog post on the benefits of using InstructGPT” or “create a presentation on the latest AI trends”, and the model completes them.
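For example, an instruction prompt like the ones above can be sent to an InstructGPT model (text-davinci-003) through the legacy Completions endpoint. This is a minimal sketch assuming the openai Python library v0.x, with the API key and sampling parameters as placeholders:

```python
# Hedged sketch: sending an instruction-style prompt to an InstructGPT
# model via the legacy Completions endpoint (openai library v0.x).
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

response = openai.Completion.create(
    model="text-davinci-003",
    prompt="Compose a short blog post on the benefits of using InstructGPT.",
    max_tokens=300,      # illustrative sampling parameters
    temperature=0.7,
)
print(response["choices"][0]["text"])
```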

InstructGPT’s capabilities include data entry, data cleaning, summarization, and more. Users can entrust it with diverse tasks such as “extract contact information from a list of customers” or “summarize a research paper”. Notably, InstructGPT adapts well to custom datasets, allowing it to be trained on instructions and tasks tailored to specific requirements.

ChatGPT

ChatGPT is a large language model chatbot developed by OpenAI. It is built on the GPT-3.5 and GPT-4 foundational LLMs and fine-tuned with supervised and reinforcement learning techniques. ChatGPT can engage in seamless dialogue with humans, comprehending their intent and delivering informative, engaging responses. With its ability to perform diverse tasks, from customer service and education to marketing and more, it has become an invaluable tool for a wide array of businesses.
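Below is a minimal sketch of a conversational exchange through the ChatCompletion endpoint (openai Python library v0.x; the system prompt and user message are illustrative placeholders):

```python
# Hedged sketch: a conversational exchange with a ChatGPT model via the
# ChatCompletion endpoint (openai Python library v0.x).
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful customer-support agent."},
        {"role": "user", "content": "My order #1234 hasn't arrived yet. What can I do?"},
    ],
)
print(response["choices"][0]["message"]["content"])
```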

Creation of InstructGPT and ChatGPT models

GPT-3 has four base versions (Ada, Babbage, Curie, and Davinci), of which Davinci is the most powerful. The Davinci version was used to create Codex initial and InstructGPT initial.

(Figure: lineage of the GPT-3 model family. Source: University of Edinburgh, Allen Institute for AI)

Step 1 – Codex initial is a language model specifically designed to understand and generate code. Trained on a vast amount of publicly available code from the internet, it excels at language understanding and snippet generation. Codex initial has two variants: code-davinci-001 and code-cushman-001. The initial phase involves pre-training on a large corpus of code so the model learns syntax, semantics, and common patterns. After pre-training, Codex undergoes fine-tuning on a curated dataset that includes demonstrations and comparisons of code snippets. This process aligns the model’s code generation with specific programming tasks, ensuring accuracy and reliability.

Step 1.1 – The InstructGPT initial model is created by fine-tuning the GPT-3 base model. It has two variants: davinci-instruct-beta and text-davinci-001. The base language model is fine-tuned on a specialized dataset curated for instruction following, which pairs natural-language instructions with demonstrations of the desired output. The model learns to generate outputs aligned with the instructions it is given. By fine-tuning the pre-trained base model on this tailored dataset, InstructGPT initial is better able to understand and act on human-like instructions.

Step 2 – Code-davinci-002 results from combining the InstructGPT initial and Codex initial models. This synergy produces a powerful model, tailored to programming tasks, that can generate high-quality code outputs.

Step 3 – Text-davinci-002 is an InstructGPT model. To create it, OpenAI fine-tuned the pre-trained code-davinci-002 on non-programming text, such as news articles, books, and other general text.

Step 4 – Text-davinci-003 emerges from text-davinci-002 via reinforcement learning from human feedback (RLHF). It undergoes extended training on a larger dataset of text and code, featuring more recent data and more diverse, complex examples. The model is fine-tuned to be more specific and concise in its responses. (Text-davinci-001, code-davinci-002, text-davinci-002, and text-davinci-003 are all InstructGPT models; all are available through the API except code-davinci-002.)

Step 5 – The ChatGPT model was created by applying reinforcement learning from human feedback to the pre-trained text-davinci-002 model. It was then fine-tuned on a large conversational dataset, such as online chat logs, customer support conversations, and social media interactions.
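The reinforcement-learning-from-human-feedback idea behind Steps 4 and 5 can be illustrated with a deliberately tiny toy: a "policy" over a few canned responses, a hand-written reward function standing in for the learned reward model, and a REINFORCE-style update. This is only a sketch of the principle; OpenAI's actual pipeline trains a reward model on human preference comparisons and optimizes the LLM with PPO.

```python
# Toy illustration of the RLHF idea: a policy samples a response, a
# (stand-in) reward model scores it, and a REINFORCE-style update shifts
# probability toward higher-reward responses. NOT OpenAI's pipeline.
import math
import random

responses = ["Sure! Here are the steps...", "I don't know.", "Go away."]
logits = [0.0, 0.0, 0.0]             # the "policy": one logit per response
reward = {0: 1.0, 1: 0.1, 2: -1.0}   # stand-in for human preference scores
lr = 0.5

def softmax(xs):
    exps = [math.exp(x - max(xs)) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

for step in range(200):
    probs = softmax(logits)
    i = random.choices(range(len(responses)), weights=probs)[0]
    r = reward[i]
    # REINFORCE: d log p_i / d logit_j = 1{i=j} - p_j, scaled by reward.
    for j in range(len(logits)):
        grad = (1.0 - probs[j]) if j == i else -probs[j]
        logits[j] += lr * r * grad

# After training, the helpful response should dominate the policy.
for resp, p in zip(responses, softmax(logits)):
    print(f"{p:.3f}  {resp}")
```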

Comparative overview of GPT-3 Base, InstructGPT and ChatGPT models

GPT-3 Base, InstructGPT and ChatGPT models in action: Outputs generated for different tasks

Task 1 – Summarizing tweets

Analysis: The table above shows that the GPT-3 base model fails to summarize the tweets, while the InstructGPT and ChatGPT models perform better. The summary generated by the InstructGPT model is informative and precise, while the ChatGPT summary is more elaborate; for example, ChatGPT also mentions competitor smartphones, a detail absent from InstructGPT's summary. Both models provide strong summaries, with ChatGPT having a slight edge.

Task 2 – Querying/Answering questions

Analysis: In this task, the base GPT model falls short in answering the questions, whereas both the InstructGPT and ChatGPT models provide accurate answers to every question.

Task 3 – Topic Creation

Analysis: The table above illustrates that the base model struggles to generate topics, whereas the InstructGPT and ChatGPT models perform the task well. The ChatGPT model produces a greater number of detailed topics, whereas the InstructGPT model generates more high-level topics.

Limitations of GPT models

A. Limited Context/Max Tokens: GPT models have a finite context window (up to 4,096 tokens for the models discussed here), so information from long texts that falls outside this window is lost.
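A quick way to check whether an input fits the context window is to count tokens with OpenAI's tiktoken library. A minimal sketch (the model name and limit below assume gpt-3.5-turbo's original 4,096-token window):

```python
# Hedged sketch: counting tokens with tiktoken to check whether a
# prompt fits inside a model's context window.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

prompt = "Summarize the following report: ..." * 500  # a long input
n_tokens = len(enc.encode(prompt))

CONTEXT_WINDOW = 4096  # gpt-3.5-turbo's original context limit
print(f"{n_tokens} tokens; fits: {n_tokens < CONTEXT_WINDOW}")
```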

B. Rate limits: In the context of GPT models, rate limits are constraints on the number of requests or the amount of computational resources that can be used within a specific time period. There are two types of rate limits when using GPT models (a retry sketch follows the list below):

  • Requests per minute: The number of API requests per minute is limited; the limit can be raised with a paid API, but it still constrains workloads where a high number of requests must be processed.
  • Tokens per minute: The number of tokens passed to the API per minute is limited; this too can be raised with a paid API, but remains a constraint when a large volume of data must be processed.
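In practice, rate limits are commonly handled by retrying with exponential backoff, as in this sketch (assumes the openai Python library v0.x, which raises openai.error.RateLimitError; the model and retry settings are placeholders):

```python
# Hedged sketch: retrying an API call with exponential backoff when a
# rate limit is hit (openai Python library v0.x).
import time
import openai

def complete_with_backoff(prompt, retries=5):
    delay = 1.0
    for attempt in range(retries):
        try:
            return openai.Completion.create(
                model="text-davinci-003", prompt=prompt, max_tokens=100
            )
        except openai.error.RateLimitError:
            time.sleep(delay)  # wait, then retry with a longer delay
            delay *= 2
    raise RuntimeError("rate limit: retries exhausted")
```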

C. Lack of Reasoning: While GPT models excel at generating human-like text, they may struggle with common-sense reasoning, sometimes providing plausible yet unrealistic answers.

D. Safety and Ethical Concerns: GPT models may generate harmful, misleading, or malicious content if misused, requiring careful handling, especially in public applications.

E. Biases in Responses: GPT models might reflect biases present in the training data, yielding politically biased, offensive, or discriminatory responses. Efforts are needed to mitigate these biases, but they may still exist to some extent.

Conclusion

In general, specialized fine-tuned GPT versions outperform base models across tasks because of continued training on high-quality datasets, which enhances their contextual understanding. InstructGPT and ChatGPT perform similarly, with ChatGPT slightly edging ahead, likely due to its ability to generate outputs in a conversational format. Notably, as prompts become more detailed, the outputs of the two models tend to become more similar in nature.

About the authors:

  • Ankit Mehra is a Senior Data Scientist at Sigmoid. He specializes in analytics and ML-based data solutions.
  • Malhar Yadav is an Associate Data Scientist at Sigmoid and a coding and ML enthusiast.
  • Bhaskar Ammu is a Senior Lead Data Scientist at Sigmoid. He specializes in designing data science solutions for clients, building database architectures, and managing projects and teams.
