Pioneering research on the path to AGI

We believe our research will eventually lead to artificial general intelligence (AGI): systems that can solve problems at a human level. Building safe and beneficial AGI is our mission.

“Safely aligning powerful AI systems is one of the most important unsolved problems for our mission. Techniques like learning from human feedback are helping us get closer, and we are actively researching new techniques to help us fill the gaps.”

Josh Achiam
Researcher at OpenAI

Focus areas

We build our generative models using a technology called deep learning, which leverages large amounts of data to train an AI system to perform a task.
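The core idea behind deep learning can be sketched in a few lines: given examples of a task, repeatedly adjust a model's parameters to reduce its error on those examples. The toy below is an illustration only, fitting a single linear "neuron" to invented data with gradient descent; real generative models apply the same loop to billions of parameters and vast datasets.

```python
import numpy as np

# Toy data for the task "predict y from x"; the true rule is y = 2x + 1.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 2 * x + 1

# A minimal "model": y_hat = w * x + b, with two learnable parameters.
w, b = 0.0, 0.0
lr = 0.1  # learning rate

# Gradient descent: repeatedly nudge w and b to reduce mean squared error.
for _ in range(500):
    err = (w * x + b) - y
    w -= lr * np.mean(2 * err * x)  # dMSE/dw
    b -= lr * np.mean(2 * err)      # dMSE/db

print(round(w, 2), round(b, 2))  # close to 2.0 and 1.0
```

The same pattern scales up: more parameters, more data, and automatic differentiation instead of hand-derived gradients.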

Text

Our text models are advanced language processing tools that can generate, classify, and summarize text with high levels of coherence and accuracy.

Aligning language models to follow instructions
We’ve trained language models that are much better at following user intentions than GPT-3.
Summarizing books with human feedback
We’ve trained a model to summarize entire books with human feedback.
Language models are few-shot learners
We trained GPT-3, an autoregressive language model with 175 billion parameters, and evaluated its few-shot learning abilities.

Image

Our research on generative modeling for images has led to representation models like CLIP, which maps text and images into a shared representation space, and DALL·E, a tool for creating vivid images from text descriptions.

Hierarchical text-conditional image generation with CLIP latents
We show that generating images in two stages, first producing a CLIP image embedding from a text caption and then decoding that embedding into an image, improves the diversity of generated images.
DALL·E: Creating images from text
We’ve trained a neural network called DALL·E that creates images from text captions for a wide range of concepts expressible in natural language.
CLIP: Connecting text and images
We’re introducing a neural network called CLIP which efficiently learns visual concepts from natural language supervision.
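Because CLIP embeds text and images in a shared space, an image can be classified zero-shot by comparing its embedding to the embeddings of candidate captions. The sketch below uses hand-picked 4-dimensional vectors as stand-ins for CLIP's learned encoders (the real embeddings are learned and much higher-dimensional), purely to illustrate the matching step.

```python
import numpy as np

# Hypothetical embeddings standing in for CLIP's text and image encoders.
# These vectors are hand-picked for illustration, not produced by CLIP.
text_emb = {
    "a photo of a dog": np.array([1.0, 0.1, 0.0, 0.2]),
    "a photo of a cat": np.array([0.1, 1.0, 0.2, 0.0]),
}
image_emb = np.array([0.9, 0.2, 0.1, 0.1])  # an (imaginary) dog photo

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Zero-shot classification: pick the caption whose embedding
# lies closest to the image's embedding.
scores = {caption: cosine(v, image_emb) for caption, v in text_emb.items()}
best = max(scores, key=scores.get)
print(best)  # "a photo of a dog"
```

The design point is that no dog-vs-cat classifier was trained; the class names themselves, written as captions, serve as the classifier.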

Audio

Our research on applying AI to audio processing and audio generation has led to developments in automatic speech recognition and original musical compositions.

Introducing Whisper
We’ve trained and are open-sourcing a neural net that approaches human-level robustness and accuracy on English speech recognition.
Jukebox
We’re introducing Jukebox, a neural net that generates music as raw audio in a variety of genres and artist styles.
MuseNet
We’ve created MuseNet, a deep neural network that can generate 4-minute musical compositions with 10 different instruments.

Featured roles

We are constantly seeking talented individuals to join our team. Explore featured roles or view all open roles.

View all careers

Join us in shaping the future of technology.
