Extractors are the components that handle site-specific logic for extracting video information from URLs. yt-dlp includes over 1,700 extractors for different websites.

What is an Extractor?

An extractor is a class that:
  • Determines if it can handle a given URL
  • Extracts video metadata (title, description, formats, etc.)
  • Provides download URLs for the video content
Each extractor inherits from the InfoExtractor base class.

Using Extractors

List Available Extractors

import yt_dlp

# Get all extractor classes (the helper lives in the yt_dlp.extractor subpackage)
extractors = yt_dlp.extractor.gen_extractor_classes()

for extractor in extractors:
    print(f"{extractor.IE_NAME}: {extractor.IE_DESC}")

Get a Specific Extractor

import yt_dlp

# Get the YouTube extractor class by its key
youtube_ie = yt_dlp.extractor.get_info_extractor('Youtube')
print(f"Extractor name: {youtube_ie.IE_NAME}")
print(f"Description: {youtube_ie.IE_DESC}")

Force a Specific Extractor

import yt_dlp

with yt_dlp.YoutubeDL({}) as ydl:
    # Force using the Youtube extractor
    info = ydl.extract_info(
        'https://www.youtube.com/watch?v=dQw4w9WgXcQ',
        ie_key='Youtube',
        download=False
    )

InfoExtractor Base Class

All extractors inherit from the InfoExtractor class, which provides common functionality.

Key Attributes

  • IE_NAME (string): The extractor's unique identifier (e.g., 'youtube', 'vimeo')
  • IE_DESC (string): Human-readable description of the extractor
  • _VALID_URL (string): Regular expression pattern matching the URLs this extractor can handle
  • _WORKING (boolean, default: True): Whether the extractor is currently working
  • age_limit (int): Age limit for content from this extractor

Common Methods

suitable()

Check whether the extractor can handle a URL.

# `extractor` is an extractor instance, e.g. from ydl.get_info_extractor('Youtube')
if extractor.suitable(url):
    info = extractor.extract(url)
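
Conceptually, suitable() amounts to testing the URL against the extractor's _VALID_URL pattern. The sketch below mimics that check using only the standard library. The two patterns and the find_suitable helper are hypothetical stand-ins for illustration, not yt-dlp's real extractor definitions.

```python
import re

# Hypothetical patterns, for illustration only (not copied from yt-dlp)
VALID_URLS = {
    'mysite': r'https?://(?:www\.)?mysite\.com/video/(?P<id>[0-9]+)',
    'othersite': r'https?://(?:www\.)?othersite\.example/watch/(?P<id>[\w-]+)',
}


def find_suitable(url):
    """Return (name, video_id) for the first pattern matching url, else None."""
    for name, pattern in VALID_URLS.items():
        match = re.match(pattern, url)
        if match:
            # The named group `id` plays the role of the video identifier
            return name, match.group('id')
    return None


print(find_suitable('https://www.mysite.com/video/12345'))
print(find_suitable('https://unknown.example/clip'))
```

A URL that matches no pattern yields None here; in yt-dlp itself, such URLs typically fall through to the generic extractor.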

working()

Check if the extractor is currently working.
if extractor.working():
    # Use the extractor
    pass
else:
    print("Extractor is marked as broken")

Information Dictionary Format

Extractors return information dictionaries with standardized fields:

Required Fields

  • id (string, required): Unique video identifier
  • title (string, required): Video title (an empty string if the video has no title, not None)

Format Information

formats (list[dict]): List of available formats, ordered from worst to best quality. Each format dict contains:
  • url: Media URL
  • format_id: Format identifier
  • ext: File extension
  • width, height: Video dimensions
  • tbr, abr, vbr: Bitrates
  • acodec, vcodec: Codec names
  • filesize: File size in bytes
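
Because the list is ordered from worst to best, picking a format by hand can be as simple as filtering and taking the last match. The following is a minimal sketch over a hypothetical formats list; the sample data and the best_format helper are assumptions for illustration. It relies on the convention that audio-only formats carry a vcodec of 'none'.

```python
# Hypothetical format list, ordered from worst to best quality
formats = [
    {'format_id': 'audio', 'url': 'https://example.com/a.m4a', 'ext': 'm4a',
     'abr': 128, 'vcodec': 'none', 'acodec': 'mp4a'},
    {'format_id': 'low', 'url': 'https://example.com/low.mp4', 'ext': 'mp4',
     'height': 360, 'tbr': 500, 'vcodec': 'avc1', 'acodec': 'mp4a'},
    {'format_id': 'high', 'url': 'https://example.com/high.mp4', 'ext': 'mp4',
     'height': 1080, 'tbr': 4000, 'vcodec': 'avc1', 'acodec': 'mp4a'},
]


def best_format(formats, max_height=None):
    """Return the last (best) format that has video, optionally capped by height."""
    candidates = [
        f for f in formats
        if f.get('vcodec') != 'none'
        and (max_height is None or (f.get('height') or 0) <= max_height)
    ]
    return candidates[-1] if candidates else None


print(best_format(formats)['format_id'])
print(best_format(formats, max_height=480)['format_id'])
```

In practice the 'format' option of YoutubeDL does this selection for you; the sketch only illustrates how the ordering can be used.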

Optional Metadata Fields

  • description (string): Video description
  • uploader (string): Name of the video uploader
  • uploader_id (string): Uploader's unique identifier
  • uploader_url (string): URL to the uploader's profile
  • duration (int): Video duration in seconds
  • view_count (int): Number of views
  • like_count (int): Number of likes
  • upload_date (string): Upload date in YYYYMMDD format
  • timestamp (int): Unix timestamp of the upload time
  • thumbnails (list[dict]): List of thumbnail dictionaries with 'url', 'width', 'height'
  • subtitles (dict): Dictionary mapping language codes to lists of subtitle format dicts
  • categories (list[string]): List of video categories
  • tags (list[string]): List of tags
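
Since these fields are optional, code that consumes an information dictionary should tolerate missing keys. The sketch below reads a few of them defensively and converts upload_date and timestamp into datetime objects. The sample dictionary and the describe helper are hypothetical.

```python
from datetime import datetime, timezone


def describe(info):
    """Summarize optional metadata, falling back when a field is absent."""
    upload_date = info.get('upload_date')
    timestamp = info.get('timestamp')
    return {
        'uploader': info.get('uploader') or 'unknown',
        # upload_date is a YYYYMMDD string
        'uploaded': (datetime.strptime(upload_date, '%Y%m%d').date()
                     if upload_date else None),
        # timestamp is seconds since the Unix epoch
        'uploaded_at': (datetime.fromtimestamp(timestamp, tz=timezone.utc)
                        if timestamp is not None else None),
        'tags': info.get('tags') or [],
    }


# Hypothetical information dictionary with only some optional fields set
sample = {'id': 'abc123', 'title': 'Sample',
          'upload_date': '20240115', 'timestamp': 1705312800}
result = describe(sample)
print(result)
```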

Working with Playlists

Some extractors handle playlists or channels. These return a different structure:
import yt_dlp

with yt_dlp.YoutubeDL({'extract_flat': 'in_playlist'}) as ydl:
    info = ydl.extract_info(
        'https://www.youtube.com/playlist?list=PLrAXtmErZgOeiKm4sgNOknGvNjby9efdf',
        download=False
    )
    
    print(f"Playlist: {info['title']}")
    print(f"Videos: {len(info['entries'])}")
    
    for entry in info['entries']:
        print(f"  - {entry['title']}")

  • _type (string): Type of result: 'video', 'playlist', 'multi_video', 'url', or 'url_transparent'. A missing _type is treated as 'video'.
  • entries (list[dict]): For playlists: list of video info dictionaries
  • playlist_title (string): Title of the playlist
  • playlist_count (int): Total number of videos in the playlist
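
Code that accepts arbitrary URLs usually has to handle both shapes of result. One way to do that is to branch on _type and descend into entries, as in this sketch. The iter_videos helper and the sample result are hypothetical; entries that are None (which can occur when errors are ignored) are skipped.

```python
def iter_videos(result):
    """Yield non-playlist results, descending into nested playlists."""
    result_type = result.get('_type', 'video')
    if result_type in ('playlist', 'multi_video'):
        for entry in result.get('entries') or []:
            if entry is not None:
                yield from iter_videos(entry)
    else:
        yield result


# Hypothetical playlist result containing a nested playlist
sample = {
    '_type': 'playlist',
    'title': 'Outer',
    'entries': [
        {'id': 'a', 'title': 'First'},
        None,
        {'_type': 'playlist', 'title': 'Inner',
         'entries': [{'id': 'b', 'title': 'Second'}]},
    ],
}

print([video['id'] for video in iter_videos(sample)])
```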

Extractor Arguments

Some extractors accept additional arguments to customize their behavior:
import yt_dlp

ydl_opts = {
    'extractor_args': {
        'youtube': {
            'skip': ['dash', 'hls'],  # Skip DASH and HLS formats
            'player_client': ['android'],  # Use Android client
        },
        'generic': {
            # Illustrative key only; supported arguments differ per extractor
            'timeout': ['30'],
        }
    }
}

with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    ydl.download(['https://www.youtube.com/watch?v=dQw4w9WgXcQ'])
Extractor arguments must always be lists of strings, even for single values.
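
When the values come from user input or a config file, it can help to coerce them into that list-of-strings shape before building the options. The helper below is a hypothetical convenience, not part of yt-dlp.

```python
def normalize_extractor_args(raw):
    """Coerce every argument value into a list of strings."""
    normalized = {}
    for ie_key, args in raw.items():
        normalized[ie_key] = {}
        for name, value in args.items():
            # Wrap scalars in a list, then stringify each item
            values = value if isinstance(value, (list, tuple)) else [value]
            normalized[ie_key][name] = [str(v) for v in values]
    return normalized


raw_args = {'youtube': {'player_client': 'android', 'skip': ['dash', 'hls']}}
print(normalize_extractor_args(raw_args))
```

The result can then be passed as the 'extractor_args' value in the options dictionary.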

Common Extractor Examples

YouTube

import yt_dlp

ydl_opts = {
    'format': 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]',
    'extractor_args': {
        'youtube': {
            'skip': ['dash'],  # Skip DASH formats
        }
    }
}

with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    info = ydl.extract_info('https://www.youtube.com/watch?v=dQw4w9WgXcQ', download=False)
    print(f"Channel: {info['channel']}")
    print(f"Subscribers: {info.get('channel_follower_count', 'N/A')}")

Generic Extractor

The generic extractor can handle many sites by detecting embedded videos:
import yt_dlp

with yt_dlp.YoutubeDL({}) as ydl:
    # Will use generic extractor for unsupported sites
    info = ydl.extract_info('https://example.com/some-video', download=False)

List Supported Sites

import yt_dlp

extractors = yt_dlp.extractor.list_extractor_classes()
print(f"Total extractors: {len(extractors)}")

# Filter to working extractors only
working = [ie for ie in extractors if ie.working()]
print(f"Working extractors: {len(working)}")

Extractor Plugins

You can create custom extractors as plugins. Place them in:
  • ~/.yt-dlp/plugins/<package name>/yt_dlp_plugins/extractor/
  • ${XDG_CONFIG_HOME}/yt-dlp/plugins/<package name>/yt_dlp_plugins/extractor/
Example custom extractor:
# ~/.yt-dlp/plugins/mysite/yt_dlp_plugins/extractor/mysite.py
from yt_dlp.extractor.common import InfoExtractor

class MySiteIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?mysite\.com/video/(?P<id>[0-9]+)'
    
    def _real_extract(self, url):
        video_id = self._match_id(url)
        
        # Fetch and parse the webpage
        webpage = self._download_webpage(url, video_id)
        
        return {
            'id': video_id,
            'title': self._html_search_regex(r'<title>(.+?)</title>', webpage, 'title'),
            'url': self._html_search_regex(r'file: "(.+?)"', webpage, 'video url'),
        }
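
To try out the two search patterns without a network request, they can be run against a saved or hand-written page. The sketch below uses plain re.search as a stand-in for _html_search_regex (which additionally cleans up HTML in the matched text). The sample page and the parse_page helper are hypothetical.

```python
import re

# Hypothetical page source, shaped to match the patterns in the example extractor
SAMPLE_PAGE = '''
<html><head><title>My Sample Video</title></head>
<body><script>player.setup({file: "https://cdn.mysite.com/v/12345.mp4"});</script></body>
</html>
'''


def parse_page(video_id, webpage):
    """Build a minimal info dict from the page, or raise if a field is missing."""
    title = re.search(r'<title>(.+?)</title>', webpage)
    media = re.search(r'file: "(.+?)"', webpage)
    if not title or not media:
        raise ValueError('could not find title or media URL')
    return {
        'id': video_id,
        'title': title.group(1).strip(),
        'url': media.group(1),
    }


print(parse_page('12345', SAMPLE_PAGE))
```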

Handling Age-Restricted Content

import yt_dlp

ydl_opts = {
    'age_limit': 18,  # Allow content up to this age rating
}

url = 'https://www.youtube.com/watch?v=dQw4w9WgXcQ'

with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    # Content rated above the age limit is skipped
    info = ydl.extract_info(url, download=False)
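
The filtering rule can be pictured as a comparison between the content's own age_limit and the configured one. The function below is a simplified sketch of that comparison under the assumption that a missing value on either side means "no restriction"; it is not yt-dlp's actual code.

```python
def is_age_restricted(content_limit, viewer_age_limit):
    """Return True if content should be skipped for the configured age limit."""
    if content_limit is None or viewer_age_limit is None:
        # Assumption: no rating or no configured limit means nothing is filtered
        return False
    return viewer_age_limit < content_limit


print(is_age_restricted(18, 16))
print(is_age_restricted(18, 18))
```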

Retry Logic

Extractors support automatic retry on failures:
import yt_dlp

ydl_opts = {
    'extractor_retries': 3,  # Retry up to 3 times
    'retry_sleep_functions': {
        'extractor': lambda n: n * 2,  # Sleep 2n seconds between retries
    }
}

url = 'https://www.youtube.com/watch?v=dQw4w9WgXcQ'

with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    ydl.download([url])
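
A linear delay is only one option for the sleep function. The sketch below shows a capped exponential backoff instead. The capped_backoff name and its base and cap parameters are hypothetical, and the assumption that the retry counter n starts at 0 is not confirmed by this page.

```python
def capped_backoff(n, base=2, cap=30):
    """Seconds to sleep for retry counter n: base**n, never more than cap."""
    return min(base ** n, cap)


# Delays for the first six retry counters, assuming n starts at 0
schedule = [capped_backoff(n) for n in range(6)]
print(schedule)
```

Such a function could be supplied as the 'extractor' entry of 'retry_sleep_functions' in place of the lambda above.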