Extractors - yt-dlp

Extractors are the components that handle site-specific logic for extracting video information from URLs. yt-dlp includes over 1,700 extractors for different websites.

What is an Extractor?

An extractor is a class that:

Determines if it can handle a given URL
Extracts video metadata (title, description, formats, etc.)
Provides download URLs for the video content

Each extractor inherits from the InfoExtractor base class.

Using Extractors

List Available Extractors

from yt_dlp.extractor import gen_extractor_classes

# Iterate over all extractor classes
for extractor in gen_extractor_classes():
    # IE_DESC may be unset for extractors without a description
    print(f"{extractor.IE_NAME}: {extractor.IE_DESC or 'no description'}")

Get a Specific Extractor

from yt_dlp.extractor import get_info_extractor

# Look up the YouTube extractor class by its key (the class name without the "IE" suffix)
youtube_ie = get_info_extractor('Youtube')
print(f"Extractor name: {youtube_ie.IE_NAME}")
print(f"Description: {youtube_ie.IE_DESC}")

Force a Specific Extractor

import yt_dlp

with yt_dlp.YoutubeDL({}) as ydl:
    # Force using the Youtube extractor
    info = ydl.extract_info(
        'https://www.youtube.com/watch?v=dQw4w9WgXcQ',
        ie_key='Youtube',
        download=False
    )

InfoExtractor Base Class

All extractors inherit from the InfoExtractor class, which provides common functionality.

Key Attributes

IE_NAME

string

The extractor’s unique identifier (e.g., ‘youtube’, ‘vimeo’)

IE_DESC

string

Human-readable description of the extractor

_VALID_URL

string

Regular expression pattern matching URLs this extractor can handle

_WORKING

boolean

default: true

Whether the extractor is currently working

age_limit

int

Age limit for content from this extractor
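
To illustrate how a _VALID_URL pattern is used, here is a minimal sketch that applies such a pattern with Python's re module. The site mysite.com and the helper match_video_id are hypothetical; the sketch only assumes that the pattern is matched from the start of the URL and that the video ID is captured in a named group called id.

```python
import re

# Pattern in the style of a _VALID_URL attribute; the site is hypothetical.
VALID_URL = r'https?://(?:www\.)?mysite\.com/video/(?P<id>[0-9]+)'


def match_video_id(url):
    """Return the captured 'id' group, or None if the URL does not match."""
    m = re.match(VALID_URL, url)
    return m.group('id') if m else None


print(match_video_id('https://www.mysite.com/video/12345'))
print(match_video_id('http://mysite.com/video/7'))
print(match_video_id('https://mysite.com/user/12345'))
```

A URL that does not fit the pattern yields None here, which corresponds to the extractor reporting that it cannot handle the URL.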

Common Methods

suitable()

Check if the extractor can handle a URL.

if extractor.suitable(url):
    info = extractor.extract(url)

working()

Check if the extractor is currently working.

if extractor.working():
    # Use the extractor
    pass
else:
    print("Extractor is marked as broken")
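
The two checks are often combined when choosing an extractor for a URL. The sketch below models that selection with stand-in classes: FakeBaseIE, FakeVideoIE, FakeBrokenIE, and pick_extractor are hypothetical and not part of yt-dlp. It mirrors only the suitable() / working() contract described above, and it skips broken extractors for simplicity, which may differ from what yt-dlp itself does.

```python
import re


class FakeBaseIE:
    """Stand-in for an extractor class; not yt-dlp's InfoExtractor."""
    _VALID_URL = r'(?!)'  # a pattern that never matches
    _WORKING = True

    @classmethod
    def suitable(cls, url):
        return re.match(cls._VALID_URL, url) is not None

    @classmethod
    def working(cls):
        return cls._WORKING


class FakeVideoIE(FakeBaseIE):
    _VALID_URL = r'https?://video\.example\.com/watch/(?P<id>\d+)'


class FakeBrokenIE(FakeBaseIE):
    _VALID_URL = r'https?://broken\.example\.com/(?P<id>\w+)'
    _WORKING = False


def pick_extractor(url, candidates):
    """Return the first candidate that matches the URL and is not marked broken."""
    for ie in candidates:
        if ie.suitable(url) and ie.working():
            return ie
    return None


candidates = [FakeBrokenIE, FakeVideoIE]
print(pick_extractor('https://video.example.com/watch/42', candidates))
print(pick_extractor('https://broken.example.com/abc', candidates))
```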

Information Dictionary Format

Extractors return information dictionaries with standardized fields:

Required Fields

id

string

required

Unique video identifier

title

string

required

Video title (empty string if unavailable, not None)

Format Information

formats

list[dict]

List of available formats, ordered from worst to best quality. Each format dict contains:

url: Media URL
format_id: Format identifier
ext: File extension
width, height: Video dimensions
tbr, abr, vbr: Total, audio, and video bitrates in kbps
acodec, vcodec: Codec names
filesize: File size in bytes
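
Because the list is ordered from worst to best, a simple selection can take the last entry that meets a constraint. The following is a minimal sketch over hand-written sample data; best_format and the sample format dicts are hypothetical. It assumes that a format without video is marked with vcodec set to 'none', and it is not a substitute for yt-dlp's own format selector.

```python
# Sample data shaped like a 'formats' list, ordered worst to best.
formats = [
    {'format_id': 'audio', 'ext': 'm4a', 'vcodec': 'none', 'acodec': 'mp4a.40.2', 'abr': 128},
    {'format_id': '360p', 'ext': 'mp4', 'width': 640, 'height': 360,
     'vcodec': 'avc1', 'acodec': 'mp4a.40.2', 'tbr': 700},
    {'format_id': '720p', 'ext': 'mp4', 'width': 1280, 'height': 720,
     'vcodec': 'avc1', 'acodec': 'mp4a.40.2', 'tbr': 2500},
    {'format_id': '1080p', 'ext': 'webm', 'width': 1920, 'height': 1080,
     'vcodec': 'vp9', 'acodec': 'none', 'tbr': 4500},
]


def best_format(formats, max_height=None):
    """Return the last (best) format that has video and fits max_height."""
    candidates = [
        f for f in formats
        if f.get('vcodec') != 'none'
        and (max_height is None or (f.get('height') or 0) <= max_height)
    ]
    return candidates[-1] if candidates else None


print(best_format(formats)['format_id'])
print(best_format(formats, max_height=720)['format_id'])
```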

Optional Metadata Fields

description

string

Video description

uploader

string

Name of the video uploader

uploader_id

string

Uploader’s unique identifier

uploader_url

string

URL to uploader’s profile

duration

int

Video duration in seconds

view_count

int

Number of views

like_count

int

Number of likes

upload_date

string

Upload date in YYYYMMDD format

timestamp

int

Unix timestamp of upload time

thumbnails

list[dict]

List of thumbnail dictionaries with ‘url’, ‘width’, ‘height’

subtitles

dict

Dictionary mapping language codes to lists of subtitle format dicts
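
As an illustration of how these fields fit together, the sketch below works on a hand-written sample dictionary. The helper names and sample values are hypothetical. It assumes that upload_date corresponds to the UTC date of timestamp and that each subtitle format dict carries an ext key.

```python
from datetime import datetime, timezone

# Hand-written sample shaped like an information dictionary.
info = {
    'id': 'abc123',
    'title': 'Sample video',
    'timestamp': 1700000000,
    'duration': 125,
    'subtitles': {
        'en': [{'ext': 'vtt', 'url': 'https://example.com/subs/en.vtt'}],
        'de': [
            {'ext': 'vtt', 'url': 'https://example.com/subs/de.vtt'},
            {'ext': 'srt', 'url': 'https://example.com/subs/de.srt'},
        ],
    },
}


def upload_date_from_timestamp(ts):
    """Format a Unix timestamp as YYYYMMDD in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime('%Y%m%d')


def format_duration(seconds):
    """Render a duration in seconds as M:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f'{minutes}:{secs:02d}'


def subtitle_exts(subtitles):
    """Map each language code to the list of available subtitle extensions."""
    return {lang: [t.get('ext') for t in tracks] for lang, tracks in subtitles.items()}


print(upload_date_from_timestamp(info['timestamp']))
print(format_duration(info['duration']))
print(subtitle_exts(info['subtitles']))
```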

Working with Playlists

Some extractors handle playlists or channels. These return a different structure:

import yt_dlp

with yt_dlp.YoutubeDL({'extract_flat': 'in_playlist'}) as ydl:
    info = ydl.extract_info(
        'https://www.youtube.com/playlist?list=PLrAXtmErZgOeiKm4sgNOknGvNjby9efdf',
        download=False
    )
    
    print(f"Playlist: {info['title']}")
    print(f"Videos: {len(info['entries'])}")
    
    for entry in info['entries']:
        # Flat entries may omit fields, so read them defensively
        print(f"  - {entry.get('title', '(untitled)')}")

_type

string

Type of result: ‘video’ (the default), ‘playlist’, ‘multi_video’, ‘url’, or ‘url_transparent’

entries

list[dict]

For playlists: list of video info dictionaries

playlist_title

string

Title of the playlist

playlist_count

int

Total number of videos in playlist
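
Since a result can be a single video or a container of entries, code that consumes results usually branches on _type. The sketch below walks a hand-written sample result; iter_videos and the sample data are hypothetical. It assumes that a missing _type means 'video' and that entries can contain None placeholders.

```python
def iter_videos(result):
    """Yield leaf entries from a result, descending into nested playlists."""
    kind = result.get('_type', 'video')
    if kind in ('playlist', 'multi_video'):
        for entry in result.get('entries') or []:
            if entry:  # skip None placeholders for unavailable entries
                yield from iter_videos(entry)
    else:
        yield result


# Hand-written sample shaped like a playlist result.
result = {
    '_type': 'playlist',
    'title': 'Channel',
    'entries': [
        {'id': 'a', 'title': 'First'},
        {'_type': 'playlist', 'title': 'Nested', 'entries': [
            {'id': 'b', 'title': 'Second'},
            None,
        ]},
        {'_type': 'url', 'id': 'c', 'url': 'https://example.com/c'},
    ],
}

ids = [video['id'] for video in iter_videos(result)]
print(ids)
```

Entries of type ‘url’ are yielded as-is here; resolving them into full video dictionaries would require another extraction step.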

Extractor Arguments

Some extractors accept additional arguments to customize their behavior:

import yt_dlp

ydl_opts = {
    'extractor_args': {
        'youtube': {
            'skip': ['dash', 'hls'],  # Skip DASH and HLS formats
            'player_client': ['android'],  # Use Android client
        },
        'generic': {
            'is_live': ['false'],  # Treat detected HLS streams as not live
        }
    }
}

with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    ydl.download(['https://www.youtube.com/watch?v=dQw4w9WgXcQ'])

Extractor arguments must always be lists of strings, even for single values.
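
One way to meet the list-of-strings requirement is to normalize values before building the options. The helper below is a hypothetical convenience function, not part of yt-dlp. It wraps scalar values in a list and converts every item to a string.

```python
def normalize_extractor_args(args):
    """Return a copy of args in which every value is a list of strings."""
    normalized = {}
    for ie_key, options in args.items():
        normalized[ie_key] = {}
        for name, value in options.items():
            # Wrap scalars so a single value becomes a one-element list
            values = value if isinstance(value, (list, tuple)) else [value]
            normalized[ie_key][name] = [str(v) for v in values]
    return normalized


raw_args = {
    'youtube': {
        'skip': 'dash',                        # single value given as a scalar
        'player_client': ['android', 'web'],   # already a list
    },
}

extractor_args = normalize_extractor_args(raw_args)
print(extractor_args)
```

The result can then be passed as the 'extractor_args' option.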

Common Extractor Examples

YouTube

import yt_dlp

ydl_opts = {
    'format': 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]',
    'extractor_args': {
        'youtube': {
            'skip': ['dash'],  # Skip DASH formats
        }
    }
}

with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    info = ydl.extract_info('https://www.youtube.com/watch?v=dQw4w9WgXcQ', download=False)
    print(f"Channel: {info['channel']}")
    print(f"Subscribers: {info.get('channel_follower_count', 'N/A')}")

Generic Extractor

The generic extractor can handle many sites by detecting embedded videos:

import yt_dlp

with yt_dlp.YoutubeDL({}) as ydl:
    # Will use generic extractor for unsupported sites
    info = ydl.extract_info('https://example.com/some-video', download=False)

List Supported Sites

from yt_dlp.extractor import gen_extractor_classes, list_extractor_classes

# list_extractor_classes() returns the extractor classes sorted by name
extractors = list_extractor_classes()
print(f"Total extractors: {len(extractors)}")

# Filter to working extractors only
working = [ie for ie in gen_extractor_classes() if ie.working()]
print(f"Working extractors: {len(working)}")

Extractor Plugins

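You can create custom extractors as plugins. yt-dlp loads plugins from a yt_dlp_plugins namespace package, so place extractor modules in one of:

~/.yt-dlp/plugins/<package name>/yt_dlp_plugins/extractor/
${XDG_CONFIG_HOME}/yt-dlp/plugins/<package name>/yt_dlp_plugins/extractor/

Example custom extractor:

# ~/.yt-dlp/plugins/mysite/yt_dlp_plugins/extractor/mysite.py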
from yt_dlp.extractor.common import InfoExtractor

class MySiteIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?mysite\.com/video/(?P<id>[0-9]+)'
    
    def _real_extract(self, url):
        video_id = self._match_id(url)
        
        # Fetch and parse the webpage
        webpage = self._download_webpage(url, video_id)
        
        return {
            'id': video_id,
            'title': self._html_search_regex(r'<title>(.+?)</title>', webpage, 'title'),
            'url': self._html_search_regex(r'file: "(.+?)"', webpage, 'video url'),
        }
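
To see what the two search patterns in this extractor would capture, they can be tried against a sample page with plain re. The page content and the search helper below are hypothetical. The helper only approximates _html_search_regex by unescaping HTML entities and stripping whitespace.

```python
import html
import re

# Hypothetical page content for the made-up site used in the extractor.
webpage = '''<html><head><title>Cats &amp; Dogs</title></head>
<body><script>player.setup({file: "https://cdn.mysite.com/v/12345.mp4"});</script></body></html>'''


def search(pattern, text):
    """Return the first captured group, HTML-unescaped, or None if not found."""
    m = re.search(pattern, text)
    return html.unescape(m.group(1)).strip() if m else None


title = search(r'<title>(.+?)</title>', webpage)
video_url = search(r'file: "(.+?)"', webpage)

print(title)
print(video_url)
```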

Handling Age-Restricted Content

import yt_dlp

url = 'https://example.com/some-video'  # placeholder URL

ydl_opts = {
    'age_limit': 18,  # Viewer's age in years; content rated above it is skipped
}

with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    # Videos whose age rating exceeds age_limit are skipped
    info = ydl.extract_info(url, download=False)
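
The filtering rule can be sketched independently of yt-dlp. The function is_age_restricted and the sample entries below are hypothetical. The sketch assumes that each entry can carry its own age_limit field and that an entry is skipped when the viewer's age is below that limit.

```python
def is_age_restricted(content_limit, viewer_age):
    """Return True if the content's age limit exceeds the viewer's age."""
    if viewer_age is None or content_limit is None:
        return False  # nothing to compare, so do not restrict
    return viewer_age < content_limit


# Hand-written sample entries with per-video age limits.
entries = [
    {'id': 'a', 'age_limit': 0},
    {'id': 'b', 'age_limit': 18},
    {'id': 'c'},  # no age limit reported
]

allowed = [e['id'] for e in entries if not is_age_restricted(e.get('age_limit'), 16)]
print(allowed)
```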

Retry Logic

Extractors support automatic retry on failures:

import yt_dlp

url = 'https://example.com/some-video'  # placeholder URL

ydl_opts = {
    'extractor_retries': 3,  # Retry up to 3 times
    'retry_sleep_functions': {
        'extractor': lambda n: n * 2,  # Sleep 2n seconds between retries
    }
}

with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    ydl.download([url])
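
A sleep function does not have to be linear. The sketch below defines a capped exponential backoff that could be supplied as the 'extractor' sleep function. capped_backoff and its base and cap parameters are hypothetical, and the sketch assumes that the function receives the retry count as an argument named n.

```python
def capped_backoff(n, base=2.0, cap=30.0):
    """Return base * 2**n seconds, limited to at most cap seconds."""
    return min(cap, base * (2 ** n))


ydl_opts = {
    'extractor_retries': 3,
    'retry_sleep_functions': {
        'extractor': capped_backoff,
    },
}

# Delays the function would produce for the first few retry counts
print([capped_backoff(n) for n in range(5)])
```

Capping the delay keeps a long run of failures from stalling a batch download for minutes at a time.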
