How Netflix Directs 1/3rd of Internet Traffic
Haley Tucker
Mohit Vora
QCon
San Francisco
Nov 16, 2015
Watch the video with slide synchronization on InfoQ.com!
http://www.infoq.com/presentations/netflix-streaming-arch
Presented at QCon San Francisco
www.qconsf.com
Playback
Overview
DATA PLANE
(CDN)
CONTROL PLANE
STREAM
NETFLIX
DEVICE
Project 366 #59; 280212 Days Gone By..., CC BY-SA, Pete 2012, Flickr
AUDIO VIDEO TEXT
STREAMS
How do we build a streaming “tape”?
Determine the preferred experience
DEVICE TITLE
CONNECTIONS
COUNTRY
NETWORK
Broadband - wired or wifi
Cellular - Edge, 3G, LTE, ...
CUSTOMER
That’s exactly what I want
...now where can I get it?
Point the device to appropriate locations
Steering
GENERATE
PLAYBACK
MANIFEST
Uh-oh, the
content is
encrypted!
Keymaster, CC BY-SA, Sean McGrath 2007, Flickr
LICENSE
And...Action!
SESSION
(START, STOP, PAUSE,
RESUME, KEEPALIVE)
SESSION EVENTS
LICENSE
PLAYBACK
MANIFEST
GENERATE
PLAYBACK
MANIFEST
SESSION
(START, STOP, PAUSE,
RESUME, KEEPALIVE)
PLAYBACK LIFECYCLE
Data Plane
(CDN)
What is a Content Delivery Network?
Open
Connect
A NETFLIX ORIGINAL
CONTENT RANK
BYTES
STREAMED
PREDICTABLE VIEWING PATTERNS
FILLING WHEN YOU SLEEP
Dreaming…, CC BY-SA, Eleni Boulsaiki 2009, Flickr
READ XOR WRITE
ONE WAY, CC BY-SA, Kenny Louie 2010, Flickr
Content Delivery Mechanisms
DATA PLANE
(CDN)
CONTROL PLANE
STREAM
NETFLIX
DEVICE
ISP CO-LOCATION
[Diagram labels: STREAM, ISP DATA CENTER, ISP ROUTER, NETFLIX, DEVICE]
IXP INTERCONNECTION
[Diagram labels: STREAM, ISP DATA CENTER, IXP DATA CENTER, NFLX ROUTER, ISP ROUTER (x2), NETFLIX, DEVICE]
Control
Plane
OPEN CONNECT
STREAM
NETFLIX
DEVICE
CDN
CONTROL
PLANE
DEVICE
CONTROL
PLANE
DON’T KEEP SECRETS
Network Proximity
Content Positioning
Load Distribution
Network Proximity
Social Network in a Course, CC BY-SA, Hans Põldoja 2010, Flickr
By Specification?
Doesn’t scale
Border Gateway Protocol
TAKEAWAY
BGP ROUTE
175.231.128.0/24
(+ proximity attributes)
Use BGP
ISP2 DATA
CENTER
ISP2 BGP
ROUTES
CONTROL
PLANE
IXP DATA
CENTER
ISP1 BGP
ROUTES
ISP1 DATA
CENTER ISP1
NFLX
BGP ROUTE
175.231.128.0/24
(+ proximity attributes)
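A minimal sketch of how routes learned over BGP could answer "which location is closest to this client?". It uses longest-prefix match, which is how BGP routing tables are normally consulted. The class names, the /16 route, and the use of AS-path length as a tiebreaker are assumptions for illustration; the slide only says "+ proximity attributes".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** One route learned over BGP: a CIDR prefix plus the location that announced it. */
class Route {
    final int network;       // prefix bits as a 32-bit integer
    final int prefixLength;  // e.g. 24 for a /24
    final String location;   // hypothetical label for the announcing site
    final int asPathLength;  // assumed proximity attribute: shorter is closer

    Route(String cidr, String location, int asPathLength) {
        String[] parts = cidr.split("/");
        this.prefixLength = Integer.parseInt(parts[1]);
        this.network = ProximityTable.toInt(parts[0]) & ProximityTable.mask(prefixLength);
        this.location = location;
        this.asPathLength = asPathLength;
    }

    boolean covers(int address) {
        return (address & ProximityTable.mask(prefixLength)) == network;
    }
}

class ProximityTable {
    private final List<Route> routes = new ArrayList<>();

    /** Converts a dotted-quad IPv4 address to its 32-bit integer form. */
    static int toInt(String dottedQuad) {
        int value = 0;
        for (String octet : dottedQuad.split("\\.")) {
            value = (value << 8) | Integer.parseInt(octet);
        }
        return value;
    }

    /** Netmask for a prefix length; /0 is special-cased because Java shifts wrap at 32. */
    static int mask(int prefixLength) {
        return prefixLength == 0 ? 0 : -1 << (32 - prefixLength);
    }

    void learn(Route route) {
        routes.add(route);
    }

    /** Longest matching prefix wins; ties go to the shorter AS path. */
    Optional<String> closestLocation(String clientIp) {
        int address = toInt(clientIp);
        Route best = null;
        for (Route r : routes) {
            if (!r.covers(address)) {
                continue;
            }
            if (best == null
                    || r.prefixLength > best.prefixLength
                    || (r.prefixLength == best.prefixLength
                        && r.asPathLength < best.asPathLength)) {
                best = r;
            }
        }
        return best == null ? Optional.empty() : Optional.of(best.location);
    }
}
```

With the slide's 175.231.128.0/24 route announced from an ISP site and a broader, hypothetical 175.231.0.0/16 route from an IXP site, a client at 175.231.128.42 would resolve to the ISP site, while other addresses in the /16 fall back to the IXP.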
Content Positioning
LOCALIZE TRAFFIC
ISP
DATA CENTER
SERVE CACHE
MISS
HOW DO WE DETERMINE WHAT CONTENT
WILL BE POPULAR TOMORROW?
CHANGING CATALOG
EVOLVING MEMBER TASTES
MINIMIZE FILL CHURN
ISP
DATA CENTER
OFF PEAK
FILL
USE HISTORICAL DATA
CONTENT RANK
BYTES
STREAMED
bytesStreamed/bytesStored
IS ONE DAY OF HISTORY ENOUGH?
EXPONENTIALLY WEIGHTED
MOVING AVERAGE
[Plot: WEIGHT vs. DAYS AGO, 0 to 40; parameter = 0.9]
TAKEAWAY Weigh Recent Data Higher
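A rough sketch of the popularity calculation, assuming the 0.9 on the plot is a per-day decay factor, so that a value from k days ago carries weight 0.9^k. That reading, the class name, and the method names are assumptions; only the bytesStreamed/bytesStored ratio and the use of an exponentially weighted moving average come from the slides.

```java
class PopularityScore {
    /** Popularity of one file for one day, as on the slide: bytesStreamed / bytesStored. */
    static double dailyPopularity(long bytesStreamed, long bytesStored) {
        return (double) bytesStreamed / bytesStored;
    }

    /**
     * Exponentially weighted moving average over daily values, oldest first.
     * A value from k days before the newest one gets weight decay^k, and the
     * result is normalized by the sum of the weights.
     */
    static double ewma(double[] dailyValuesOldestFirst, double decay) {
        double weightedSum = 0.0;
        double weightTotal = 0.0;
        for (double value : dailyValuesOldestFirst) {
            // Age everything seen so far by one day, then add today's value at weight 1.
            weightedSum = weightedSum * decay + value;
            weightTotal = weightTotal * decay + 1.0;
        }
        return weightTotal == 0.0 ? 0.0 : weightedSum / weightTotal;
    }
}
```

Under this weighting, a title that was popular yesterday scores higher than one that was equally popular a month ago, which is the point of the takeaway.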
HOW SHOULD CONTENT BE ALLOCATED?
MILLIONS
OF FILES
THOUSANDS
OF SERVERS
SVR4
SVR2
SVR1
SVR3
FILE1
FILE3
FILE1
TAKEAWAY
ALLOCATE MULTIPLE REPLICAS
RESILIENT TO CLUSTER CHANGES
REPEATABLE
Consistent Hashing
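The three properties above (multiple replicas, resilience to cluster changes, repeatability) can be illustrated with a small hash ring. This is a generic consistent-hashing sketch, not the Open Connect implementation: the virtual-node count, the MD5-based hash, and the class name are all assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class HashRing {
    private static final int VIRTUAL_NODES = 100; // assumed; more points smooth the spread
    private final TreeMap<Long, String> ring = new TreeMap<>();

    /** Deterministic 64-bit hash taken from the first 8 bytes of an MD5 digest. */
    static long hash(String key) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    void addServer(String server) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.put(hash(server + "#" + i), server);
        }
    }

    void removeServer(String server) {
        ring.values().removeIf(server::equals);
    }

    /** Walks the ring from the file's hash and returns the first distinct servers found. */
    List<String> serversFor(String file, int replicas) {
        List<String> chosen = new ArrayList<>();
        if (ring.isEmpty() || replicas <= 0) {
            return chosen;
        }
        long start = hash(file);
        for (String s : ring.tailMap(start, true).values()) {
            if (collect(chosen, s, replicas)) return chosen;
        }
        for (String s : ring.headMap(start, false).values()) { // wrap around
            if (collect(chosen, s, replicas)) return chosen;
        }
        return chosen; // fewer servers than requested replicas
    }

    private static boolean collect(List<String> chosen, String server, int replicas) {
        if (!chosen.contains(server)) {
            chosen.add(server);
        }
        return chosen.size() >= replicas;
    }
}
```

Because placement depends only on the hashes, any node can recompute it without coordination, and removing a server moves only the files that server was holding.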
ISP2 DATA
CENTER
WHAT TO
FILL?
CONTROL
PLANE
IXP DATA
CENTER
WHERE TO
FILL FROM?
ISP1 DATA
CENTER
S3
FILL OVER
HTTP
Load Distribution
CONTENT RANK
BYTES
STREAMED
LOTS OF
THROUGHPUT
LOTS OF
STORAGE
CONTENT WITH CONFLICTING CONSTRAINTS
SSD BASED
SPINNING DISK
BASED
WITHIN CLUSTERS ON EACH SERVER
MEMORY
CONTENT RANK
BYTES
STREAMED
SSD SPINNING DISK
TAKEAWAY Tier Infrastructure
ACROSS SERVERS
WITHIN CLUSTERS
BALANCE
BALANCE
ACROSS EQUIDISTANT
CLUSTERS
HOW DO WE BALANCE LOAD?
OPEN CONNECT
NETFLIX
DEVICE
CDN
CONTROL
PLANE
DEVICE
CONTROL
PLANE
LOAD
BALANCER
STREAM
USING CONTENT DISTRIBUTION
HOW DO WE BALANCE LOAD?
FLIP A COIN
AND WHEN WE HAVE EQUALLY ATTRACTIVE
LOCATIONS TO SERVE FROM –
INCIDENT LOAD
SYSTEM
METRICS
MAX
INSANE SANE
HOW DO WE LOAD SERVERS OPTIMALLY?
… AMIDST EVER CHANGING INTERNET WEATHER
TRAFFIC
t
… AND DAILY TRAFFIC EBBS AND FLOWS
+ SERVE
STREAMS
FEEDBACK
-
TRAFFIC EFFECT ON
SYSTEM METRICS
CONTROL
WE INTRODUCE A FEEDBACK LOOP
TAKEAWAY PID CONTROLLER
                  Process Variable    Set Point             Control Variable
DC MOTOR          Current RPM         Desired RPM           Input Voltage
LOADING SERVERS   System Metrics      System Metrics Max    Controlled Traffic
ISP2 DATA
CENTER
CONTROL
TO 80%
CONTROL
PLANE
IXP DATA
CENTER
NO
CONTROL
ISP1 DATA
CENTER
0.0 < CONTROL VAR < 1.0
TRAFFIC
t
NEXT HOP
TRAFFIC SHIFTS TO NEXT HOP LOCATION
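The feedback loop can be sketched as a textbook PID controller whose output is the fraction of offered traffic a location accepts, clamped to the 0.0 to 1.0 range shown on the slide. The gains, the starting state, and the class name are illustrative assumptions; the talk does not give tuning values.

```java
class PidController {
    private final double kp;       // proportional gain (assumed value supplied by caller)
    private final double ki;       // integral gain
    private final double kd;       // derivative gain
    private final double setPoint; // "System Metrics Max", e.g. 0.80 utilization

    private double integralTerm = 1.0; // assumed start: accept all offered traffic
    private double previousError = 0.0;
    private boolean firstSample = true;

    PidController(double kp, double ki, double kd, double setPoint) {
        this.kp = kp;
        this.ki = ki;
        this.kd = kd;
        this.setPoint = setPoint;
    }

    /**
     * Feeds one measurement of the process variable (system metrics) and
     * returns the control variable: the fraction of traffic to accept.
     */
    double update(double measured, double dtSeconds) {
        double error = setPoint - measured;
        // Clamp the integral term so it cannot wind up beyond the output range.
        integralTerm = clamp(integralTerm + ki * error * dtSeconds);
        double derivative = firstSample ? 0.0 : (error - previousError) / dtSeconds;
        previousError = error;
        firstSample = false;
        return clamp(kp * error + integralTerm + kd * derivative);
    }

    private static double clamp(double v) {
        return Math.max(0.0, Math.min(1.0, v));
    }
}
```

If a location is offered 120% of what it can serve at an 80% set point, the output settles near 0.8 / 1.2, about 0.67, and the remaining share is what shifts to the next hop location.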
Steering
STREAM
NETFLIX
DEVICE
CDN
CONTROL
PLANE
PLAYBACK
SERVICES
STEERING
Got URLs for
f1, f2, …, fn?
Yes, here’s
the URLs
PROXIMITY
HEALTH
CONTENT
CASS
KAFKA
OPEN CONNECT
Architecture
Evolution
5 CHALLENGES
API
STEERING
SESSION
MANIFEST
DRM
LICENSE
How did we evolve from here...
API
STEERING
SESSION
MANIFEST
DRM
LICENSE
CLIENT SCRIPTS
SERVICE LAYER
RULES
INSIGHTS
...to here.
5 SOLUTIONS
CACHE
DEVICE
CUSTOMER
TITLE
NETWORK
Broadband - wired or wifi
Cellular - Edge, 3G, LTE, ...
CONNECTIONS
COUNTRY
CHALLENGE High dimensionality
How can we quickly alter the playback
experience in a targeted manner?
ALL
STREAMS
FOR
CONTENT
ENGINE
RULES
BEST
STREAMS
FOR
SESSION
USE CASE Stream Filtering
EXAMPLE RULES
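The example rules shown on this slide did not survive extraction. As a stand-in, here is a hypothetical sketch of stream filtering with rules that can be swapped at runtime: all streams for the content go in, and the streams no rule excludes for this session come out. Every name and the cellular bitrate cap are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.BiPredicate;

/** A few of the dimensions from the slides; the real set is larger. */
class PlaybackSession {
    final String device;
    final String country;
    final String network;

    PlaybackSession(String device, String country, String network) {
        this.device = device;
        this.country = country;
        this.network = network;
    }
}

class VideoStream {
    final String id;
    final int bitrateKbps;

    VideoStream(String id, int bitrateKbps) {
        this.id = id;
        this.bitrateKbps = bitrateKbps;
    }
}

/** A rule names a condition under which a stream is dropped for a session. */
class Rule {
    final String name;
    final BiPredicate<PlaybackSession, VideoStream> excludes;

    Rule(String name, BiPredicate<PlaybackSession, VideoStream> excludes) {
        this.name = name;
        this.excludes = excludes;
    }
}

class StreamFilter {
    // Swapped wholesale when new rules are published, so readers never see a partial update.
    private volatile List<Rule> rules = Collections.emptyList();

    void publish(List<Rule> updated) {
        rules = Collections.unmodifiableList(new ArrayList<>(updated));
    }

    List<VideoStream> bestStreams(PlaybackSession session, List<VideoStream> allStreams) {
        List<Rule> active = rules; // one consistent snapshot per request
        List<VideoStream> kept = new ArrayList<>();
        for (VideoStream stream : allStreams) {
            boolean excluded = false;
            for (Rule rule : active) {
                if (rule.excludes.test(session, stream)) {
                    excluded = true;
                    break;
                }
            }
            if (!excluded) {
                kept.add(stream);
            }
        }
        return kept;
    }
}
```

Publishing a rule such as "on cellular, drop streams above 1750 kbps" then changes the result for cellular sessions only, without redeploying the service.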
ENGINE
CONFIGURATION
MANAGEMENT UI
UPDATING RULES
TOPIC
PUBLISH
RULES
SUBSCRIBE
Dynamic Business Rules
API
STEERING
SESSION
MANIFEST
DRM
LICENSE
RULES
TAKEAWAY
CHALLENGE Pinpoint what is broken
Haystacks, CC BY-SA, John Pavelka 2008, Flickr
3:00 AM : Pager goes off
METRICS AND ALERTING
OK...error code 105 is elevated. But
why?
Indexed Logging
Detailed Domain Insights
API
STEERING
SESSION
MANIFEST
DRM
LICENSE
RULES
INSIGHTS
TAKEAWAY
CHALLENGE Large amount of state
How can we enable faster UIs and
low-end devices?
We introduced a server-side caching tier
MANIFESTS
CUSTOMER A
CUSTOMER A
CUSTOMER B
Watch out for resiliency issues!!
Ping Pong project, CC BY-SA, Michael Knowles 2008, Flickr
API
STEERING
SESSION
MANIFEST
DRM
LICENSE
RULES
INSIGHTS
TAKEAWAY Reduce client state
CACHE
CHALLENGE Managing device protocols
Square peg, round hole, CC BY-SA, Simon Law 2006, Flickr
Can we allow devices to define their
own protocols?
DYNAMIC SCRIPTING PLATFORM
SESSION
LICENSE
MANIFEST
XBOX
iPHONE
HTML5
PLAYER
iphone.groovy
JAVA SERVICE
LAYER
xbox.groovy
html5.groovy
API
STEERING
SESSION
MANIFEST
DRM
LICENSE
RULES
INSIGHTS
Client-driven protocols
API
CLIENT
SCRIPTS
SERVICE
LAYER
TAKEAWAY
CACHE
CHALLENGE Enabling high-velocity innovation
CC BY-SA, Nathan E Photography 2008, Flickr
How can we expose new data with the
least amount of churn?
API MANIFEST
Stream
● Bitrate
● Framerate
● Dynamic Data
Stream’
● Bitrate
● Dynamic Data
This works from API:
● stream.getBitrate()
● stream.getDynamicData().get("FRAME_RATE")
Works both ways!
This works from CLIENT SCRIPT!
● stream.getDynamicData().get("BIT_RATE")
● stream.getDynamicData().get("FRAME_RATE")
CLIENT SCRIPT
Stream’’
● Dynamic Data
Works
both
ways!
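A small sketch of the pass-through idea, assuming the manifest service mirrors every field into a string-keyed map that intermediate layers forward untouched. The class names and the toDynamicData helper are hypothetical; the getBitrate() and getDynamicData().get(...) calls and the BIT_RATE and FRAME_RATE keys are the ones shown on the slide.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** "Stream" in the manifest service: knows every field, typed. */
class ManifestStream {
    final int bitrate;
    final double frameRate;

    ManifestStream(int bitrate, double frameRate) {
        this.bitrate = bitrate;
        this.frameRate = frameRate;
    }

    /** Mirrors all fields into a generic map so newer fields survive older layers. */
    Map<String, Object> toDynamicData() {
        Map<String, Object> data = new LinkedHashMap<>();
        data.put("BIT_RATE", bitrate);
        data.put("FRAME_RATE", frameRate);
        return data;
    }
}

/** "Stream'" in the API layer: built before frame rate existed as a typed field. */
class ApiStream {
    private final int bitrate;
    private final Map<String, Object> dynamicData;

    private ApiStream(int bitrate, Map<String, Object> dynamicData) {
        this.bitrate = bitrate;
        this.dynamicData = Collections.unmodifiableMap(dynamicData);
    }

    static ApiStream from(ManifestStream source) {
        // The API layer copies what it understands and forwards the rest as-is.
        return new ApiStream(source.bitrate, source.toDynamicData());
    }

    int getBitrate() {
        return bitrate;
    }

    Map<String, Object> getDynamicData() {
        return dynamicData;
    }
}
```

A client script holding an ApiStream can read FRAME_RATE from the map even though the API layer has no typed accessor for it, so exposing a new field would not require changing the layers in between.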
API
CLIENT
SCRIPTS
SERVICE
LAYER
STEERING
SESSION
MANIFEST
DRM
LICENSE
RULES
INSIGHTS
TAKEAWAY Data pass-thru
CACHE
TAKEAWAYS
● BGP based proximity
● Tiered Infrastructure
● PID Controller
● EWMA for historical data
● Consistent Hashing
● Dynamic business rules
● Detailed domain insights
● Reduce client state
● Client-driven protocols
● Data pass-thru
Questions?
Haley Tucker
@hwilson1204
Mohit Vora
@mohitvora
Image Attributions
● Background image from https://www.flickr.com/photos/centralasian/4099515384, Image was cropped and red lines and dots were drawn on top, https://creativecommons.org/licenses/by/2.0/.
● Image from https://www.flickr.com/photos/28705377@N04/4142872268, No modifications made, https://creativecommons.org/licenses/by/2.0/.
● Image of cassette is from https://www.flickr.com/photos/comedynose/6939206771, Image was cropped, https://creativecommons.org/licenses/by/2.0/.
● Image of speaker is from https://www.flickr.com/photos/av_hire_london/5578975575, No changes made, https://creativecommons.org/licenses/by/2.0/.
● Image of television is from https://www.flickr.com/photos/jvcamerica/3660897684/, No changes made, https://creativecommons.org/licenses/by/2.0/.
● Image of text is from https://www.flickr.com/photos/dno1967b/5754743006, No changes made, https://creativecommons.org/licenses/by/2.0/.
● Background image from https://www.flickr.com/photos/mcgraths/866572532, Image was cropped, https://creativecommons.org/licenses/by/2.0/.
● Image from https://www.flickr.com/photos/thatguyfromcchs08/2300190277, Image is dimmed, https://creativecommons.org/licenses/by/2.0/.
● Image from https://www.flickr.com/photos/mknowles/3134373590, Image was cropped, https://creativecommons.org/licenses/by-sa/2.0/.

How Netflix Directs 1/3rd of Internet Traffic