Every time a big company screws up, there are two highly informed sets of people who are guaranteed to be lurking, but rarely post, in a thread like this:
1) those directly involved with the incident, or employees of the same company. They have too much to lose by circumventing the PR machine.
2) people at similar companies who operate similar systems with similar scale and risks. Those people know how hard this is and aren’t likely to publicly flog someone doing their same job based on uninformed speculation. They know their own systems are Byzantine and don’t look like what random onlookers think they would look like.
So that leaves the rest, who offer insights based on how stuff works at a small scale, or better yet, pronouncements rooted in “first principles.”
I've noticed this amongst the newer "careerist" sort of software developer who stumbled into the field for money, as opposed to the obsessive computer geek of yesteryear, who practiced it as a hobby. This character archetype transplanted, say, less than five years ago from another, often non-technical discipline, and was taught or learned from overly simplistic materials that decry systems programming, networking, and computer science concepts as unnecessary, impractical skills, reducing everything to writing JavaScript glue code between random NPM packages found on Google.
Especially in a time where the gates have come crashing down to pronouncements of, "now anybody can learn to code by just using LLMs," there is a shocking tendency to overly simplify and then pontificate upon what are actually bewilderingly complicated systems wrapped up in interfaces, packages, and layers of abstraction that hide away that underlying complexity.
It reminds me of those quantum woo people, or movies like What the Bleep Do We Know!? where a bunch of quacks with no actual background in quantum physics or science reason forth from drastically oversimplified, mathematics-free models of those theories and into utterly absurd conclusions.
Because it is about people speculating on events that seem connected to their own experience, but in actuality aren’t, because they don’t understand the breadth of the distribution of the abstraction they are discussing.
This happens when your terms are underspecified: someone says “Netflix’s servers are struggling under load,” and while people in similar efforts know that's basically equivalent to “something is wrong,” and that the whole conversation is esoteric to most people outside a few specialized teams, everyone else jumps to conclusions and starts having conversations based on their own experience with what is (to them) related (and usually fashionable, because that is how most smaller players figure out how to do things).
Even before LLMs were trendy, at the time of COVID-19, a lot of people surprisingly became "experts" in virology and genetics on social networks.
Completely agreed. There are also former employees who have very educated opinions about what is likely going on, but between NDAs and whatnot there is only so much they are willing to say. It is frustrating for those in the know, but there are lines they can't or won't cross.
Whenever an HN thread covers subjects where I have direct professional experience I have to bite my tongue while people who have no clue can be as assertive and confidently incorrect as their ego allows them to be.
Some people can just let others be wrong and just stay silent, but some people can't help themselves. So if you say something really wrong, like "this was caused by Netflix moving to Azure, they should have stayed on AWS!", someone will come along to correct you. If you're looking for the right answer, post the wrong one alongside some provoking statement ("Windows is better than Linux because this works there"), and you'll get the right answer faster than if you'd asked your question directly.
> Some people can just let others be wrong and just stay silent, but some people can't help themselves.
As one of those who can't help themselves: the way you phrase it feels a bit too cynical. I've always interpreted it as people wanting to help, but not wanting to offer something that's wrong. Which is basically how falsifiable science works. It's so much easier to refute the assertion that birds generate lift with tiny backpacks with turboprops attached than it is to explain the finer details of avian flight mechanics. I couldn't describe above a superficial level how flapping works, but I can confidently refute the idea of a turboprop backpack. (Everyone knows birds gave up the turboprop design during the great kerosene shortage of 1128.)
It depends on the medium and the cost of looking like an idiot. On the Internet where some tosser is going to call you names anyway? Saying dumb shit to nerdsnipe someone else to do hours of research and write an essay on it for you, at the expense of them calling you an idiot, is cheap, and easier than doing all that work yourself. Meanwhile, at work, I'm the one getting nerd sniped into doing a bunch of extra work.
Right? A common complaint by outsiders is that Netflix uses microservices. I'd love to hear exactly how a monolith application is guaranteed to perform better, with details. What is the magic difference that would have ensured the live stream would have been successful?
I am one of the ones who complain about their microservices architecture quite a lot.
This comes from first-hand experience talking to several of their directors when consulted on how to make certain systems of theirs better.
It's not just a matter of guarantees, it's a matter of complexity.
Like right now Google search is dying and there's nothing that they can do to fix it because they have given up control.
The same thing happened with Netflix where they wanted to push too hard to be a tech company and have their tech blogs filled with interesting things.
On the back end they went too deep on the microservices complexity. And on the front end for a long time they suffered with their whole RxJS problem.
So it's not an objective matter of what's better. It's more a cultural problem at Netflix. Plus the fact that they want to be associated with "Faang" and yet their product is not really technology based.
Google search is dying because of business reasons, not technical ones. The ads branch is actively cannibalizing search quality to make people perform more searches and view more ads.
You can explain these problems with simple business metrics that technologists like to ignore. Right before the recent Twitter acquisition, the various bits of info that came to the limelight included the "minor detail" that they had more than doubled their headcount and associated expenses, but had not doubled either their revenue or profits. Technology complexity went up, the business went backwards. Thousands of programmers don't always translate to more value!
Netflix regularly puts out blog articles proudly proclaiming that they process exabytes of logs per microsecond or whatever it is that their microservices Rube Goldberg machine spits out these days, patting themselves on the back for a heroic job well done.
Meanwhile, I've been able to go on the same rant year after year that they're still unable to publish more than five subtitle languages per region. These are 40 KB files! They had an employee argue with me about this in another forum, saying that the distribution of these files is "harder than I thought".
It's not hard!
They're solving the wrong problems. The problems they're solving are fun for engineers, but pointless for the business or their customers.
From a customer perspective Netflix is either treading water or noticeably getting worse. Their catalog is smaller than it was. They've lost licensing deals for movies and series that I want to watch. The series they're producing themselves are not things I want to watch any more. They removed content ratings, so I can't even pick something that is good without using my phone to look up each title manually!
Microservices solve none of these issues (or make them worse), yet this is all we hear about when Netflix comes up in technology discussions. I've only ever read one article that is actually relevant to their core business of streaming video, which was a blog about using kTLS in BSD to stream directly from the SSD to the NIC and bypassing the CPU. Even that is questionable! They do this to enable HTTPS... which they don't need! They could have just used a cryptographic signature on their static content, which the clients can verify with the same level of assurance as HTTPS. Many other large content distribution networks do this.
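To sketch what that alternative looks like (a hypothetical mechanism, not anything Netflix has documented): the client fetches a small trusted manifest of chunk hashes, for example over a single HTTPS request or with a pinned publisher signature, and can then verify chunks served over plain HTTP.

```python
import hashlib

# Hypothetical manifest: chunk index -> SHA-256 hash. The manifest itself
# is the only thing that needs a trusted channel (HTTPS, or a pinned
# publisher signature); the bulk video bytes do not.
manifest = {}

def publish_chunk(index: int, data: bytes) -> None:
    """Server side: record the hash of each static content chunk."""
    manifest[index] = hashlib.sha256(data).hexdigest()

def verify_chunk(index: int, data: bytes) -> bool:
    """Client side: accept a chunk fetched over plain HTTP only if its
    hash matches the trusted manifest, so in-transit tampering fails."""
    return hashlib.sha256(data).hexdigest() == manifest.get(index)

publish_chunk(0, b"video chunk bytes")
assert verify_chunk(0, b"video chunk bytes")         # intact chunk passes
assert not verify_chunk(0, b"tampered chunk bytes")  # modified chunk fails
```

The point is that integrity comes from the hash comparison, not the transport; this gives tamper-evidence but not confidentiality, which for broadly-distributed static content may be an acceptable trade.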
It's 100% certain that someone could pretend to be Elon, fire 200-500 staff from the various Netflix microservices teams and then hire just one junior tech to figure out how to distribute subtitles... and that would materially improve customer retention while cutting costs with no downside whatsoever.
> Right before the recent Twitter acquisition, the various bits of info that came to the limelight included the "minor detail" that they had more than doubled their headcount and associated expenses, but had not doubled either their revenue or profits.
Every tech company massively inflated their headcount during the leadup to the Twitter acquisition because money was free.
I interviewed at Meta in 2021 and asked an EM what he would do if given a magic wand to fix one problem at the company. His response: "I would instantly hire 10,000 more engineers."
Elon famously did the opposite and now revenue is down 80%.
From my experience, this answer usually belies someone who doesn’t fully understand the system and problems of the business. The easy answer when overwhelmed is “we need more people.” To use a manufacturing analogy, you can cover up a lot of quality issues with increased throughput, but it makes for an awfully inefficient system.
Borrowing costs went to nearly zero. That's not the same thing. You have to repay the money, you just don't have to repay it with interest.
I would have assumed people generally know this, but everybody (and I do mean everybody) talks like they don't know this. I would like to assume that "money is free" is just a shorthand, buuuut... again... these arguments! People like that EM talk like it was literally free money raining from the sky that could be spent (gone!) without it ever having to be repaid.
If you watched any of the long-form interviews Musk gave immediately after the acquisition, he made the point that if he hadn't bailed out Twitter, it had maybe 3 months of runway left before imploding.
Doubling headcount without a clear vision of how that would double revenues is madness. It is doubly so in orgs like Twitter or Netflix where their IT was already over-complicated.
It's too difficult for me to clearly and succinctly explain all the myriad ways in which a sudden inrush of noobs -- outnumbering the old guard -- can royally screw up something that is already at the edge of human capability due to complexity. There is just no way in which it would help matters. I could list the fundamental problems with that notion for hours.
Companies weren’t issuing debt to pay for headcount. The reason market interest rates matter is that when interest rates are low, your company stock doesn’t have to have high returns to get investment. When these conditions exist, companies feel safer hiring people to invest in growth instead of saving to provide high shareholder returns.
I highly recommend everyone take a university-level financial instruments course. The math isn’t super hard, and it does a very good job of explaining how rational investors behave.
So you’re saying the investors are happy to see their money set on fire?
Surely they expect at a minimum that their capital investment would make them dividends (increased revenue), and also that the money wasn’t simply set on fire with nothing to show for it and no way to repay it.
If I’m wrong then Twitter - and similar companies - are little better than Ponzi schemes, with investors relying on the money of the greater fool to recover their money.
> So you’re saying the investors are happy to see their money set on fire?
Ah, HN, where you try to explain how things work, and you get ignorant sarcasm in return.
> Surely they expect at a minimum that their capital investment would make them dividends (increased revenue), and also that the money wasn’t simply set on fire with nothing to show for it and no way to repay it.
Yes, of course. But when safe investments (e.g., Treasuries) are paying out close to zero, investors are going to tolerate lower returns than they do when Treasuries are paying out 3% or more.
It's basic arithmetic: you take the guaranteed rate, add a risk premium, and that's what investors expect from riskier investments. This is well-covered in the class I recommended.
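The arithmetic in question, with purely illustrative numbers (not real market data):

```python
def required_return(risk_free_rate: float, risk_premium: float) -> float:
    """Minimum expected return an investor demands for a risky asset:
    the guaranteed rate plus a premium for taking on the risk."""
    return risk_free_rate + risk_premium

# Near-zero-rate environment: Treasuries at ~0.1%, 5% equity risk premium.
assert abs(required_return(0.001, 0.05) - 0.051) < 1e-9
# Higher-rate environment: 3% Treasuries push the hurdle up to 8%.
assert abs(required_return(0.03, 0.05) - 0.08) < 1e-9
```

Same risk premium, very different hurdle: when the risk-free leg collapses toward zero, investments that would otherwise look marginal suddenly clear the bar.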
Also, not every investor thinks in terms of consistent return. A pensioner may have a need for a guaranteed 3% annual return to keep pace with inflation. A VC, on the other hand, is often content to have zero returns for years followed by a 100x payout through an IPO.
> A VC, on the other hand, is often content to have zero returns for years followed by a 100x payout through an IPO
I know how all this works, but 100x payout is for the small initial investments, not after 10 years of operating at multi-billion-dollar scales.
Small amounts of money are set on fire all of the time, chasing this kind of high-risk return.
Nonetheless, there's an expectation of a return, even if only in aggregate across many small startups.
What I was observing (from the outside, at a distance) was that Twitter was still being run like a startup despite being in an effectively monopoly position already and a "mature" company. Similarly, Amazon could set money on fire while they were the growing underdog. If they doubled their headcount today without doubling either revenue or profits, the idiots responsible for that would be summarily fired.
I get that Silicon Valley and their startup culture does a few things in an unusual way, but that doesn't make US dollars not be US dollars and magically turn into monopoly money that rains from the sky just because interest rates are low.
Yeah, try dealing with many frontends with mixed HTTP and HTTPS; it's a nightmare and won't always work. Additionally, you want security on content delivery for revenue protection reasons. The way you've massively oversimplified the BSD work suggests that you perhaps didn't understand what they did, or why hardware offload is a good thing.
Subtitles are also complicated because you have to deal with different media player frameworks on the +40 different players you deal with. Getting those players, which you may not own, to recognise multiple sub tracks can be a PITA.
Things look simple to a junior developer, but those experienced in building streaming platforms at scale know there are dragons when you get into the implementation. Sometimes developers and architects do over-complicate things, but smart leaders avoid writing code, so it's an assumption to say things are being made over-complicated.
I read and understood their entire technical whitepaper. I get the what, I'm just saying that the why might not make as much sense as you might assume.
> +40 different players you deal with
They own the clients. They wrote the apps themselves. This is Netflix code reading data from Netflix servers. Even if there are third-party clients (wat!?), that doesn't explain why none of Netflix's home-grown clients support more than 5 subtitle languages.
> Getting those players, which you may not own, to recognise multiple sub tracks can be a PITA.
This is a core part of the service, which everyone else has figured out. Apple TV for example has dozens of subtitle languages.[1]
With all due respect: Read what you just wrote. You're saying that an organisation that has the engineering prowess to stream at 200 Gbps per edge box and also handles terabytes of diagnostic log ingestion per hour can't somehow engineer the distribution of 40 KB text files!?
I can't even begin to outline the myriad ways in which these excuses are patent nonsense.
These are children playing with the fun toys, totally ignoring like... 1/3rd of the viewing experience. As far as the users are concerned, there's nothing else of consequence other than the video, audio, and text that they see on the screen.
"Nah, don't worry about the last one, that only affects non-English speakers or the deaf, we only care about DEI for internal hires, not customers."
[1] Just to clarify: I'm asking for there to be an option to select one language at a time from all available languages, not showing multiple languages at once, which is a tiny bit harder. But... apparently not that hard, because I have two different free, open-source video players on my PC that can do this so I can have my spouse get "full" subtitles in a foreign language while I see the "auto" English subtitles pop up in a different colour when appropriate. With Netflix I have to keep toggling between her language and my language every time some foreign non-English thing is said. Netflix is worth $362B, apparently, but hasn't figured out something put together by poor Eastern European hobbyists in their spare time.
See, you're confused because you think that the media player is owned by Netflix.
The browser gives you a certain level of control on computers, although you have to deal with the oddities of Safari, but when you go to smart TVs it's the wild west. Netflix does provide their tested framework to TV vendors but it's still not easy, because media playback often requires hardware acceleration, but the rendering framework isn't standard.
Developing for set-top boxes, multiple generations of phones, and smart TVs comes with all sorts of oddities. You think it's easy because you haven't done it.
> they want to be associated with "Faang" and yet their product is not really technology based.
You lost me. Netflix built a massive CDN, a recommendation engine, did dynamic transcoding of video, and a bunch of other things, at scale, quite some years before everyone else. They may have enshittified in the last five years, but I don't see any reason why they don't have a genuinely legitimate claim to being a founding member of the FAANG club.
I have a much harder time believing that companies with AI in their name or domain are doing any kind of AI, by contrast.
Speaking as a consumer, Netflix’s solution is objectively better than its competitors'. It handles network blips better, it’s more responsive to use, and it has far fewer random bugs you need to work around.
You can argue whether or not that edge translates into more revenue, but the edge is objectively there.
Agree. Hulu, HBO/Max, and Disney Plus each do most of these:
- frequently decide that episodes I've watched are either completely unwatched (with random fully watched eps of the show mixed in).
- assume, seemingly every time I leave at the start of the end-credits, that I surely must have intended to come back and watch them.
- rebuild the entire interface (progressively, slowly) when I've left the tab unfocussed for too long. Instead of letting me continue where I was, they show it for less than a second, then rebuild the world.
- keep resetting the closed-caption setting to "none", regardless of choosing "always" or "on instant replay"; worse, they sometimes still have the correct setting in the interface, but have disabled captions anyway.
In all the time since they started streaming, Netflix has only once forgotten playback position or episode completion. They politely suggest when to reload the page (via a tiny footer banner), but even that might not appear for months. They usually know where the end-credits really start, and count that as completion. They don't seem to mess with captions.
Hard to speak "objectively" as a consumer who has their own regional biases and knows none of the sausage underneath.
Maybe you're in a rural area and Netflix scaled gracefully. Maybe you're deep in SF and Netflix simply outspent to give minimal disruption to a population hub. These could both be true but don't speak to what performs better overall.
I have always wondered how they deliver their content and what goes on behind the scenes. Nobody on tech Twitter, or even YouTubers, talks about Pornhub's infra for some reason. A lot of the innovation in tech has roots in people wanting to see high quality tiddies on the internet.
> It's not guaranteed, but much fewer points of failure.
Can you explain where this is relevant to buffering issues?
Also, you are very wrong regarding failure modes. The larger the service, the more failure modes it has. Moreover, in monoliths if a failure mode can take down/degrade the whole service, all other features are taken down/degraded. Is having a single failure mode that brings down the whole service what you call fewer points of failure?
I can't, since I don't know Netflix's architecture - I was responding to "I'd love to hear exactly how a monolith application is guaranteed to perform better, with details."
I doubt a "microservice" has anything to do with delivering the video frames. There are specific kinds of infrastructure tech that are specifically designed to serve live video to large numbers of clients. If they are in fact using a "microservice" to deliver video frames, then I'd ask them to have their heads examined. Microservices are typically used to do mundane short-lived tasks, not deliver video.
I’m genuinely curious about the reasoning behind that statement. It’s very possible that you are using a different set of assumptions or definitions than I am.
I say that because, for performance reasons, you’d never want to wait on potentially several hops to stream media and because the act of streaming could very well be a good enough domain boundary.
That’s okay, you probably just haven’t worked with high performance services or micro services before.
Network requests (sometimes called hops) take a significant amount of time. You don’t want your streaming service to take a significant amount of time.
In microservices land, you generally try making services based on some “domain” (metaphorical, not like a literal domain name) which defines the responsibility of any given service. Defining these domains is more art than science and depends on the business needs and each team.
Video streaming might be one of those domains for Netflix.
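The latency concern above can be made concrete with a toy model (the numbers are invented, not measurements of any real system): every serial service-to-service hop adds its own delay before the first byte can reach the viewer.

```python
def end_to_end_latency_ms(hop_latencies_ms: list[float],
                          processing_ms: float = 0.0) -> float:
    """Serial hops add up: each service-to-service call contributes its
    own network round-trip plus per-service processing time."""
    return sum(hop_latencies_ms) + processing_ms * len(hop_latencies_ms)

# One direct hop to an edge cache vs. a chain of four internal services.
direct = end_to_end_latency_ms([10.0])
chained = end_to_end_latency_ms([10.0, 5.0, 5.0, 5.0], processing_ms=2.0)
assert direct < chained
```

This is why media bytes usually flow from an edge server in as few hops as possible, while microservices handle the control-plane work (auth, entitlements, playback session setup) that happens once, off the hot path.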
The only time I worked on a project that had a live television launch, it absolutely tipped over within like 2 minutes, and people on HN and Reddit were making fun of it. And I know how hard everyone worked, and how competent they were, so I sympathize with the people in these cases. While the internet was teeing off with easy jokes, engineers were swarming on a problem that was just not resolving, PMs were pacing up and down the hallway, people were getting yelled at by leadership, etc. It's like taking all the stress and complexity of a product launch and multiplying it by 100. And the thing I'm talking about was just a website, not even a live video stream.
Some breaks are just too difficult to predict. For example, I work in ecommerce, and we had a page break because the content team pushed too many items into an array, which caused a back-end service to throw errors. Because we were the middle service, taking from the CMS and making the request to the back-end, I'm not sure how we could have seen that issue coming in advance (and no one knew there was a limit).
Absolutely. I think a great filter for developers is determining how well they understand this. Over-simplification of problems and certainty about one’s ability to build reliable services at scale is a massive red flag to me.
I have to say some of the hardest challenges I’ve encountered were in e-commerce, too.
It’s a lot harder and more interesting than I think many people realize. I learned so much working on those projects.
In one case, the system relied on SQLite and god damn did things go sideways as the company grew its customer base. That was the fastest database migration project I’ve ever been on, haha.
I often think it could have worked today. SQLite has made huge leaps in the areas we were struggling. I’m not sure it would have been a forever solution (the company is massive now), but it would have bought us some much-needed time. It’s funny how that stuff changes. A lot of my takeaways about SQLite 10 years ago don’t apply quite the same anymore. I use it for things now that I never would have back then.
All requests should expect errors. How a developer handles them... well...
And for limit checking, how often do you write array limit handlers? And what if the BE contract doesn't specify one? Additionally, it will need a regression unit test, because who knows when the next developer will remove that limit check.
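A sketch of the kind of guard being described, with a hypothetical MAX_ITEMS (the real backend's limit was undocumented, which was the whole problem):

```python
# Hypothetical cap, discovered the hard way; ideally this comes from the
# backend's published contract rather than a magic number.
MAX_ITEMS = 100

def forward_to_backend(items: list) -> list:
    """Middle-tier guard: validate the CMS payload size before calling
    the backend, so an oversized array fails loudly here with a clear
    message instead of as an opaque downstream error."""
    if len(items) > MAX_ITEMS:
        raise ValueError(
            f"payload has {len(items)} items, backend accepts at most {MAX_ITEMS}"
        )
    return items  # stand-in for the actual backend call

assert forward_to_backend(list(range(100))) is not None  # at the limit: ok
try:
    forward_to_backend(list(range(101)))                 # over the limit
    raise AssertionError("expected ValueError")
except ValueError:
    pass
```

And per the comment above, the guard itself deserves a regression test, so the next developer who deletes it finds out in CI rather than in production.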
Those performative people are worse than useless. They take up critical bandwidth and add no real value.
An effective operational culture has methods for removing those people from the conversations that matter. Unfortunately that earns you a reputation for being “cutthroat” or “lacking empathy.”
Both of those are real things, but it’s the C players who claim they are being unfairly treated, when in fact their limelight-seeking behavior is the problem.
If all that sounds harsh, like the kitchen on The Bear, well…that’s kinda how it is sometimes. Not everyone thrives in that environment, and arguably the ones who do are a little “off.”
While that may be the case, the things like this I've experienced have been more along the lines of incompetent management.
In one case I was doing an upgrade on an IPTV distribution network for a cable provider (15+ years ago at this point). This particular segment of subscribers totalled more than 100k accounts. I did validation of the hardware and software rev installed on the routers in question prior to my trip to the data center (2+ hour drive). I informed management that the version currently running on the router wasn't compatible with the hardware rev of the card I was upgrading to. I was told that it would in fact work, that we had that same combination of hw/sw running elsewhere. I couldn't find it when I went to go look at other sites. I mentioned it in email prior to leaving; I was told to go anyway.
Long story short, the card didn't work and had to be backed out. The HA failover didn't work on the downgrade and took down all of those subscribers, as the total outage caused a cascading issue with some other gear in the facility. All in all it was during an off-peak time of day, but it was a waste of time and customer satisfaction.
I’m just pointing out that there are Netflix engineers reading all these words.
For every thread like this, there are likely people who are readers but cannot be writers, even though they know a lot. That means the active posters exclude that group, by definition.
These threads often have interesting and insightful comments, so that’s cool.
At the scale that Netflix just dealt with? Yeah I honestly think this is a case where less than 5000 people in the world are really qualified to comment.
Not clear what scale they were attempting, but yes delivering a live stream to 10m+ users on the public internet with a reasonable end to end latency (under 30 seconds glass to viewer) is not a trivial problem, and it’s not something Netflix do a lot.
It’s a very different problem to distributing video on demand which is Netflix’s core business.
3) the people supplying 1) and 2) with tools (hard- or software)
We (yep) don't know the exact details, but we do get sent snapshots of full configs and deployments to debug things... we might not see exact load patterns, but it's enough to know. And of course we can't tell, due to NDAs.
If the NFL decides to keep Netflix for that, that is. The bandwidth for that fight was rookie numbers, and after that fiasco, why would the NFL not break their contract and choose someone with a proven track record doing bigger live events, like the World Cup?
Because Netflix pays them either way, I would imagine. Breaking a contract on a sure thing to the tune of tens (hundreds?) of millions of dollars for a maybe is a large business risk.
Reputational damage is going to fall far more on Netflix than the NFL if they totally flub it.
That and this fight is going to likely be an order of magnitude more viewers than the Christmas NFL games if the media estimates on viewership were remotely accurate. You’re talking Super Bowl type numbers vs a regular season NFL game. The problems start happening at the margin of capacity most of the time.
But "reputational damage" doesn't affect profits. Nobody is canceling Netflix because they had issues watching the fight, just like nobody will cancel if the NFL experience sucks on Netflix. They will bitch and moan on Twitter, but it's essentially just talk.
I'm sure 2) can post. But it won't be popular, so you'll need to dig to find it.
Most people are consumers, and at the end of the day, their ability to consume a (boring) match was disrupted. If this was PPV (I don't think it was), they paid extra and didn't get the quality of product they expected. I'm not surprised they dominate the conversation.
You may have belonged to one of those groups in the past, or maybe you will someday. I certainly have. Many of the more seasoned folks on HN have.
Stuff goes wrong, random internet people jump on the opportunity to speculate and say wildly off-the-mark comments, and the engineers trying to keep the ship from sinking have to sit quietly for fear of making the PR backlash worse.
I was interviewing a dev candidate some years ago and they were totally lost trying to traverse a tree on the whiteboard. I kept helping them get unblocked, because my philosophy is that anyone can get stuck once, but if I’m supposed to decide whether to hire you, I should get the most and best data I can.
Another person was observing the interview, for training purposes, and afterwards said to me: “Do you have kids? You have so much patience!”
For an event like this, there already exists an architecture that can handle boundless scale: torrents.
If you code it to utilize high-bandwidth users' upload, the service becomes more available as more users are watching -- not less available.
It becomes less expensive with scale, more available, more stable.
To be more specific, if you encode the video in blocks, with each new block's hash being broadcast across the network, and just manage the overhead of the block order, it should be pretty easy to stream video with boundless scale using a DHT.
Could even give high-bandwidth users a credit based upon how much bandwidth they share.
With a network like what Netflix already has, the seed-boxes would guarantee stability. There would be very little delay for realtime streams; I'd imagine 5 seconds tops. This sort of architecture would handle planet-scale streams for breakfast, on top of the already existing mechanism.
But then again, I don't get paid $500k+ at a large corp to serve planet scale content, so what do I know.
The protocol for a torrent is that random parts of a file get seeded to random people requesting a file, and that the clients which act as seeds are able to store arbitrary amounts of data to then forward to other clients in the swarm. Do the properties about scaling still hold when it's a bunch of people all requesting real time data which has to be in-order? Do the distributed Rokus, Apple TVs, Fire TVs and other smart TVs all have the headroom in compute and storage to be able to simultaneously decode video and keep old video data in RAM and manage network connections with upload to other TVs in their swarm - and will uploading data to other TVs in the swarm not negatively impact their own download speeds?
Yes, the properties about scaling do hold even with near-real-time streams. [1]
The problems with using it as part of a distributed service have more to do with asymmetric connections: using all of the limited upload bandwidth causes downloads to slow. Along with firewalls.
But the biggest issue: privacy. If I'm part of the swarm, maybe that means I'm watching it?
The torrent is an example of the system I am describing, not the same system. Torrents cannot work for live streams because the entire content is not hashable yet, so already you have to rethink how it's done. I am talking about adding a p2p layer on top of the existing streaming protocol.
The current streaming model would prioritize broadcasting to high-bandwidth users first. There should be millions of those in a world-scale stream.
Even a fraction of these millions would be enough to reduce Netflix's streaming costs by an order of magnitude. But maybe Netflix isn't interested in saving billions?
With more viewers, the availability of content increases, which reduces load on the centralized servers. This is the property of the system I am talking about, so think backwards from that.
With a livestream, you want the youngest block to take priority. You would use the DHT to manage clients and to manage stale blocks for users catching up.
The youngest block would be broadcast on the p2p network and anyone who is "live" would be prioritizing access to that block.
Torrent clients as they are now handle this case, in reverse; they can prioritize blocks closer to the current timestamp to create an uninterrupted stream.
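The "youngest block first for live viewers, oldest gap first for catch-up" policy described above is simple to state as code; a toy sketch (the function name and data shapes are made up):

```python
def next_block_to_request(have, announced_seqs, live=True):
    """
    Pick which block a peer should fetch next (illustrative sketch).

    have:           set of block sequence numbers already downloaded
    announced_seqs: sequence numbers announced over the DHT so far
    live:           a "live" viewer prioritizes the youngest block; a viewer
                    catching up fills the oldest gap first, in order
    """
    missing = sorted(s for s in announced_seqs if s not in have)
    if not missing:
        return None
    return missing[-1] if live else missing[0]
```

This is the reverse of classic rarest-first piece selection: every live viewer converges on the same newest block, which is exactly what makes it widely available in the swarm.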
The system I am talking about would likely function at any scale, which is an improvement from Netflix's system, which we know will fail -- because it did.
1. Everyone only cares about the most recent "block". By the time a "user" has fully downloaded a block from Netflix's seedbox, the block is stale, so why would any other user choose to download from a peer rather than from Netflix directly?
2. If all the users would prefer to download from Netflix directly rather than from a p2p user, then you already have a somewhat centralized solution, and you gain nothing from torrents.
If Netflix is at capacity and you have to wait for a peer, then you have simply reinvented the buffering problem. In other words:
1. I exclusively download from a peer and my stream is measurably behind
2. I switch to a peer when Netflix is at capacity and then I have to wait for the peer to download from Netflix, and then for me to download from the peer. This will cause the same buffering issue that Netflix is currently being lambasted for.
This solution doesn’t solve the problem Netflix has
If Netflix were working correctly and could handle the load, you'd absolutely be correct.
But it does seem the capacity of a hybrid system of Netflix servers plus P2P would be strictly greater than either alone? It's not an XOR.
And note that in this case of "live" streaming, it still has a few seconds of buffer, which gives a bandwidth-delay product of a few MB. That's plenty to have non-stale blocks and do torrent-style sharing.
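That bandwidth-delay product is easy to sanity-check; a back-of-the-envelope calculation with assumed numbers (the bitrate and delay are illustrative, not Netflix's actual parameters):

```python
# With an assumed 5 Mbit/s stream and a 5-second live-edge delay, the
# in-flight window of non-stale data is:
bitrate_mbps = 5          # assumed stream bitrate
delay_s = 5               # assumed live-edge delay
window_mb = bitrate_mbps * delay_s / 8   # megabits -> megabytes

print(f"{window_mb} MB of non-stale data available to share")  # 3.125 MB
```

A few megabytes per viewer is plenty to do torrent-style sharing of recent blocks without anyone falling behind the live edge.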
If switching to a peer causes increased buffering (which it will, because you still have to wait for the peer to download from Netflix) then you will still have the original problem Netflix is suffering from.
If the solution to users complaining about buffering is to build a system with more inherent buffering then you are back at square one.
I think it might be helpful to look at Netflix's current system as already being a distributed video delivery system in which they control the best seeds. Adding more seeds may help, but if Netflix is underprovisioned from the start, you will have users who cannot access the streams.
Yes, and then some idiot with an axe to grind against Logan Paul starts DDoSing people in the Netflix swarm, kicking them out of the livestream. This is always a problem because torrents, by design, are privacy-hostile. That's how the MAFIAA[1] figured out you were torrenting movies in 2004 and how they sent your ISP a takedown notice.
Hell, in the US, this setup might actually be illegal because of the VPPA[0]. The only reason why it's not illegal for the MAFIAA to catch you torrenting is because of a fun legal principle where criminals are not allowed to avail themselves of the law to protect their crimes. (i.e. you can't sue over a drug deal gone wrong)
[0] Video Privacy Protection Act, a privacy law passed which makes it illegal to ask video providers for a list of who watched what, specifically because a reporter went on a fishing expedition with video data.
[1] Music and Film Industry Association of America, a hypothetical merger of the MPAA and RIAA from a 2000s era satire article
Then, instead of people complaining about buffering issues, you'd get people complaining about how the greedy capitalists at Netflix made poor Joe Shmoe use all of his data cap, because they made him upload lots of data to other users and couldn't be bothered to do it themselves.
The way to deal with this is to constantly do live events and actually build organizational muscle, not these massive one-off events in an area the tech team has no experience in.
We should always be doing (the thing we want to do)
Some examples that always get me in trouble (or at least into big heated conversations):
1. Always be building: It does not matter if the code was not changed, or there have been no PRs, or whatever: build it. Something in your org or infra has likely changed. My argument is "I would rather have a build failure on software that is already released than on software I need to release."
2. Always be releasing: As before it does not matter if nothing changed, push out a release. Stress the system and make it go through the motions. I can't tell you how many times I have seen things fail to deploy simply because they have not attempted to do so in some long period of time.
There are more, but I don't have time to go into them. The point is: if you did it, and will ever need to do it again in the future, then you need to continuously do it.
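One way to operationalize both habits is a CI pipeline that builds and releases on a timer, not only on change. A hypothetical GitHub Actions sketch (the workflow name, schedule, and make targets are all assumptions):

```yaml
name: always-be-building
on:
  schedule:
    - cron: "0 6 * * *"    # build every day at 06:00 UTC, even with no new commits
  push:                    # plus the usual on-change builds
jobs:
  build-and-release:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make build test      # assumed build entry points
      - run: make release         # exercise the full release path every time
```

The scheduled trigger is what catches "nothing changed but the world did" failures: expired credentials, removed upstream packages, drifted infrastructure.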
Doing dry runs regularly makes sense, but whether actually shipping it makes sense seems context-dependent. It depends on how much you can minimize the side effects of shipping a release.
Consider publishing a new version of a library: you'd be bumping the version number all the time and invalidating caches, causing downstream rebuilds, for little reason. Or if clients are lazy about updating, any two clients would be unlikely to have the same version.
Or consider the case when shipping results in a software update: millions of customer client boxes wasting bandwidth downloading new releases and restarting for no reason.
Even for a web app, you are probably invalidating caches, resulting in slow page loads.
With enough work, you could probably minimize these side effects, so that releasing a new version that doesn't actually change anything is a non-event. But if you don't invalidate the caches, you're not really doing a full rebuild.
So it seems like there's a tension between doing more end-to-end testing and performance? Implementing a bunch of cache levels and then not using them seems counterproductive.
1) I want to invalidate caches; I want to know that these systems work. I want to know that my software properly handles this situation.
2) If I have lazy clients, I want to know. And I want to motivate them to update sooner, or figure out how to force-update them. I don't want to hold back updates because some people are slow. I want the norm to be that everything updates, so when there is a real reason to update, like a zero day, I can have some confidence that the updates will work and the lazy clients will not be an issue.
I am not talking about fake or dry runs that go through some portion of motions, I want every aspect of the process to be real.
Performance means nothing if your stuff is down. And any perceived performance gained by not doing proper hygiene is just tweaking the numbers to look better than they really are.
I think it often makes sense to do full releases frequently, but not continuously. For example, Chrome is on an approximately four week schedule, which makes sense for them. Other projects have faster cadences. There is a point of diminishing returns, though, and you seem to be ignoring the downsides.
It's very hard to do a representative dry run when the most likely potential points of failure are highly load-dependent.
You can try and predict everything that'll happen in production, but if you have nothing to extrapolate from, e.g. because this is your very first large live event, the chances of getting that right are almost zero.
And you can't easily import that knowledge either, because your system might have very different points of failure than the ones external experts might be used to.
They could have done a dry run. They could have spun up a million virtual machines somewhere, and tested their video delivery for 30 minutes. Even my small team spins up 10,000 EC2 instances on the regular. Netflix has the money to do much more. I'm sure there are a dozen ways they could have stress-tested this beforehand. It's not like someone sprang this on them last week and they had to scramble to put together a system to do it.
How representative is an EC2 instance in a datacenter simulating user behavior really, though?
These would likely have completely different network connectivity and usage patterns, especially if they don't have historical data distributions to draw from because this was their first big live event.
>How representative is an EC2 instance in a datacenter simulating user behavior really, though?
Systemic issues causing widespread buffering aren't "user behavior". They're a problem with how Netflix is trying to distribute video. Sure, some connections aren't up to the task, and that isn't something Netflix can really control, unless they are looking to improve how their player falls back to lower bitrate video, which could also be tested.
>because this was their first big live event.
That's the point of testing. They should have already had a "big live event" that nobody paid for during automated testing. Instead they seem to have trusted that their very smart and very highly paid developers wouldn't embarrass them based on nothing more than expectations, but they failed. They could have done more rigorous "live" testing before rolling this out to the public.
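The bitrate fall-back behavior mentioned above is itself testable logic; a toy sketch of that selection step (the ladder values and safety margin are assumptions, not Netflix's real encoding ladder):

```python
# Assumed encoding ladder, in kbit/s, from lowest to highest quality.
LADDER_KBPS = [235, 750, 1750, 3000, 5800]

def pick_bitrate(throughput_kbps, headroom=0.8):
    """Choose the highest rung that fits within a safety margin of the
    measured throughput; fall back to the lowest rung if none fit."""
    budget = throughput_kbps * headroom
    viable = [r for r in LADDER_KBPS if r <= budget]
    return viable[-1] if viable else LADDER_KBPS[0]
```

A harness that replays degraded-throughput traces through logic like this is exactly the kind of "live event nobody paid for" test the comment is asking about.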
1) You don't know if they did or did not do this kind of testing. I don't see any proof either way here. You're assuming they didn't.
2) You're assuming whatever issue happened would have been caught by testing on generic EC2 instances in AWS. In the end these streams were going to users on tons of different platforms in lots of different network environments, most of which look nothing like an EC2 instance. Maybe there was something weird with their networking stack on TCL Roku TVs that ended up making network connections reset rapidly chewing up a lot of network resources which led to other issues. What's the EC2 instance type API name for a 55" TCL Roku TV from six years ago on a congested 2.4GHz Wireless N link?
I don't know what happened in their errors. I do know I don't have enough information to say what tests they did or did not run.
I manage ~3000 customized websites based on the same template code. Sometimes we make changes to the template code that could affect the customizations - it is practically impossible to predict what might cause a problem due to the nature of the customizations. We'll take before and after screenshots of every page on every site, so it can get into the 100s of thousands of screenshots. We'll then run a diff on the screenshots to see what changed, reviewing the screenshots with the most significant changes. Then we'll address the problems we find and deploy the fixed release.
When we do these large screenshot operations, the EC2 instances are running for maybe 15 or 20 minutes total. It's not exactly cheap, but losing clients because we broke their site is something we want to avoid. The sites are hosted on a 3rd party service, and we're rate-limited by IP address, so to get this done in a reasonable amount of time we need to spin up 10,000 EC2 instances to distribute the work. We have our own software to manage the EC2 instances. It's honestly pretty simple, but effective.
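A rough sketch of that diff-and-triage step; a real pipeline would compare decoded pixels (e.g. with an image library), but a byte-level ratio shows the shape of it (all names are illustrative):

```python
def diff_ratio(before: bytes, after: bytes) -> float:
    """Fraction of positions that differ between two captures."""
    if len(before) != len(after):
        return 1.0  # a size change alone is worth a human look
    if not before:
        return 0.0
    changed = sum(a != b for a, b in zip(before, after))
    return changed / len(before)

def triage(pairs, threshold=0.05):
    """Return capture names whose before/after diff exceeds the threshold,
    most-changed first, for manual review."""
    scored = [(name, diff_ratio(b, a)) for name, b, a in pairs]
    flagged = [(n, r) for n, r in scored if r > threshold]
    return sorted(flagged, key=lambda x: x[1], reverse=True)
```

With hundreds of thousands of screenshots, sorting by diff magnitude is what keeps the manual review tractable: humans only look at the top of the list.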
Maybe they did; we don't know that they did not. But the problem is that real-world traffic will always be totally different, varied, and dynamic in many unexpected ways, and a single link coming under strain can cause a ripple effect.
I like all of these considerations, although I also imagine for every context there is some frequency at which it is worthwhile to invalidate the caches to ensure that all parts of the system are still functioning as expected (including the rebuilding of the caches).
What I'm seeing in large organizations is that tracking dependencies within a team's scope works better than tracking dependencies between teams. Many developers punt on tracking dependencies on other teams' artifacts if the organization has no formal system in place to establish contracts between teams along these dependency routes, one that automatically handles state and notifications when changes announcing an intended state change are put through the system. Usually some haphazard dependency representation is embedded into some software and developers call it a day, expecting the software to auto-magically solve a socio-technical logistical problem, instead of realizing that the state transitions of the dependencies are not represented and the software could never deliver what they assume.
Easy: Short term risk versus long term risk. If I deploy with minimal changes today, I'm taking a non-zero short-term risk for zero short-term gain.
While I too am generally a long-term sort of engineer, it's important to understand that this is a valid argument on its own terms, so you don't try to counter it with just "piffle, that's stupid". It's not stupid. It can be shortsighted, it leads to a slippery slope where every day you make that decision it is harder to release next time, and there's a lot of corpses at the bottom of that slope, but it isn't stupid. Sometimes it is even correct, for instance, if the system's getting deprecated away anyhow why take any risk?
And there is some opportunity cost, too. No matter how slick the release, it isn't ever free. Even if it's all 100% automated it's still going to barf sometimes and require attention that not making a new release would not have. You could be doing something else with that time.
In some environments, deploying to production has a massive bureaucracy tax. Paperwork, approvals, limited windows in time, can’t do them during normal business hours, etc.
Those taxes were often imposed because of past engineering errors. For example, Don't deploy during business hours because a past deployment took down production for a day.
A great engineering team will identify a tax they dislike and work to remove it. Using the same example, that means improving the success rate of deployments so you have the data (the success record) to take to leadership to change the policy and remove the tax.
The counterargument is obvious for anyone who has been on call or otherwise responsible for system stability. It's very easy to become risk-averse in any realm.
Yes, because it lowers the chance of compound risk. The longer you go without stressing the system, the more likely you are to have a double failure, thus increasing your outage duration.
Simply put: you don't want to delay finding out that something is broken; you want to know the second it breaks.
In the case I am suggesting, a failed release will often be deploying the same functionality, so many failure modes will result in zero outage. Not all failure modes will result in an outage.
When the software is expected to behave differently after the deployment, more systems can end up being part of the outage, such as when the new systems can't do something or the old systems can't do something.
Not exactly, but it's worth the experiment in trying things anyway. Say you currently have a release once every few months, an ambitious goal would be to get to weekly releases. Continuous enough by comparison. But 'lower risk' is probably not the leading argument for the change, especially if the quarterly cycle has worked well enough, and the transition itself increases risk for a while. In order for a committed attempt to not devolve into a total dumpster fire, various other practices will need to be added, removed, or changed. (For example, devs might learn the concept of feature flags.) The individuals, which include management, might not be able to pull it off.
Deploying is expensive for some models. That could involve customer facing written release notes, etc. Sometimes the software has to be certified by a govt authority.
Additionally, refactor circle jerks are terrible for back-porting subsequent bug fixes that need to be cherry picked to stable branches.
A lot of the world isn't CD, and constant releases are super expensive.
"Test what you fly, and fly what you test" (Supposedly from aviation)
"There should be one joint, and it should be greased regularly" (Referring to cryptosystems I think, but it's the same principle. Things like TLS will ossify if they aren't exercised. QUIC has provisions to prevent this.)
> 1. Always be building: It does not matter if code was not changed...
> 2. Always be releasing...
A good argument for this is security. Whatever libraries/dependencies you have, unpin the versions, and have good unit tests. Security fixes that land upstream must make it into a release, and you cannot pick up those fixes unless you are doing regular releases. This in turn implies having good unit tests, so you can do these builds and releases with a lower probability of shipping something broken. It also implies strong monitoring and metrics, so you are the first to know when something breaks.
> Whatever libraries/dependencies you have, unpin the versions, and have good unit tests.
Nitpick: unit tests by definition should not be exercising dependencies outside the unit boundary. What you want are solid integration and system tests for that.
Unless the upstream dependency happens to maintain stable branches, constantly pulling in the latest versions increases your exposure to new vulnerabilities more than it gets you the patches for discovered bugs.
There should be a caveat that this kind of decision should be based on experience and not treated as a rule that juniors might blindly follow. We all know how "fail fast and early" turned out (or whatever the exact phrase was).
They've been doing live events since 2023. But it's hard to be prepared for something that's never been done by anyone before: a Super Bowl-scale event, viewed entirely over the internet. The Super Bowl gets to offload to cable and over-the-air broadcast. Interestingly, I didn't have any problems with my stream. So it sounds like the bandwidth problems might be localized, perhaps by data center or ISP.
I suspect a lot of it could be related to ISP bandwidth. I streamed it on my phone without issue. Another friend put their TV on their phone’s WiFi which also worked. Could be partly that phone hotspots lower video bandwidth by default.
I suspect it’s a bit of both Netflix issues and ISPs over subscribing bandwidth.
My suspicion is the same as yours, that this may have been caused by local ISPs being overwhelmed, but it could be a million other things too. I had network issues. I live in a heavily populated suburban area. I have family who live 1000+ miles away in a slightly less populated suburban area, they had no issues at all.
I would guess the majority of the streamed bandwidth was sourced from boxes like these in ISP's points of presences around the globe: https://openconnect.netflix.com/en/
So I agree the problems could have been localized to unique (region, ISP) combinations.
The ISP hypothesis doesn't make sense to me. I could not stream the live event from Netflix, but I could watch any other show on Netflix or YouTube or Hulu at the same time.
Some ISPs have on-site Netflix Open Connect racks. The advantage of this is that they get a high-priority quality of service data stream into the rack, which then serves the cached content to the ISP customers. If your ISP doesn't have a big enough Netflix rack and it gets saturated, then you're getting your streams at the whim of congestion on the open internet. A live stream is a few seconds of video downloaded, and it has to make it over the congestion of the internet in a few seconds and then repeat. If a single one of these repeats hits congestion and gets delayed, you see the buffering spinning wheel.
Other shows, on the other hand, can show the cached Netflix splash animation for 10 seconds while they request 20 minutes of cache until they get it. So, dropped packets don't matter much. Even if the internet is seeing congestion every couple of minutes, delaying your packets, it won't matter as non-live content is very flexible and patient about when it receives the next 20-minute chunk. I'm not an ISP or Netflix engineer, so don't take these as exact numbers. I'm just explaining how the "bandwidth problems might be localized" hypothesis can make sense from my general understanding.
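That asymmetry between live and on-demand can be modeled in a few lines; a toy sketch where every parameter is an illustrative assumption:

```python
# Rough model of why live is fragile: each few-second segment must arrive
# before playback reaches it, while VOD can buffer minutes ahead. Numbers
# below are illustrative, not Netflix's actual parameters.
def live_rebuffers(segment_s, download_times_s, buffer_s):
    """Count segments that miss their playback deadline given an initial buffer."""
    cushion = buffer_s
    stalls = 0
    for t in download_times_s:
        cushion += segment_s - t   # each segment adds playtime, costs download time
        if cushion < 0:
            stalls += 1            # playhead caught up: buffering spinner
            cushion = 0
    return stalls
```

For example, a 4-second-segment stream with a 5-second buffer survives one slow (8 s) download, but stalls as soon as congestion delays two segments in a row, which matches the "one delayed repeat and you see the spinner" intuition above.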
Yeah, I think people are incorrectly assuming that everyone had the same experience with the stream. I watched the whole thing and only had a few instances of buffering and quality degradation. Not more than 30 seconds total during the stream.
Even if only 30% of people had a problem, that's still millions of unhappy users. Not great for a time-sensitive event.
Also, from lurking in various threads on the topic Netflix's in app messages added to people's irritation by suggesting that they check their WiFi/internet was working. Presumably that's the default error message but perhaps that could have been adjusted in advance somehow.
I had issues here and there, but there were workarounds. Then, towards the end, the quality either auto-negotiated or was forced down to accommodate the massive pull.
Unless Netflix eng decides to release a public postmortem, we can only speculate. In my time organizing small-time live streams, we always had up to 3 parallel "backup" streams (Vimeo, Cloudflare, Livestream). At Netflix's scale, I doubt they could simply summon any of these providers in, but I guess Akamai / Cloudflare would have been up for it.
Sometimes this just isn't feasible for cost reasons.
A company I used to work for ran a few Super Bowl ads. The level of traffic you get during a Super Bowl ad is immense, and it all comes at you in 30 seconds, before going back to a steady-state value just as quickly. The scale pattern is like nothing else I've ever seen.
Super Bowl ads famously cost seven million dollars. That's something we simply can't repeat year over year, even if we believed it'd generate the same bump in recognition each time.
I think Netflix has a fair bit of organisational muscle; perhaps the fight was considered not as large an event as the NFL streams will be in the future.
Also, "no experience in"? Really? You have no idea whether that's actually the case.
Everyone here is talking like this is something unique Netflix had to deal with. Hotstar live streamed the India vs. Pakistan cricket match with zero issues, with the highest live viewership ever in the history of live telecasts. Why would viewers paying $20 a month want to think about Netflix's technical issues? They dropped the ball, pure and simple. The tech for this already exists; it's been done before, even by ESPN. Nothing new here.
Do people even have cable TV anymore? I have internet from my "cable" company but I don't have the "cable" connected to anything but the modem. Everything I watch is streamed. The only thing connected to my TV is a Roku.
I see 68.7 million people, not households. There's my 6 seconds.
Maybe 10 minutes would give me a better truth.
>You make the tech bubble mistake of believing that high speed internet is as ubiquitous as coax.
Yes and no. Given that the top US cities contain about 8% of the population, you can cover a surprising share of a large country with a surprisingly small coverage area. So it's not as straightforward as "people in SF are in a bubble".
I guess my question is: why? "Cable boxes" are uniformly awful to use in my experience. The UI is clunky, they take up space and it's another remote and another tangle of wires to try to hide. What advantage do they offer in 2024?
Don't act so surprised: streaming is a pain in the ass to figure out. People have been trained to tolerate a 3-second UI lag for every button press (seemingly all cable boxes are godawfully shitty like this; it must be the server-side UI rendering design?).
BUT! You can record your game and the cable TV DVR is dead reliable and with high quality. There is no fear of competing for Wi-Fi bandwidth with your apartment or driveway neighbors, and the DVR still works even if cable is out. And as long as you haven’t deleted the recording it won’t go away for some stupid f’ing reason.
Finally, the cable TV DVR will let you fast forward through commercials. Or you can pause live TV to take a bathroom break or make a snack, building up a little buffer; now you are fast forwarding through commercials on nearly-live TV. You can't fast forward commercials with most mainstream streaming anymore. Who broadcasts your big games? Big players like Paramount+ won't let you skip commercials anymore. The experience is now arguably worse. Once you settle in, the forward-30-sec and back-30-sec buttons work rather smoothly (that's one part of cable TV boxes that has sub-half-second latency).
Your concern about extra remotes and extra boxes and hiding wires is a vanity most don’t care about. They are grateful for how compact big-screen TVs are these days compared to the CRTs or projection TVs of the past. They probably have their kids’ game console and a DVD/BluRay player on the same TV stand anyway.
Apparently movies purchased on Roku are now on Vudu. I hope that people who bought movies on Roku were able to figure it out. This is how technology sucks. Movies purchased with my cable provider’s Video On Demand are still with me, slow as shit as navigating to them is.
I last regularly used a DirecTV DVR. There were a surprising number of times where it wouldn't let me fast forward through ads. Not only that, sometimes it would connect out to the internet to download new targeted forced ads on stuff that was recorded a while ago.
You have access to all the shows from the major networks. You don’t need to subscribe to Peacock and Paramount and Hulu and the TBS app and Discovery+ and…
Better yet, they’re all combined in one interface as opposed to all trying to be the only thing that you use.
Also, especially if you grew up with it, there is absolutely a simplicity in linear TV. Everyone was used to a DVR. And yeah the interface sucks, but it sucked for everyone already anyway so they’re used to it. Don’t know what you wanna watch? Turn on a channel you watch and just see what’s on. No looking at 400 things to pick between.
I’ve seen people switch off and have serious trouble because it’s such a different way of watching TV from what they were used to. They end up using something like Hulu Live or YouTube TV to try and get the experience they’re used to back.
This. I'm exactly in this YouTube TV camp, and most of the time I just miss the simplicity of old cable. Having to find things to watch is an awful experience for me. Then, when I do want to watch something, trying to figure out which app it's actually on is awful. I think we subscribe to a dozen different things; it's so damn fragmented. Even in the early days of Netflix, I was a holdout who kept going to Blockbuster, because the UI of visually scanning a wall or shelf of DVDs was far superior to the Netflix version of the same, IMO.
This is definitely turning into my version of an old man rant: "Back in my day..." The main benefit of it all is that I actually just don't watch as much as I once did. The friction is too high. Or the commitment is too high: I don't usually want to jump into some 10-episode series.
While I haven't gone back to linear TV, I totally get it.
I don’t subscribe to anything that doesn’t work with my Apple TV. Netflix for example won’t integrate with it the way Hulu does. So whatever show I’m watching on Netflix? Wouldn’t show up in my show list on my Apple TV. I forget it exists.
So I don’t subscribe to it. Or anything else like that. You are NOT more important than me, service I pay for.
The only two exceptions are YouTube (which obviously works differently) and Plex for the few things that I already owned on DVD or can't get on any service.
It works well enough for me. But I still find myself missing a linear TV now and then.
I've certainly listened to some fascinating documentaries on BBC Radio 4 on subjects which it would never have occurred to me to seek out. There's definitely some advantages to linear broadcast.
I don't have TV, but I watched the Euro football matches at my mom's, because, guess what, watching sports streams at 480p is no fun, and it frequently breaks because the internet wasn't meant for live broadcasting to a large audience.
From my experience? The ability to punch in a channel number (or not even that) and get something playing, instantly, without the need to make a choice.
For many people, often those with backgrounds that make them unlikely to frequent HN, the experience they're looking for is "1. get home, 2. open beer, 3. turn TV on, 4. watch."
The default state of a streaming app is to ask you what you want to watch, and then show you exactly the thing you selected. The default state of traditional TV is to show you something, and let you switch to something else if you can't stand the thing you're watching right now or have something specific in mind. Surprisingly, many people prefer the latter over the former.
The same applies to radio versus streaming, many family members of mine don't use streaming, because all it takes to turn on the radio is turning the key in the ignition, which they have to do anyway.
"The festivities were streamed live from the Peacock Theater in L.A. across more than 30 platforms including YouTube, Twitch, Facebook, TikTok Live, X (Twitter), Steam, WeChat, Bilibili, Huya, DouYu, Xiaohongshu and Instagram Live"
But that's exactly the point: Netflix didn't do this in a vacuum, they did it within Netflix.
It might just have been easier to start from scratch, maybe using an external partner experienced in live streaming, but the chances of that decision happening in a tech-heavy company such as Netflix that seems to pride itself on being an industry leader are close to zero.
Depending on whom you ask, the bitrate used by the stream was significantly lower than what is considered acceptable from free livestreaming services, which admittedly stream to much, much smaller audiences.
Without splitting hairs, livestreaming was never their forte, and going live with degradation elsewhere is not a great look for our distributed computing champ.
Netflix is good only at streaming ready-made content, not live streaming, but:
1. Netflix is a $300B company; this isn't a resources issue.
2. This isn't the first time they have done live streaming at this scale either. They already have prior failure experience; you expect the second time to be better, if not perfect.
3. There were plenty of time between first massive live streaming to second. Meaning plenty of time to learn and iterate.
The problem is that provisioning vast capacity for peak viewership is expensive and requires long-term commitment. Some providers won't give you more connectivity to their network unless you sign a 12 month deal where you prepay that.
Peak traffic is very expensive to serve, because you're building capacity that will sit empty/unused when the event ends. Who'd pay for that? That's why it's tricky, and that's why Akamai charges these insane prices for live streaming.
An open secret is that the network layer is usually not redundant in your datacenter, even if redundancy is promised. To have a redundant network you'd need to double your investment, and it'll sit idle at 50% of max capacity. For the 2 hours of downtime per year when you restart the high-capacity routers, that's not cost efficient for most clients.
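A back-of-the-envelope version of that trade-off, with purely illustrative numbers (not any provider's real figures):

```python
# Availability you buy by NOT doubling the network, vs the cost of doubling it.
hours_per_year = 8760
downtime_h = 2                       # assumed router-restart downtime without redundancy
availability = 1 - downtime_h / hours_per_year

single_cost = 1.0                    # normalized network build-out cost
redundant_cost = 2 * single_cost     # duplicate everything; each half idles at 50%

print(f"availability without redundancy: {availability:.4%}")   # 99.9772%
print(f"cost multiplier for redundancy: {redundant_cost / single_cost:.0f}x")  # 2x
```

Paying double to move from roughly three nines to (maybe) more is a hard sell for most clients, which is the point the comment is making.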
Then sign a contract with Akamai, who has been in business for 25 years? You outsource if you aren’t planning to do something very often.
There is no middle ground where you commit a mediocre amount of resources, end up with downtime and a mediocre experience, and then go “but we saved money.”
When Apple moved off Akamai for their Keynote live streaming (I remember they also used Limestone or EdgeCast), they had some percentage of the audience using Akamai and some on their own CDN. I think it took them three years before they completely moved off Akamai. Not sure if that is still the case, as that was more than 10 years ago.
But as you stated, they don't want to spend money, and their technical people couldn't deliver on time. This isn't the technical issue a lot of people on HN and Twitter want to discuss. It is a management issue.
What's your point? If they couldn't manage to secure the resources necessary, they shouldn't have agreed to livestream it. As a customer, I don't care AT ALL if it's difficult.
As a customer you're right that you don't care. As a company you care for the cost of the product you sell.
Companies don't care about "the customer" per se; they care about their profit margin, and "free market" competition pushes them to lower prices while keeping the service level good.
Exclusive rights to stream the fight? Well... it better be working, but we won't overprovision and pay over the expected return.
Buying the exclusive rights for $20M and putting $30M into streaming it wouldn't be a very smart choice. Fuckups happen, and this mistake might cost them more in lost reputation than they expected to gain.
They have the NFL next month on Christmas day. So that'll be a big streaming session but I think it'll be nothing compared to this. Even Twitter was having problems handling the live pirate streams there.
Apple was clearly larger than Google when they came out with Apple Maps, and it was issue-laden for a long time. It is not a resource issue, but a tech-development maturity issue.
You can't solve your way out of a complex problem that you created yourself and which wasn't needed in the first place. The entire microservices thing was overly complex with zero benefits.
I spoke to multiple Netflix senior technicians about this.
That's a ridiculous statement. PrimeVideo is the leader in terms of sports events streaming over internet and it is composed of hundreds of microservices.
Live streaming is just much harder than streaming, and it takes years of work and a huge headcount to get something good.
It was a single team for a very specific use case.
To be clear when I said that PrimeVideo is composed of hundreds of microservices, I actually meant that it's composed of hundreds of services, themselves composed, more often than not, of multiple microservices.
Depending on your definition of a microservice, my team alone owns dozens.
This comment shows how a very random blog post about a very small part of a product can dominate all conversation about it. Prime Video famously did not undo anything: out of 100+ teams, one team undid one service. But somehow similar comments are common on HN. I'm making no judgment on microservices or not, just on this particular comment.
People just do not appreciate how many gotchas can pop up doing anything live. Sure, Netflix might have a great CDN that works great for their canned content and I could see how they might have assumed that's the hardest part.
Live has changed over the years from large satellite dishes beaming to a geosat and back down to the broadcast center($$$$$), to microwave to a more local broadcast center($$$$), to running dedicated fiber long haul back to a broadcast center($$$), to having a kit with multiple cell providers pushing a signal back to a broadcast center($$), to having a direct internet connection to a server accepting a live http stream($).
I'd be curious to know what their live plan was and what their redundant plan was.
Sorry for the off topic but what’s this thing that I only come across in Hacker News about referring to a company by their stock exchange name (APPL, MSFT, etc) outside of a stock context? It seems really weird to me.
In-group signaling for people who like playing or thinking about the stock market. Similar to how people who make travel a big part of their identity refer to cities by their airport code.
I have always assumed that a focus on stock tickers is the natural result when your primary user base is a group of people hyper focused on “total compensation” and stock grants. The name hackernews is merely a playful reference to the history of the site. Like the name “Patriot Act.”
As a counterpoint to some other replies, I do this sometimes, not thinking at all about stocks but instead as a standardized abbreviation of sorts. Ms for example can mean tons of things from a title to multiple sclerosis to milliseconds. MSFT is clear and half the length.
I used to gather fringe signals for shitty hedge funds, “fintech” from public communities such as HN.
I think I subconsciously adopted it since it made my job easier. Sort of how I use YYYYMMDD format in almost everything from programming to daily communication.
Technically it’s one fewer keystroke (and it’s AAPL).
It’s a lot fewer keystrokes for MS (Morgan Stanley), GS (Goldman Sachs) and MSFT (Microsoft) than it is for AAPL, but it’s a force of habit for some. Once you’re used to referring to firms by their ticker symbols, you do it all the time.
E.g. an ex trader friend still says “spot” instead of “point” when referring to decimal points, even if talking in other contexts like software versions.
Tangent: not fewer the way F (Ford) or H (Hyatt Hotels) are. Unfortunately we don't have a full alphabet of single-letter tickers; {I, N, P, Q, Y} are missing.
I guess if we allowed the first two letter ticker symbol for the missing singles, we could send messages by mentioning a bunch of company names.
Eg "Buy: Dominion Energy, Agilent. Hold: Nano Labs. Sell: Genpact." would refer to our esteemed moderator, and "Hyatt Hotels is pleased to announce special corporate rates for Nano Labs bookings" to this site itself.
[maybe it would be better to use the companies where the corporate name and ticker letter don't match? Like US Steel for X and AT&T for T?]
AAPL is only fewer keystrokes than Apple if you're on a physical keyboard and hold the shift key, which makes it hardly more convenient. If you use caps lock, presumably you'll press it again.
On a phone, at least in iOS, you have to double tap the shift key.
Not sure if you’re joking, but if not: there’s no practical difference in what is being asked of the reader. Nobody predicates any decision on whether they were asked to note something vs. note it well.
You probably are a major investor, incidentally, through your pension fund(s) and retirement savings. It is hard to avoid Netflix if you are doing any kind of broad-based index investing.
I was pointing out how dumb it is for a multibillion-dollar company to get this so wrong. Broadcasting live events is underestimated by everyone who has never done it, yet the hubris of a major tech company thinking it knows better is biting them in the ass.
As many other people have commented, so many events dwarfing this one have been pulled off with no hiccups visible to the viewers. I have amazing stories of major hiccups during the MLB World Series that viewers had no idea were happening, but “inside baseball” people knew. To the point that the head of the network caught something during the broadcast and called the director in the truck saying someone is either going to be fired or get a raise, yet the audience would never have noticed if the person ended up getting fired. They didn’t, btw.
This is the whole point of chaos engineering that was invented at Netflix, which tests the resiliency of these systems.
I guess we now know the limits of what "at scale" is for Netflix's live-streaming solution. They shouldn't be failing at scale on a huge stage like this.
I look forward to reading the post mortem about this.
Everyone keeps mentioning "at scale." I seriously doubt this was an "at scale" problem. I have a strong suspicion this was a failure at the origination point being able to push a stable signal. That is not an "at scale" issue, but hubris: "we can do better/cheaper than standard broadcasting practices."
As counterpoint, I observed 2-3 drops in bitrate, but an otherwise fine experience. So the problem seems to have been in dissemination, not at the origin.
Yeah, I was switching between my phone and desktop to watch the stream and I had a seamless experience on both devices the entire time. I’m not sure why so many people are assuming this was a universal experience.
I highly doubt this. Netflix has a system of OCAs that are loaded with hard disks, are installed in ISP’s networks, and serve the majority of those ISP’s customers.
Given that many people had no problems with the stream, it is unlikely to have been an origin problem, but more likely the mechanism to fan out quickly to OCAs. Normally latency to an OCA doesn’t matter when you’re replicating new catalogs in advance, but live streaming makes a bunch of code that previously “didn’t need to be fast” get promoted to the hot path.
I am not sure that it is an issue with the origination point. In fact I just thought it was my ISP because my daughter's boyfriend was watching and doing facetime with her and my video was dropping but his was not. I have 2gb fiber and we regularly stream five TVs without any issue, so it should not have been a bandwidth issue.
I've tried to watch an old Seinfeld episode during this event. It was freezing every few minutes even at downgraded bitrate. A video that should be on my local CDN node.
If commercial = public, then no, you cannot use multicast for this. It is heavily used within some enterprise networks, though; if you go to a gym with lots of TVs, they are all likely on multicast.
Do you live stream the Super Bowl? Me and everyone I know watch it over antenna broadcast TV. I think it is easier to have millions of TVs catch airwaves than millions of point-to-point HTTPS video streams.
If you watch it over cable, you're live streaming it. Let's face it, that's where the vast majority of viewers see it. Few people view OTA even if the quality is better.
Live sports do not broadcast the event directly to a streamer. They push it to their broadcast centers. It then gets distributed from there to whatever avenues it needs to go. Trying to push a live IP stream directly from the remote live venue rarely works as expected. That's precisely why the broadcasters/networks do not do it that way
> If you watch it over cable, you're live streaming it.
Those are multicast feeds.
> Trying to push a live IP stream directly from the remote live venue rarely works as expected.
In my experience it almost always works as expected. We have highly specialized codecs and equipment for this. The stream is actively managed with feedback from the receiver so parameters can be adjusted for best performance on the fly. Redundant connections and multiple backhauls are all handled automatically.
> That's precisely why the broadcasters/networks do not do it that way
We use fixed point links and satellite where possible because we own the whole pipe. It's less coordination and effort to setup and you can hit venues and remotes where fixed infrastructure is difficult or impossible to install.
I chose to interpret it charitably and assume OP was saying it's not pushed from venue direct to viewer.
> We use fixed point links and satellite where possible because we own the whole pipe.
Over long distance I get better reliability out of a decent internet provision than in many fixed point to point links, and certainly when comparing at a price point. The downside of the internet is you can't guarantee path separation - even if today you're routing via two different paths, tomorrow the routes might change and you end up with everything going via the same data centre or even same cable.
> If you watch it over cable, you're live streaming it.
Which is probably done over the cableco's private network (not the public Internet) with a special VLAN used for television (as opposed to general web access). They're probably using multicast.
Is cable video over IP now? Last time I looked (which was forever ago), even switched video was ATSC with a bit of messaging for the cable box to ask which channel to tune to and to keep the stream alive. TV over telco systems seems to be highly multicast, so kind of similar: the headend only has to send the content once, at a single bitrate.
Not really the same as an IP-service live stream, where the distribution point sends out one copy per viewer and participates in bitrate adaptation.
AFAIK, Netflix hasn't publicly described how they do live events, but I think it's safe to assume they have some amount of onsite production that outputs the master feed for archiving and live transcoding for the different bitrate targets (that part may be onsite, or at a broadcast center or something cloudy), and then goes to a distribution network. I'd imagine their broadcast center/or onsite processing feeds to a limited number of highly connected nodes that feed to most of their CDN nodes; maybe more layers. And then clients stream from the CDN nodes. Nobody would stream an event like this direct from the event; you've got to have something to increase capacity.
Over the US and Canada it mostly is, though how advanced the transition is is very regional.
The plan is to drop both analog signal and digital (QAM) to reclaim the frequencies and use them for DOCSIS internet.
Newer set-top boxes from Comcast (Xfinity) run over the internet connection (in a tagged VLAN on a private network), and they communicate over a hidden Wi-Fi network.
When Netflix started, it was the first in the space and breaking new ground, which is how it became a "tech" company that happens to stream media. However, it has been 15 years, and since then the cloud providers have basically built "Netflix as a service." I suspect most of the big streamers are using that instead of building their own in-house thing and going through all the growing pains Netflix did.
What are you talking about? The signal coming from a live event is the full package. The output of “the truck” has multiple outs, including the full mix with all graphics, some with the mix minus any branding, etc. While the ISOs get recorded in the truck, they are not pushed out to the broadcast center.
All of the “mixing,” as you call it, is done in the truck. If you’ve never seen it, it is quite impressive. In one part of the truck are the director and the technical director. The director is the one calling things like “ready camera 1,” “take 1,” etc.; the TD is the one on the switcher pushing the actual buttons on the console to make it happen. Next to them is the graphics team, prepping all of the stats and making them available to the TD to key in. In another area is the slo-mo/replay team, taking the feeds from all of the cameras to recorders that let the operators pull out the selects and make them available for the director/TD to cut to. Typically in the back of the truck is the audio mixer, who mixes all of the mics around the event in real time. All of that creates the signal you see on your screen. It leaves the back of the truck and heads out to wherever the broadcaster has better control.
Not nowadays; there is more and more remote production for larger and larger events, and it’s coming on rapidly. Directors are increasingly sitting in centralised control rooms rather than in a scanner.
BT Sport are interesting: they spin up graphics, replay, etc. in an AWS environment a couple of hours before. I was impressed by their UEFA Youth League coverage a couple of years ago, and they aren’t slowing down.
Obviously not every broadcast, or even most, are remote now, but it’s an ever increasing number.
I don’t know how the US industry works, I suspect the heavy union presence I’ve seen at places like NAB will slow it, but in Europe remote production is increasingly the future.
> People just do not appreciate how many gotchas can pop up doing anything live.
Sure thing, but also, how many resources do you think Netflix threw at this event? If organizations like FOSDEM and CCC can run live events (although with way smaller viewership) across the globe without major hiccups on (relatively) tiny budgets and smaller infrastructure overall, how could Netflix not?
This is true, but scale comes after production. Once you have the video encoded on a server with a stable connection, the hard part is over. What Netflix failed to do is spread the files to enough servers around the globe to handle the load. I'm surprised they were unable(?) to use their network of edge servers to handle the live stream. Just run the stream with a 10-second delay and use that time to push the stream segments to the edge servers.
This right here is where I'd expect the failure to occur. This isn't Joey Beercan running OBS using their home internet connectivity.
This is a major broadcast. I'd expect a full on broadcast truck/trailer. If they were attempting to broadcast this with the ($) option directly to a server from onsite, then I would demand my money back. Broadcasting a live IP signal just falls on its face so many times it's only the cheap bastard option. Get the video signal as a video signal away from the live location to a facility with stable redundant networking.
This is the kind of thinking someone only familiar with computers/software/networking would think of rather than someone in broadcasting. It's nice to think about disrupting, but this is the kind of failure that disruptors never think about. Broadcasters have been there done that with ensuring live broadcasts don't go down because an internet connection wasn't able to keep up.
I’ve been using Vyvx since it was called Global Crossing/Genesis; it was fairly unique when it started, but point-to-point IP distribution of programs has been the norm for at least 15 years. We still have backup paths on major events on a different technology; you’d be surprised how common a dual failure on two paths can be. For example, for output from the Euro football this summer, my main paths were on a couple of leased lines with -7, but I still had a backup on some local internet into a different city, just in case there was a meltdown of the main provider's network (it’s happened before with iPath; automation is great until it isn’t).
The CCC video crew has its fair share of geeks from broadcasting corporations and studio houses. Their combined institutional knowledge about live events and streaming distribution is probably in the same ballpark as that of giant global TV networks.
They also have the benefit of having practiced their craft at the CCC events for more than a decade. Twice a year. (Their summer event is smaller but still fairly well known. Links to talks show up on HN every now and then.)
Funky anecdote: the video crew at Assembly have more broadcasting and live AV gear for their annual event than most medium-sized studios.
> how much resources do you think Netflix threw on this event?
Based on the results, I hope it was a small team working 20% time on the idea. If you tell me they threw everything they had at it to this result, then that's even more embarrassing for them.
It wasn't even just buffering issues, the feed would just stop and never start again until I paused it and then clicked "watch live" with the remote.
It was really bad. My Dad has always been a fan of boxing so I came over to watch the whole thing with him.
He has his giant inflatable screen and a projector that we hooked up on the front lawn to watch it, but everything kept buffering. We figured it was the Wi-Fi, so he packed everything up and went inside, only to find the same thing happening on Ethernet.
He was really looking forward to watching it on the projector and Netflix disappointed him.
Commercial boxing has always been like WWE or MMA with a thin veneer of actual sport to it, i.e. it is just entertainment[1].
To rephrase your question then what does someone think of the entertainment on display?
I don't think it was good entertainment.
None of the hallmarks of a good show were present: it wasn't close, nor was it bloody, nor was there anything unexpected like, say, a KO; everything went pretty much as expected. It wasn't a nice watch at all; no skill or talent was on display. All Paul had to do was use his speed to backpedal from the slow, weak punches of a visibly older Tyson with a bum knee and land some points occasionally to win.
--
[1] There is a deeper argument here: is any spectator sport just entertainment, or is it truly about skill, talent, and competition? Boxing, however, including events promoted by the traditional four major associations, falls clearly on the entertainment side compared to, say, the NFL, to me.
Was this necessary? The comment was on a tech forum about the tech issues, do we really need to reprosecute the argument that it wasn’t real boxing here too? There are plenty of other places for those so painfully inclined to do so
Cable TV (or even OTA antenna in the right service area) is simply a superior live product compared to anything streaming.
The Masters app is the only thing that comes close imo.
Cable TV + DVR + high speed internet for torrenting is still an unmatched entertainment setup. Streaming landscape is a mess.
It's too bad the cable companies abused their position and lost any market goodwill. Copper connection direct to every home in America is a huge advantage to have fumbled.
The interesting thing is that a lot of TV infrastructure is now running over IP networks. If I were to order a TV connection for my home I'd get an IPTV box to connect to my broadband router via Ethernet, and it'd simply tell the upstream router to send a copy of a multicast stream my way.
Reliable and redundant multicast streaming is pretty much a solved problem, but it does require everyone along the way to participate. Not a problem if you're an ISP offering TV, definitely a problem if you're Netflix trying to convince every single provider to set it up for some one-off boxing match.
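For the curious, the "tell the upstream router to send a copy of a multicast stream my way" step is just an IGMP group join, which the OS performs when you set a socket option. A minimal sketch in Python; the group address and port here are invented for illustration, not any real IPTV provider's values:

```python
import socket
import struct

# Hypothetical multicast group/port an IPTV headend might use for one channel.
GROUP = "239.1.2.3"
PORT = 5004

def make_membership_request(group: str) -> bytes:
    """Pack the ip_mreq struct used by IP_ADD_MEMBERSHIP:
    4 bytes of group address + 4 bytes of local interface (INADDR_ANY)."""
    return struct.pack("4s4s",
                       socket.inet_aton(group),
                       socket.inet_aton("0.0.0.0"))

def open_iptv_receiver(group: str = GROUP, port: int = PORT) -> socket.socket:
    """Create a UDP socket and ask the kernel (and, via IGMP, the upstream
    router) to start forwarding the group's traffic to this host."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    make_membership_request(group))
    return sock
```

Once joined, the router forwards the channel's UDP packets; changing channels is just leaving one group and joining another, which is why IPTV channel changes are so cheap for the network.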
This. I'm honestly going to cancel my streaming shit. They remove and mess with it so much. Like right now, HBO Max or whatever removes my recent watches after 90 days. Why?
On a few forum sites I'm on, people are just giving up. Looking forward to the post-mortem on how they weren't ready for this (with just a tiny bit of schadenfreude because they've interviewed and rejected me twice).
AB84 streamed it live from a box at the arena to ~5M viewers on Twitter. I was watching it on Netflix, I didn't have any problems, but I also put his live stream up for the hell of it. He didn't have any issues that I saw.
It’s not everyone. Works fine for me, though I did have to reload the page when I skipped past the women's match to the Barrios-Ramos fight and it was stuck buffering at 99%.
I wonder if there will be any long-term reputational repercussions for Netflix because of this. Amongst SWEs, Netflix is known for hiring the best people, and their streaming service normally seems very solid. Other streaming services have definitely caught up a bit and are much more reliable than in the early days, but my impression has always been that Netflix is a step above the rest technically.
This sure doesn't help with that impression, and it hasn't just been a momentary glitch but hours of instability. And the Netflix status page saying "Netflix is up! We are not currently experiencing an interruption to our streaming service." doesn't help either...
Not the same demographic but their last large attempt at live was through a Love is blind reunion. It was the same thing, millions of people logging in, epic failure, nothing worked.
They never tried to do a live reunion again. I suppose they should have, to get the experience, because they are hitting the same problems with a much bigger-stakes event.
Yup, I wanted to say that live stream stuttering has happened before on Netflix. I don't think the reputation is deserved.
From a livestreaming standpoint, netflix is 0/x - for many large events such as love is blind, etc.
From a livestreaming standpoint, look to broadcast news, sports / Olympics broadcasters, etc and you'll see technology, equipment, bandwidth, planning, and professionalism at 1000x of netflix.
Heck, for publicly traded quarterly earnings livestream meetings, they book direct satellite time in addition to fiber to make sure they don't rely only on terrestrial networks which can fail. From a business standpoint, failure during a quarterly meeting stream can mean the destruction of a company (by making shareholders mad that they can't see and vote during the meeting making them push for internal change) - so the stakes are much higher than live entertainment streaming.
Netflix is good at many things, livestreaming is not one of those things.
For livestreams, individual events like the Olympics probably have a surge audience 10x that of Netflix events.
Netflix events are small potatoes compared to other livestream stalwarts.
Imagine having to stream a cricket match internationally to the UK / India / Australia with a combined audience that crushes the Super Bowl, or a football match to all of Europe, or even livestreaming F1 racing, which has an audience multiple orders of magnitude larger than a boxing match and 10x the number of cameras (at 8K+ resolution) across a large physical staging area (the size of the track/course) in real time, in addition to streaming directly from the cockpits of cars racing at 200+ mph.
Livestream focused outfits do this all day, everyday.
Netflix doesn't even come close to scratching the "beginner" level of these kinds of live events.
It's a matter of competencies. We wouldn't expect Netflix to be able to serve burgers like McDonald's does - Livestreaming is a completely different discipline and it's hubris on Netflix's part to assume just because they're good at sending video across the internet they can competently do livestreaming.
the point i’m making is that the netflix live streaming timeline didn’t go
chris rock -> love is blind -> mike tyson
they have had other, successful executions in between. the comment i was replying to had cherry picked failures and i’m trying to git rebase them onto main.
From what I've heard, Netflix has really diluted the culture that people know of from the Patty McCord days.
In particular, they have been revising their compensation structure to issue RSUs, add in a bunch of annoying review process, add in a bunch of leveling and titles, begin hiring down market (e.g. non-sr employees), etc.
In addition to doing this, shuffling headcount, budgets, and title quotas around has in general made the company a lot more bureaucratic.
I think, as streaming matured as a solution space, this (what is equivalent to cost-cutting) was inevitable.
If Netflix were running the same team/culture as it was 10 years ago, I'd like to say that they would have been able to pull off streaming.
Combination of 2 and 3. The business changed. Streaming was more or less a solved problem for Netflix. They needed money for content, not expensive engineers. Ted is co-ceo… you can see where the priority is.
So the issue is that Netflix gets its performance from colocating caches of movies in ISP datacenters, and a live broadcast doesn't work with that. It's not just about the sheer numbers of viewers, it's that a live model totally undermines their entire infrastructure advantage.
Correct, this is not Netflix's regular cup of tea, and it’s a very different problem to solve. They can probably use their edge caches, but it’s challenging.
My wild assed guess is the differences in the edge nodes.
Netflix's edge nodes are optimized for streaming already encoded videos to end users. They have to transcode some number of formats from the source and send them all to the edge nodes to flow out. It's harder to manage a ton of different streams flowing out to the edge nodes cleanly.
I would guess YouTube, being built on Google's infrastructure, has powerful enough edge nodes that they stream one video stream to each edge location and the edges transcode for the clients. Only one stream from source to edge to worry about, which is much simpler to support and reason about.
> I would guess YouTube, being built on Google's infrastructure, has powerful enough edge nodes that they stream one video stream to each edge location and the edges transcode for the clients.
Ha, no, our edge nodes don't have anywhere near enough spare CPU to do transcoding on the fly.
We have our own issues with livestreaming, but our system's developed differently over the past 15 years compared to Netflix's. While they've historically focused on intelligent pre-placement of data (which of course doesn't work for livestreaming), such an approach was never feasible for YT with the sheer size of our catalog (thanks to user-generated content).
Netflix is still new to the space, and there isn't a good substitute for real-world experience for understanding how your systems behave under wildly different traffic patterns. Give them some time.
It also helps that youtube serves shit tier quality videos more gracefully. Everyone is used to the step down to pixel-world on youtube to the point where they don’t complain much.
And a decent part of those users are on the free tier, so they are not paying for it. That alone buys you some level of forgiveness. At least I am not paying anything for this experience.
Live streams have different buffering logic to video on demand. Customers watching sports will get very upset if there is a long buffer, but for a VOD playback you don't care how big the buffer is. Segment sizes are short for live and long for VOD because you need to adapt faster and keep buffers small for Live, but longer download segments are better for buffering.
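The trade-off described above can be sketched with a toy player model. All numbers here are illustrative only, not what any real service uses:

```python
# Illustrative tuning: live adapts fast and stays near the edge of the
# broadcast; VOD builds a big buffer to ride out network hiccups.
LIVE = {"segment_s": 2, "target_buffer_s": 6}
VOD  = {"segment_s": 6, "target_buffer_s": 60}

def should_fetch_next(profile: dict, buffered_s: float) -> bool:
    """Keep fetching segments until the target buffer for this mode is full."""
    return buffered_s < profile["target_buffer_s"]

def latency_behind_live(profile: dict, buffered_s: float) -> float:
    """For live viewing, every buffered second is a second behind the real
    event (plus at least one segment still being encoded)."""
    return buffered_s + profile["segment_s"]
```

The second function is the crux: a big safety buffer directly costs you latency behind the event, which is why live players run close to empty and must react to bandwidth changes much faster than VOD players.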
In my experience even YouTubeTV has problems sometimes. I'll have the 1080p (and enhanced mode also I think) quality set and still deal with a lot of compression artifacts.
Not sure how Netflix does it, but this is not very time-sensitive; I would have delayed the stream by 15 to 30 seconds to cache it and then delivered it to everyone.
Not sure I fully buy that. The “live” stream is rarely “live”. It’s often a highly cached buffer that’s a few mins from latest. Those in isp caches can still help here.
Yep. Having actually worked on this sort of stuff I can confirm.
Your ISP doesn't have enough bandwidth to the Internet (generally speaking) for all users to get their feed directly from a central location. And that central location doesn't have enough bandwidth to serve all users even if the ISP could. That said, the delay can be pretty small, e.g. the first user to hit the cache goes upstream, the others basically get the stream as it comes in to the cache. This doesn't make things worse, it makes them better.
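The "first user goes upstream, the rest hit the cache" behavior is essentially request coalescing. A single-threaded toy sketch of the idea; a real edge cache also coalesces truly concurrent requests and evicts old segments:

```python
from typing import Callable, Dict

class CoalescingCache:
    """The first request for a segment fetches it from upstream; every
    later request for the same segment is served from the local copy."""

    def __init__(self, fetch_upstream: Callable[[str], bytes]):
        self._fetch = fetch_upstream          # e.g. HTTP GET to the origin
        self._store: Dict[str, bytes] = {}    # segment_id -> bytes
        self.upstream_fetches = 0             # counter for illustration

    def get(self, segment_id: str) -> bytes:
        if segment_id not in self._store:
            self.upstream_fetches += 1
            self._store[segment_id] = self._fetch(segment_id)
        return self._store[segment_id]
```

With this, a thousand viewers requesting the same two-second segment cost the ISP exactly one upstream fetch, and the extra delay for everyone else is bounded by how long that first fetch takes.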
I don't bet so I have no clue, but why is that? Are people able to place bets in the middle of the match or something? I would have assumed bets get locked in when the fight starts
This is kind of silly because the delay between actual event happening to showing up on OTA TV or cable TV to showing up on satellite TV can already be tens of seconds.
Or, hear me out here, it's a wild concept, just work.
You know, like every other broadcaster, streaming platform, and company that does live content has been able to do.
Acting like this is a novel, hard problem that needs to be solved and we need to "upsell" it in tiers because Netflix is incompetent and live broadcasting hasn't been around for 80+ years is so fucking stupid.
I don't think live is incompatible with caches. No one watching live would care about an O(seconds) delay, which is highly amenable to caching at ISPs and streaming the changes from there to downstream clients. Offhand I'd say that approach couldn't get much below O(seconds) of delay, but it doesn't need to.
That model still works for streaming. You have a central source stream only to the distributed edge locations, then have clients only stream from their local edge location. Even if one region is overwhelmed, the rest can still work. Load on the central source is bounded.
Likely these devices use different media formats and/or quality levels. And yes, it's possible one device buffers more than the other. Infinite freezes sounds like some routing issues or bugs.
When I was watching the behavior on the tv, was wondering if buffering sends some separate, non-business-as-usual requests, and that part of Netflix's delivery architecture was being overloaded.
E.g. "give me this previous chunk" vs "send me the current stream"
Buffering typically just consumes the same live stream until there's enough in the buffer. No difference other than the request rate being potentially higher. At least I can confidently say that for the standard players/video platforms. Netflix could be doing something different. I'm not sure if they have their own protocols. But I'd be very surprised if the buffering used a completely different mechanism.
Damn that sucks. I wonder if they could have intentionally streamed it 5 min late? I don’t have all the context around the fight though — maybe a competing service would win if Netflix intentionally induced delay?
They were introducing 5 minute delays on some of the clients. I noticed my iPad was always live and the smart TV had a 5 minute delay, but you could fast-forward to live.
If Netflix still interviews on HackerRank puzzles, I think this should be a wake-up call. Interviewing on irrelevant logic puzzles is no substitute for systems engineering.
I did a round of Netflix interviews and didn't get an offer (but passed the technical coding rounds). They absolutely had the best interview process of any company I've interviewed at in my entire career.
They do make you code but the questions were
1. Not on hacker rank or leetcode
2. Practical coding questions that didn't require anything more than basic hashmaps/lists/loops/recursion if you wanted. Some string parsing, etc.
They were still hard (you had to code fast), but no tricky algorithms were required. It also felt very collaborative, like you were driving a pair programming session. Highly recommended, even though I didn't get an offer!
For systems design and engineering, absolutely this. I expected the very highest standards and utmost uptime from Netflix, similar to Google and Amazon.
Tells you the uselessness of their engineering blogs.
If places like Paramount+ can figure it out, Netflix, given their 10+ year head start on streaming and strong engineering culture, should also have been able to. And if you don't like my example, literally every other streaming service has streamed live sports without issue. YT TV, Hulu, Paramount+, Amazon Prime, Peacock, even Apple TV streams live sports.
It may be "new" to them, but they should have been ready.
I won’t argue that they shouldn’t have done better, I’m only pointing out that this is fairly different from their usual product. Amazon, YouTube, and Hulu all have a ton of experience with live streaming by now. Apple has live streamed wwdc for several years.
I did expect that Netflix would have appropriately accounted for demand and scale, though, especially given the hype for this particular event.
Has Netflix ever live streamed something before? People on reddit are reporting that if you back up the play marker by about 3 minutes the lag goes away. They've got a handle on streaming things when they have a day in advance to encode it into different formats and push it to regional CDNs. But I can't recall them ever live streaming something. Definitely nothing this hyped.
I don't spend much time streaming, but I got a glimpse of the Amazon Prime catalog yesterday, and was surprised at how many titles on the front page were movies I'd actually watch. Reminded me of Netflix a dozen years ago.
Amazon Prime isn't so great. Lots of for rent/purchase content or content with ads these days. And they end up repeating slots of content in all the rows in their UI, so I end up seeing the same suggestions everywhere rather than much that's new (other than first party productions).
To me they're basically padding their front page.
But honestly that's most of the major streaming platforms these days. I recently cancelled Disney Plus for similar reasons. The only reasons I don't cancel prime or Netflix are because I have family members I split the memberships with to share.
I recently found a lil dvd rental place in my city. It’s a non-profit, they also do archivals and stuff.
It’s pretty much a two-story townhouse packed head to toe with DVDs (lots of blu rays!)
You don’t realize how limited the streaming collection is until you’re back in a movie store, looking through thousands and thousands of movies you would never find otherwise.
Since I found it, I’ve started doing movie night every week with my friends. It’s such an absolute blast. We go to the store, each pick out a random movie that looks good (or bad, in a good way) or just different.
That's an excellent option. I think it'd be remiss not to mention local libraries. Of course, your mileage may vary, but the ones I've gone to do seem to have adequate selections. I just don't often make time to go there and browse like I would have at traditional video rental places back in the day.
Heck, mine even has some video games; though from when I've checked, they're usually pretty heavily reserved.
I was in high school in the early 00s, and going to the movies was such a big part of my life. Now, I never even know what's out.
I suspect life stage is a factor, but it does feel like there are many classes of entertainment (cinema and standup come to mind) that don't resonate like they used to.
Back in the day everyone was watching the same thing.
The choices for entertainment were limited to whatever was showing in movie theatres, whatever was on TV and whatever record stores were selling.
I've given Netflix a lot more money than I've gotten value out of. I've had an account for ~15y and only really use it for airplanes unless there's a specific thing I'm excited to watch.
I'm in the same boat: as soon as they make it too hard to share, I'll probably cancel it. I think the main reason their sharing crackdown hasn't been a problem so far is that I use it so seldom, it thinks the "main" address is my parents', which makes it easy for me to pass the "are you traveling" 2FA on my own phone when I do want to watch something.
> And they end up repeating slots of content in all the rows in their UI, so I end up seeing the same suggestions everywhere rather than much that's new
All of the streaming services do this and I hate it. Netflix is the worst of the bunch, in my experience. I already scrolled past a movie, I don't want to watch it, don't show it to me six more times.
Imagine walking through a Blockbuster where every aisle was the same movies over and over again.
There's also the "FreeVee" items, which have ads regardless of whether you're a prime subscriber or not. And it feels like a lot of their catalog has been transferred over to FreeVee.
It's been pretty rough the last few years. So many great films and series, not to mention kids programming, removed to make way for mediocre NeTfLiX oRiGiNaLs and Bollywood trash.
Prime Video has to be the worst of all major streaming services. The video quality is horrible, it's crippled with ads (3 unskippable ads for a 45-minute episode, lately), and a lot of interesting titles are behind a "partner paywall".
I have prime and my shopping experience is crippled with ads too.
I think it got worse for sellers recently too. If I search for something, like a specific item using its description, sometimes the only result for it shows "sponsored".
It used to show up as sponsored and also unsponsored below.
If this changed, I assume it is bad for the seller. Either they pay for all search results, or their metrics are skewed because all searches were helped by "sponsorship" (and there are no longer unsponsored hits)
I was watching the rings of power and it started with a "Commercial free experience provided by so and so" with a long ad at the start of the episode, and then a third of the way into the episode, at a critical action part, it broke in the middle of the actor's sentence to a 6 minute ad block.
I exited playback and haven't gone back to finish it. I'll wait for it eventually to make it to a Blu-ray release someday.
> But my impression still has always been that Netflix is a step above the rest technically.
I always assumed youtube was top dog for performance and stability. I can’t remember the last time I had issues with them and don’t they handle basically more traffic than any other video service?
Maybe a client issue, but I've got a low-end smart TV which handles Netflix fine, while YouTube is unwatchable due to buffering and failed cuts to adverts.
I think Netflix will have even more software engineers looking to work there once they notice that even for an average quality of work they can get paid 3 times more than their current pay.
Most people pay Netflix to watch movies and tv shows, not sports. If I hadn't checked Hacker News today, I wouldn't even know they streamed sports, let alone that they had issues with it. Even now that I do, it doesn't affect how I see their core offering, which is their library of on-demand content.
Netflix's infrastructure is clearly built for static content, not live events, so it's no shock they aren't as polished in this area. Streaming anything live over the internet is a tough technical challenge compared to traditional cable.
Is it really that big a deal if you are watching a few minutes behind?
I've watched ball games on streaming networks where I can also hear a local radio broadcast, and the stream is always delayed compared to the radio, sometimes by quite a lot. But you'd never know it if you were just watching the stream.
I remember a few years ago reading about a scam at the Australian Open Tennis where there were people inside the stadium who were betting on individual points as they happened.
I guess they could bet before the betting streams caught up.
It seems ridiculous to me that you can bet on individual points, but here we are.
The issue is that most people are trying to watch live which is what it's advertised as. And until they figure out that they need to watch X minutes behind, it is unwatchable. Many will not figure that out.
So for the first hour it was just total frustration until I stopped trying to go back to live mode.
Internet streams are not real-time even in the best case. There is always a few seconds of delay, often quite a bit more than that depending on number of hops and link speeds, congestion, etc.
I think why I will remember about this fight is not the (small) streaming issue I encountered as much as the poor quality of the fight itself. For me that was the reputational loss. Netflix was touting “NFL is coming to Netflix”. This fight did not really make me want to watch that.
I don't care about boxing or UFC or the grade-A douchebags that are the Paul brothers, but I tuned in just because I had the time and a Netflix subscription.
It was actually great that the fight itself was so boring because it justifies never having to spend time / money on that kind of bullshit. It was a farce. A very bright, loud, sparkly, and expensive (for some people) farce.
The value I got from it was the knowledge that missing out on that kind of thing isn't really missing out on anything at all.
Not really a joke, though? VOD has obvious ways to cheat a bit. Redundancy abounds, and you can even traffic-shape for costs. You could probably get even better compression, for clear reasons.
Live, not so much. One source that you have to fanout from and absolutely no way to get cheap redundancy. Right?
I don't think it'll be long-term. Most people will forget about this really quickly. It's not like there will be many people saying "Oh, you don't want to sign up for Netflix, the Tyson fight wasn't well streamed" in even 6 months nevermind 10 years.
Most third-party internet-based streaming solutions are overlaid on top of a point-to-point network, while broadcast is one-to-many, and even cable tends to use multicast within the cable provider's network.
You have potentially different problems, e.g. limited bandwidth / spectrum. If, say, there are multiple games going on at the same time, you can only watch whichever feed the broadcaster decides to air. And, of course, regardless of the technology in use, there are matters of acquiring rights for various events. One benefit of internet-based streaming is that one service can acquire the rights and be able to reach everyone, whereas an individual cable provider might only reach its direct subscribers.
On cable (terrestrial is entirely different), even bandwidth or spectrum is less of a limit on broadcasting multiple games. The hard part is the rest of production: cameras, live directing, and live commentary. Adding new channels is less challenging than actually producing content at the expected level.
Based on this, I'm wondering whether they straight up did not expect it to be this popular?
> Some Cricket graphs of our #Netflix cache for the #PaulVsTyson fight. It has a 40 Gbps connection and it held steady almost 100% saturated the entire time.
I don't think Netflix is even designed to handle very extreme multi-region live-streaming at scale as evidenced in this event with hundreds of millions simultaneously watching.
YouTube, Twitch, Amazon Prime, Hulu, etc. have all demonstrated the ability to stream live to hundreds of millions simultaneously without any issues. This was Netflix's chance to do the same, and they largely failed.
There are no excuses or juniors to blame this time. Quite the show of inexperience from the 'senior' engineers at Netflix, not being able to handle live-streaming at this scale; they may lose contracts over the worldwide downtime during this high-impact event.
Very embarrassing for a multi-billion dollar publicly traded company.
The assumption that it was related to insufficient investment isn’t supported by any evidence. Flawed technical decisions can be made by the most expensive engineers too.
Other potential and future entertainment partners Netflix will be working with e.g. WWE, will certainly see my view as they will be questioning Netflix's capability after that major streaming issue we both saw.
This isn't the first time Netflix has had this live streaming problem.
People will see this as an underinvestment from Netflix's part and they will reconsider going to a different streaming partner.
Yea, it’s a bad look. But I switched to watching some other Netflix video and it seemed fine. Just this event had some early issues. Looks fine now though.
Streamed glitch free for me both on my phone and Xbox. The fight wasn’t so great though, but still a fun event. Jake Paul is a money machine right now.
Static files have pretty much been the standard streaming protocol for both VOD and live for the last 15 years. Before that, it was Adobe Flash (RTMP).
With the way that they are designed, you can even use a regular CDN.
You can push these files to all the edges before you release the content, which protects your origin. With a livestream, all your edge servers are grabbing content from the origin, unless you have another tier of regional servers to alleviate the load.
Sure but that’s why your edge servers do request collapsing. And there are full blown CDN companies that will write an enterprise contract with you that can do this stuff with ease. Akamai is like 25 years old now.
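Request collapsing is simple enough to sketch: the first client to ask for a segment goes upstream, and every concurrent request for the same segment waits on that one fetch instead of hammering the origin. A toy single-process version (the `CollapsingCache` name and `fetch_origin` callback are invented for illustration; real CDNs do this per edge node and per chunk):

```python
import threading

class CollapsingCache:
    """Toy request-collapsing cache: concurrent requests for the same
    key share one upstream fetch instead of each going to the origin."""

    def __init__(self, fetch_origin):
        self._fetch = fetch_origin   # expensive call to the origin
        self._lock = threading.Lock()
        self._inflight = {}          # key -> Event that waiters block on
        self._cache = {}             # key -> fetched content

    def get(self, key):
        with self._lock:
            if key in self._cache:           # already cached: cheap hit
                return self._cache[key]
            evt = self._inflight.get(key)
            if evt is None:                  # we are the first requester
                evt = threading.Event()
                self._inflight[key] = evt
                leader = True
            else:                            # someone is already fetching
                leader = False
        if leader:
            value = self._fetch(key)         # the single upstream request
            with self._lock:
                self._cache[key] = value
                del self._inflight[key]
            evt.set()                        # wake all the waiters
            return value
        evt.wait()                           # followers wait for the leader
        return self._cache[key]
```

Ten concurrent requests for the same segment produce exactly one origin fetch, which is the whole point: the origin sees load proportional to the number of edge nodes, not the number of viewers.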
Scale has increased but the techniques were figured out 20 years ago. There is not much left to invent in this space at the current moment so screwing up more than once is a bit unacceptable.
Two games actually, both on Christmas Day. A day when most people are at home or the home of family or friends, and they are both pretty good late-season matchups (Chiefs-Steelers and Ravens-Texans) so I imagine viewership will be high.
If they botch the NFL games, it will surely hurt their reputation.
Yeah, the funny part is that Hulu, Amazon Prime, and Peacock have all demonstrated the ability to handle an event of this caliber with no issue. Netflix now may never get another opportunity like this again.
To me the difference is that in 2012, you had companies focusing on delivering a quality product, whether it made money or not. Today, the economic environment has shifted a lot and companies are trying to increase profits while cutting costs. The result is inevitably a decline in quality. I'm sure that Netflix could deliver a flawless live stream to millions of viewers, but the question is can they do it while making a profit that Wall Street is happy with. Apparently not.
The funny thing is I was just reading something on HN like three days ago about how light years ahead Netflix tech was compared to other streaming providers. This is the first thing I thought of when I saw the reports that the fight was messing up.
But is there a way that Netflix might have learned from all of Youtube's past mistakes?
The only reasonable way to scale something like this up is probably to... scale it up.
Sure, there are probably some generic lessons, but I bet that the pain points in Netflix's architecture (historically grown over more than a decade and optimized towards highly cacheable content) are very different from Youtube, which has ramped up live content gradually over as many years.
The average quality of talent has gone way down compared to 2012 though.
E.g. the median engineer, excluding entry level/interns, at YouTube in 2012 was a literal genius at their niche or quite close to it.
Netflix simply can’t hire literal geniuses with mid six figure compensation packages in 2024 dollars anymore… though that may change with a more severe contraction.
It's incomprehensible to me that Netflix, one of the most highly skilled engineering teams in the world - completely sh*t the bed last night and provided a nearly unwatchable experience that was not even in the same league as pre-internet live broadcast from 30 years ago.
My bet is that a technical manager told his executive (multiple times) that he needed more resources and engineering time to make live work properly, and they just told him to make do because they didn't want to spend the money.
It could come down to something as stupid as:
Executive: "we handled [on demand show ABCD] on day one, that was XX million"
Engineering: "live is really different"
Executive: (arguing about why it shouldn't be that different and should not need a lot of new infrastructure)
Engineering: (can't really argue with his boss about this anymore after having repeated the same conversation 3 or 4 times)
-- tells the team: "We are not getting new servers or time for a new project. We have to just make do with what we have. You guys are brilliant, I know you can do it!"
Hopefully that technical manager has a paper trail, and that executive has someone to answer to above them. In cases like this, I always throw together a doc and ask for sign-offs.
Caring about things and tangible consequences are two different things. I don't necessarily agree with the GP comment but one should expect a higher quality retort.
I had buffering issues, but then backed off and let a bit of it buffer up (maybe 1 or 2 minutes?) and then it was fine for the entire Tyson Paul match. There was no reason I needed it to be live vs. a 1 or 2 minute delay.
If you had money riding on it because of legalized sports gambling in your state and sports regulators clearing it to be a thing to bet on, then it was probably pretty psychologically important whether or not there was a 1 or 2 minute delay on you seeing the results of the game.
This topic is really just fun for me to read based on where I work and my role.
Live is a lot harder than on demand, especially when you can't estimate demand (which I'm sure was hard to do here). People are definitely not understanding that. Then there's the fact that Netflix is well regarded for their engineering, if not quite to the point of snobbery.
What is actually interesting to me is that they went for an event like this which is very hard to predict as one of their first major forays into live, instead of something that's a lot easier to predict like a baseball game / NFL game.
I have to wonder if part of the NFL allowing Netflix to do the Christmas games was them proving out they could handle live streams at least a month before. The NFL seems to be quite particular (in a good way) about the quality of the delivery of their content so I wouldn't put it past them.
Netflix’s engineering snobbery is so exhausting. Seeing them unable to fix this problem after several previous streaming failures is a bit rich.
To me it speaks to how most of the top tech companies of the 2010s have degraded as of late. I see it all the time with Google hiring some of the lower performing engineers on my teams because they crushed Leetcode.
> The NFL seems to be quite particular (in a good way) about the quality of the delivery of their content
Alas, my experience with the NFL in the UK does not reflect that. DAZN have the rights to stream NFL games here, and there are aspects of their service that are very poor. My major, long-standing issue has been the editing of their full game “ad-free” replays - it is common for chunks of play to be cut out, including touchdowns and field goals. Repeated complaints to DAZN haven’t resulted in any improvements. I can’t help but think that if the NFL was serious about the quality of their offering, they’d be knocking heads together at DAZN to fix this.
I don't think they think this is a problem actually. Content edited replays are actually very popular with sports fans who are time shifting. Time shifting is also an afterthought for the NFL / MLB / NHL from what I can tell. I live in Seattle but grew up in the midwest so time shift a ton of sports and it's always been horrific.
I'm more comparing Thursday Night Football and the quality of the encoding than anything. Delivery glitches are a separate issue that I think they care about less.
NFL: 90+ minutes after the match on NFL Gameday, it auto-plays the most recent video for that team, which is always the post-game interview. So you load it up, go to your team, and it auto-plays the "we won" or "it was a tough loss" clip; like, why the f*ck am I paying for a DVR solution when you do that? NFL Sunday Ticket: you can watch the games sometime Monday after the fact, but not the Sunday night games. Good thing I paid well below half price for it with a discount.
NHL: constantly shifting networks each year, with worse solutions, and not letting you get to half the previous games after a week. Totally useless for deferred viewing unless you only want to watch the game a day or more later. Fubo: you have to "record" the game, and sometimes it's on a slightly different network and doesn't record. And their blackout system is the worst of all; who cares about your mediocre local team, sorry you can't watch Chiefs/Bills because they overlapped by some amount.
MLB: always broken at the start of the year, constantly changing the interface. You often get stuck watching the commercial break, which is not actually commercials and is just the same "Ohtani/Judge highlight video from 2 years ago" and a "stat" about the sluggers that is almost an entire season out of date. The app resets when switching from the live CDN to the on-demand one once the game ends, which often restarts the game and jumps you six innings forward, or makes the game unavailable for 30 minutes.
And EFF you if you want to watch playoffs on any of them.
Aside from latency (which isn't much of a problem unless you are competing with TV or some other distribution system), it seems easier than on-demand, since you send the same data to everyone and don't need to handle having a potentially huge library in all datacenters (you have to distribute the data, but that's just like having an extra few users per server).
My guess is that the problem was simply that the number of people viewing Netflix at once in the US was much larger than usual and higher than what they could scale to, or alternatively a software bug was triggered.
On demand is easier precisely because having a huge library in all data centers is relatively cheap. In actuality you just have a cache, colocated at ISPs, that pulls from your origin servers. Likely you have users all watching different things, so you can easily avoid hot spots by sharding on the content. Once the on-demand content is in the cache, it's relatively easy to serve.
Live content is harder because it can't really be cached in advance, nor, due to TLS, can you really serve everyone the same stream. I think the hardest problem to solve is provisioning. If you are expecting 1 million users, and 700,000 of them get routed to a single server, that server will begin to struggle. This can happen in a couple of different ways; for example, an ISP that isn't normally a large consumer suddenly overloads its edge server. Even though your DC can handle the traffic just fine, the links between your DC and the ISP begin to suffer, and since the event is live, it's not like you can just wait until the cache is filled downstream.
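One standard trick to keep that 700,000-users-on-one-server scenario from happening is "power of two choices" load balancing: hash each client to two candidate edge servers and send it to whichever is currently less loaded. A minimal sketch, with invented server names and an in-memory load map standing in for real telemetry:

```python
import hashlib

def pick_server(client_id, servers, load):
    """Hash the client to two candidate servers (two salted hashes)
    and return whichever currently has less load. 'servers' and
    'load' are illustrative stand-ins, not any real routing layer."""
    h1 = int(hashlib.sha256(("a:" + client_id).encode()).hexdigest(), 16)
    h2 = int(hashlib.sha256(("b:" + client_id).encode()).hexdigest(), 16)
    c1 = servers[h1 % len(servers)]
    c2 = servers[h2 % len(servers)]
    return c1 if load[c1] <= load[c2] else c2
```

Even this tiny amount of load feedback keeps the maximum server load dramatically closer to the average than a single hash would, which is exactly the hot-spot problem described above.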
Isn't it a tree of cache servers? As the origin sends the frames, they're cached.
And as load grows, the tree has to grow too; when it can't, you resort to degrading bitrate, and ultimately to load shedding to keep the remaining viewers happy?
And it seems Netflix opted to forego that last one, to avoid the bad PR of a "we are over capacity" error message, and instead went with letting it burn, no?
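The degrade-then-shed policy described here can be sketched as a tiny admission function: under moderate overload, push viewers down the bitrate ladder; only when even the lowest rung won't fit do you return an over-capacity error. The thresholds and ladder below are made-up numbers, not anything Netflix actually does:

```python
def admit(current_load, capacity, ladder, active_rung):
    """Toy admission policy for an overloaded cache tier.
    'ladder' is a list of bitrates from highest to lowest;
    'active_rung' is an index into it. Returns (action, rung)."""
    if current_load < capacity * 0.8:
        return "serve", active_rung          # healthy: serve as requested
    if active_rung + 1 < len(ladder):
        return "serve", active_rung + 1      # overloaded: degrade bitrate
    if current_load < capacity:
        return "serve", active_rung          # lowest rung, still some room
    return "shed", None                      # over capacity: error out
```

The ordering matters: degrading quality is invisible-ish to viewers, while shedding is the "we are over capacity" message, so it comes last.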
By "cached" I mean that the PoP server can serve content without contacting the origin server. (The PoP can't serve content it doesn't have.)
>and it seems Netflix opted to forego the last one to avoid a the bad PR of an error message of "we are over capacity" and instead went with actually let it burn, no?
Anything other than 100% uptime is bad PR for Netflix.
Latency is somewhat important for huge sporting events; you don't want every tense moment spoiled by the cheers of your neighbours whose feed is 20 seconds ahead.
With on-demand you can push the episodes out through your entire CDN at your leisure. It doesn't matter if some bottleneck means it takes 2 hours to distribute a 1 hour show worldwide, if you're distributing it the day before. And if you want to test, or find something that needs fixing? You've got plenty of time.
And on-demand viewers can trickle in gradually - so if clients have to contact your DRM servers for a new key every 15 minutes, they won't all be doing it at the same moment.
And if you did have a brief hiccup with your DRM servers - could you rely on the code quality of abandonware Smart TV clients to save you?
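The thundering-herd risk with periodic DRM key fetches is usually handled with jitter: each client schedules its next refresh at a randomized offset around the nominal interval, so renewals smear out over time instead of landing together. A sketch with hypothetical parameters (no real DRM client's behavior is being described):

```python
import random

def next_refresh(base_interval_s, jitter_frac=0.2, rng=random.random):
    """Return a jittered refresh delay: the nominal interval plus or
    minus up to jitter_frac of it. With a 900 s interval and 20%
    jitter, clients refresh anywhere in the 720-1080 s window."""
    spread = base_interval_s * jitter_frac
    return base_interval_s + (2 * rng() - 1) * spread
```

With jitter, a momentary key-server hiccup hits only the slice of clients whose refresh happened to land in that window, instead of everyone at once.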
That has been a big problem for football, especially things like the Super Bowl.
People using over the air antennas get it “live“. Getting it from cable or a streaming service meant anywhere between a few seconds and over a minute of delay.
It was absolutely common to have a friend text you about something that just happened when you haven’t even seen it yet.
You can’t even say that $some_service is fast, some of them vary over 60 seconds just between their own users.
Latency between the live TV signal for my neighbours and the BBC iPlayer app I was using to watch the Euro 2024 final literally ruined the main moments for me. It still remains an unsolved issue long into the advent of live streaming.
I'm not an expert in this, but at least familiar with the trade.
I'd imagine with on-demand services you already have the full content, and therefore can use algorithms that look ahead across frames to compress better and perform all kinds of neat tricks.
With live streaming I'd imagine a lot of these algorithms are useless, as there isn't enough delay and time to properly use them, so you're closer to encoding every frame as it arrives, with at best some just-in-time tricks.
People are always impressed that Netflix can stand up to a new episode of Squid Game being released. And it's not easy; we've seen HBO fail to handle Game of Thrones, for example.
But in either case, you can put that stuff on your CDN days ahead of time. You can choose to preload it in the cache because you know a bunch of people are gonna want it. You also know that not every single individual is going to start at the exact same time.
For live, every single person wants every single byte at the same time, and you can't preload anything. Brutal.
I'm in an adjacent space, so I can imagine some of the difficulties. Basically live streaming is a parallel infrastructure that shares very little with pre-recorded streaming, and there are many failure points.
* Encoding - low latency encoders are quite different than storage encoders. There is a tradeoff to be made in terms of the frequency of key frames vs. overall encoding efficiency. More key frames means that anyone can tune in or recover from a loss more quickly, but it is much less efficient, reducing quality. The encoder and infrastructure should emit transport streams, which are also less efficient but more reliable than container formats like mp4.
* Adaptation - Netflix normally encodes their content as a ladder of various codecs and bitrates. This ensures that people get roughly the maximum quality that their bandwidth will allow without buffering. For a live event, you need the same ladder, and the clients need to switch between rungs invisibly.
* Buffering - for static content, you can easily buffer 30 seconds to a minute of video. This means that small latency or packet loss spikes are handled invisibly at the transport/buffering layer. You can't do this for a live event, since that level of delay would usually be unacceptable for a sporting event. You may only be able to buffer 5-10 seconds. If the stream starts to falter, the client has only a few seconds to detect and shift to a lower rung.
* Transport - Prerecorded media can use a reliable transport like TCP (usually HLS). In contrast, live video would ideally use an unreliable transport like UDP, but with FEC (forward error correction). TCP's reaction to packet loss halves the congestion window, which halves bandwidth, so the connection would have to be trashed in order to shift to a lower-bandwidth rung.
* Serving - pre-recorded media can be synchronized to global DCs. Live events have to be streamed reliably and redundantly to a tree of servers. Those servers need to be load balanced, and the clients must implement exponential backoff or you can have cascading failures.
* Timing - Unlike pre-recorded media, any client that has a slightly fast clock will run out of frames and either need to repeat frames and stretch audio, or suffer glitches. If you resolve this on the server side by stretching the media, you will add complication and your stream will slowly get behind the live event.
* DVR - If you allow the users to pause, rewind, catch up, etc., you now have a parallel pre-recorded infrastructure and the client needs to transition between the two.
* DRM - I have no idea how/if this works on a live stream. It would not be ideal that all clients use the same decryption keys and have the same streams with the same metadata. That would make tracing the source of a pirate stream very difficult. Differentiation/watermarking adds substantial complexity, however.
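To make the adaptation point concrete: a client's rung choice usually boils down to "highest bitrate that fits comfortably under measured throughput, minus a step when the buffer is about to run dry." A minimal sketch of that general idea (the thresholds are invented, and this is not Netflix's actual ABR logic):

```python
def choose_rung(ladder_kbps, throughput_kbps, buffer_s, safety=0.8):
    """Pick a bitrate rung from the ladder: the highest one that fits
    under safety * measured throughput, stepping down one extra rung
    when the playback buffer is nearly empty (close to a stall)."""
    affordable = [b for b in ladder_kbps if b <= throughput_kbps * safety]
    if not affordable:
        return min(ladder_kbps)       # nothing fits: take the lowest anyway
    rung = max(affordable)
    if buffer_s < 3 and rung != min(ladder_kbps):
        # for a live stream the buffer is tiny, so react aggressively
        ordered = sorted(ladder_kbps)
        rung = ordered[ordered.index(rung) - 1]
    return rung
```

The live case is exactly the hard one from the list above: with only a few seconds of buffer, the client has to detect a faltering stream and step down before the buffer empties, rather than riding out the dip.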
I know nothing about boxing, and this fight was just ridiculously impressive. I kept tuning out of the earlier fights. They felt like some sort of filler. I didn't get the allure. But Taylor v Serrano was just obvious talent that even I could appreciate.
Well said. People tuning in late missed out on the real main event. I have a feeling this Tyson fight will be a waste of staying up late and battling Netflix over.
Reading the comments here, I think one thing that's overlooked is that Netflix, which has been on the vanguard of web-tech and has solved many complicated problems in-house, may not have had the culture to internally admit that they needed outside help to tackle this problem.
What do you think were the dynamics of the engineering team working on this?
I'd think this isn't too crazy to stress test. If you have 300 million users signed up, then your stress test should be 300 million simultaneous streams in HD for 4 hours. I just don't see how Netflix screws this up.
Maybe it was a management incompetence thing? Manager says something like "We only need to support 20 million simultaneous streams" and engineers implement to that spec even if the 20 million number is wildly incorrect.
There's no way 300 million people watched this, especially if that number represents every Netflix subscriber. The largest claimed live broadcast across all platforms is last year's Super Bowl with 202 million unique viewers for at least part of it, but that includes CBS, Nickelodeon, and Univision, not just streaming audiences. Its average viewership for the whole game was 123 million, which is second all-time to the Apollo 11 moon landing.
FIFA claimed the 2022 World Cup final reached 1.5 billion people worldwide, but again that seems like it was mostly via broadcast television and cable.
As far as single stream, Disney's Hotstar claimed 59 million for last year's Cricket World Cup, and as far as the YT platform, the Chandrayaan-3 lunar landing hit 8 million.
100 million is a lot of streams, let alone 300 million. But also note that not every stream reaches a single individual.
And, as far as the 59 million concurrent streams in India, the bitrate was probably very low (I'd wager no more than 720p on average, possibly even 480p in many cases). It's again a very different problem across the board due to regional differences (such as spread of devices, quality of network, even behavioral differences).
I mean, yes, but nobody streams RAW video in practice, and I can't imagine any users or service providers who'd be happy with that level of inefficiency. In general, it's safe to assume some reasonable compression (which, yes, is likely lossy).
Not through a single system, the advantage of diversity rather than winner-takes-all.
The world cup final itself (and other major events) is distributed from the host broadcaster either on site at the IBC or at major exchange points.
When I've done major events of that magnitude there's usually a backup scanner and even a tertiary backup. Obviously feeds get sent via all manner of routes - the international feed, for example, may be handed off at an exchange point, but the reserve is likely available on satellite for people to downlink. If the scanner goes (fire etc), then at least some camera/sound feeds can be switched direct to these points; on some occasions there's a full backup scanner too.
Short of events that take out the venue itself, I can't think of a plausible scenario which would cause the generation or distribution of the broadcast to break on a global basis.
I don't work for OBS/HBS/etc but I can't imagine they are any worse than other broadcast professionals.
The IT part of this stuff is pretty trivial nowadays, even the complex parts like the 2110 networks in the scanner tend to be commoditised and treated as you'd treat any other single system.
The most technically challenging part is unicast streaming to millions of people at low latency (DASH etc). I wouldn't expect an enormous architectural difference between a system that can broadcast to 10 million or 100 million though.
To be fair, a lot of people pay their ISP for a modem/router combo and connect to something like "Xfinity" at their house. So to them, there is no difference.
Live streaming is hard. Most companies that do live streaming at 2024 scale got there by learning from their mistakes. This is true for Hotstar, Amazon and even YouTube. Netflix's stack is built to stream optimised, compressed, cached videos to a manageable number of concurrent viewers per title. Here we had ~65m concurrent viewers in their first live event. The compression they use, the distribution, etc. have not scaled up well. I'll judge them based on how they handle their next live event.
I don't think this is their first live event. They have hosted a pro golf promotional match and they had a live pro tennis match between Nadal and Alcaraz off the top of my head.
When you step back and look at the situation, it's not hard to see why Netflix dropped the ball here. Here's how I see it (not affiliated with Netflix, pure speculation):
- Months ago, the "higher ups" at Netflix struck a deal to stream the fight on Netflix. The exec that signed the deal was probably over the moon because it would get Netflix into a brand new space and bring in large audience numbers. Along the way the individuals were probably told that Netflix doesn't do livestreaming but they ignored it and assumed their talented Engineers could pull it off.
- Once the deal was signed then it became the Engineer's problem. They now had to figure out how to shift their infrastructure to a whole new set of assumptions around live events that you don't really have to think about when streaming static content.
- Engineering probably did their absolute best to pull this off, but they had two main disadvantages: first, they don't have any of the institutional knowledge about live streaming, and second, they don't really know how to predict demand for something like this. In the end they probably beefed up livestreaming capacity as much as they could but still didn't go far enough, because again, no one there really knows how something like this will pan out.
- Evening started off fine but crap hit the fan later in the show as more people tuned in for the main card. Engineering probably did their best to mitigate this but again, since they don't have the institutional knowledge of live events, they were shooting in the dark hoping their fixes would stick.
Yes Netflix as a whole screwed this one up but I'm tempted to give them more grace than usual here. First off the deal that they struck was probably one they couldn't ignore and as for Engineering, I think those guys did the freaking best they could given their situation and lack of institutional knowledge. This is just a classic case of biting off more than one can chew, even if you're an SV heavyweight.
This isn't Netflix's first foray into livestreaming. They tried a livestream last year for a reunion episode of one of their reality TV shows which encountered similar issues [0]. Netflix already has a contract to livestream a football event on Christmas, so it'll be interesting to see if their engineers are able to get anything done in a little over a month.
These failures reflect very poorly on Netflix leadership. But we all know that leadership is never held accountable for their failures. Whoever is responsible for this should at least come forward and put out an apology while owning up to their mistakes.
> But we all know that leadership is never held accountable for their failures.
You've never heard of a CEO or other C-suite or VP getting fired?
It most definitely happens. On the other hand, people at every level make mistakes, and it's preferable that they learn from them rather than be fired, if at all possible.
Accountability can take many forms. I don't think they should be fired for making a mistake, I think they should release a statement recognizing their failure along with a post-mortem. Not a particularly high bar, but most leadership failures are often swept under the rug without any public accountability or evidence that they've learned anything.
We have evidence of prior failures with livestreaming from Netflix. Were the same people responsible for that failure or do we have evidence of them having learned anything between events? If anything, I'd expect the best leaders would have a track record that includes failures while showcasing their ability to overcome and learn from those mistakes. But based on what information is publicly available, this doesn't seem to be the case in this situation.
> "GPGPU compute is a solved problem if you buy Nvidia hardware" type comment
You're replacing the word "hire" with "buy". That misconstrues the comment. If you need to do GPGPU compute and have never done it, you work with a team that has. (And if you want to build it in-house, you scale to it.)
>"GPGPU compute is a solved problem if you buy Nvidia hardware"
Which is valid? If your problem can be solved by writing a check, then it's the easiest problem to have on the planet.
Netflix didn't have to put out 3 PhD dissertations on how to improve the SOTA of live streaming, they only needed to reliably broadcast a fight for a couple hours.
That is a solved problem.
Amazon and Cloudflare do that for you as a service(!). Twitch and YouTube do it literally every day. Even X has started doing it recently.
Landing on Mars is a solved problem. Nuclear bombs are a solved problem. Doesn't mean anyone can just write a check and get it done and definitely doesn't mean any business model can bear that cost.
It should be obvious that not all risks can be converted into a capital problem.
People say this, but then fall in love, get divorced, get depressed, or their company might lose its mojo, get sued, or lose an unreplaceable employee. But they will still say “all risk can be costed.”
If it is just a matter of paying up, why hasn't ESA pulled it off? I'm pointing out, and offering examples, that "solved problem" has no bearing on the ease, or the organizational capacity of any one group, to do it. It is merely a statement that no unknown, new solution needs to be invented.
If I have to spell it out you're clearly debating in bad faith and we're done here.
It is. The fact that 'streaming is a solved problem' has no bearing on any one company's ability to do it at scale. Solved problem means merely you don't have to invent something new, not that it is easy or within reach of everyone.
"Solved" merely means you don't need to invent something new to solve it. It doesn't mean trivial nor easy. And it definitely doesn't mean the problem is above trade-offs.
Look. I'm a small startup employee. I have a teeny tiny perspective here. But frankly speaking the idea that Netflix could just take some off the shelf widget and stuff it in their network to solve a problem... It's an absurd statement for even me. And if there's anyone it should apply to it would be a little startup company that needs to focus on their core area.
Every off the shelf component on the market needs institutional knowledge to implement, operate, and maintain it. Even Apple's "it just works" mantra is pretty laughable in the cold light of day. Very rarely in my experience do you ever get to just benefit from someone else's hard work in production without having an idea of how to properly implement, operate, and maintain it.
And that's at my little tiny ant scale. To call the problem of streaming "solved" for Netflix... Given the guess of the context from the GP post?
I just don't think this perspective is realistic at all.
> the idea that Netflix could just take some off the shelf widget and stuff it in their network to solve a problem
Right. They have to hire one of the companies that does this. Each of YouTube, Twitch (Amazon), Facebook and TikTok have, I believe, handled 10+ million streams. The last two don't compete with Netflix.
I believe this is the spirit of the "solved problem" comment: not that the solution is an off-the-shelf widget, but that if it has ever been solved, then that solution could technically be used again, even if organizing the right people is exorbitantly expensive.
There are multiple companies offering this capability today whose service could have been hidden behind Netflix branding within a few weeks. This was a problem of Netflix just not being set up for live streaming but thinking they could handle it.
My speculation here is this was just classic SV cockiness. The team that closed this deal probably knew that they didn't have the capability but I'm sure the arguments for doing it anyways was something along the lines of: "we have the best engineers in the bay area, we can probably figure this out"
There are endless amounts of stories and situations in which selling something before it really exists has helped businesses. It's totally plausible that a team working on video streaming at the scale of Netflix could figure out live streaming.
Premature optimization is definitely a thing and it can massively hurt businesses (i.e. startups go under). Let's stop pretending any business would say 'no' to extra revenue just because the engineering team lacked full assurance there would be no latency drop.
In my language we have a saying that roughly translates to "Don't sell the hide until you've shot the bear".
And sure, there have probably been lots of examples where a business made promises they weren't confident about and succeeded. But there are surely also lots of examples where they didn't succeed.
So what's the moral of the story? I don't know; maybe if you take a gamble you should be prepared to lose that gamble. Sounds to me like Netflix fucked up. They sold something they couldn't provide. What are the consequences of that? I don't know, nor do I particularly care. Not my problem.
Do startups really do this? I thought the capability is built or nearly built or at least in testing already with reasonable or amazing results, THEN they go to market?
Do startups go to other startups, fortune 500 companies and public companies to make false promises with or without due diligence and sign deals with the knowledge that the team and engineers know the product doesn't have the feature in place at all?
In other words:
Company A: "We can provide web scale live streaming service around the world to 10 billion humans across the planet, even the bots will be watching."
Company B: "OK, sounds good, Yes, here is a $2B contract."
Company A: "Now team I know we don't have the capability, but how do we build, test and ship this in under 6 months???"
Startups absolutely sell things they haven't made yet and might not even be capable of doing.
Next thing you know it's 9pm on a Sunday night and you're desperately trying to ship a build for a client.
Netflix isn't some scrappy company though. If I had to guess they threw money at the problem.
A much better approach would have been to slowly scale over the course of a year. Maybe stream some college basketball games first, slowly picking more popular events to get some real prod experience.
Instead this is like their 3rd or 4th live stream ever. Even a pre-show a week before would have allowed for greater testing.
I'm not a CTO of a billion dollar company though. I'm just an IC who's seen a few sites go down under load.
To be fair, no one knows how it's going to go before it happens. It would have been more surprising for them to pull this off without issues... It's a matter of managing those issues. I know if I had paid $30 for a Netflix subscription to watch this specific event, I'd assume I got ripped off.
You don't necessarily have to make false promises.
You can be totally honest and upfront that the functionality doesn't exist yet and needs to be built first, but that you think you understand the problem space and can handle the engineering, provided you can secure the necessary funding, where, by the way, getting a contract and some nominal revenue now could greatly help make this a reality...
And if the upside sounds convincing enough, a potential customer might happily sign up to cover part of your costs so they can be beta testers and observe and influence ongoing development.
Of course it happens all the time that the problem space turns out to be more difficult than expected, in which case they might terminate the partnership early and then the whole thing collapses from lack of funding.
If anything, startups are more transparent about it.
In the enterprise sector this is rampant. Companies sell "platforms" and those missing features are supposed to be implemented by consultants after the sale. This means the buyer is the one footing the bill for the time spent, and suffering with the delays.
Many do, as far as initial investment goes. It makes sense when you think about the capital intensive nature of most startups (including more than web startups here, e.g. lab tech commercialization). It also accurately describes a research grant.
That's for startups that can't bootstrap (most of them). For ones which can, they may still choose to do this with customers, as you describe, because it means letting their work follow the money.
They aren't clowns at all.
It's a totally different engineering problem, and you can't just spin up live streaming capacity on demand. The entire system, end to end, isn't optimized for live streams yet.
That’s uncharitable. Proposing reasons for institutional failure and discussing them can help people improve communication and address such challenges.
It's a way to mislead people into misunderstanding reality and therefore solving the wrong problems, often causing harm, now and in the future.
That's why serious analysis requires a factual basis, such as science, law, and good engineering and management. You need analytics data to figure out where the performance and organizational bottlenecks are.
Before people tried to understand illness with a factual basis, they wrote speculative essays on leeching and finding 'better' ways to do it.
Main event hasn’t even started yet. Traffic will probably 10x for that. They’re screwed. Should have picked something lower profile to get started with live streaming.
It’s insane the excuses being made here for Netflix’s apparently unique circumstances.
They failed. Full stop. There is no valid technical reason they couldn’t have had a smooth experience. There are numerous people with experience building these systems they could have hired and listened to. It isn’t a novel problem.
Here are the other companies that are peers that livestream just fine, ignoring traditional broadcasters:
- Google (YouTube live), millions of concurrent viewers
- Amazon (Thursday Night Football, Twitch), millions of concurrent viewers
- Apple (MLS)
NBC live streamed the Olympics in the US for tens of millions.
I don't disagree that Netflix could have / should have done better. But everybody screws these things up. Even broadcast TV screws these things up.
Live events are difficult.
I'll also add on, that the other things you've listed are generally multiple simultaneous events; when 100M people are watching the same thing at the same time, they all need a lot more bitrate at the same time when there's a smoke effect as Tyson is walking into the ring; so it gets mushy for everyone. IMHO, someone on the event production staff should have an eye for what effects won't compress well and try to steer away from those, but that might not be realistic.
I did get an audio dropout at that point that didn't self correct, which is definitely a should have done better.
I also had a couple of frames of block color content here and there in the penultimate bout. I've seen this kind of stuff on lots of hockey broadcasts (streams or ota), and I wish it wouldn't happen... I didn't notice anything like that in the main event though.
Experience would likely be worse if there were significant bandwidth constraints between Netflix and your player, of course. I'd love to see a report from Netflix about what they noticed / what they did to try to avoid those, but there's a lot outside Netflix's control there.
As a cofounder of a CDN company that pushed a lot of traffic: the problem with live streaming is that you need to push peak viewership through a lot of different providers. The peering/connectivity deals are usually not structured for peak capacity that is many times over the normal 95th percentile. You can provision more connectivity, but you don't know how many will want to see the event.
Also, live events can be trickier than stored files, because you can't offload to the edges beforehand to warm up the caches.
So Netflix had 2 factors outside of their control
- unknown viewership
- unknown peak capacities outside their own networks
Both are solvable, but if you serve "saved" content you optimize for a different use case than live streaming.
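The 95th-percentile point above is easy to illustrate. Burstable peering and transit deals are commonly billed on the 95th percentile of 5-minute traffic samples, so a short one-off event barely moves the committed figure while still requiring the full physical headroom (all numbers below are made up):

```python
# Toy 95th-percentile calculation over 5-minute traffic samples (Gbps).
# Hypothetical month: flat baseline traffic plus one 4-hour event spike.
samples = [100.0] * 8640            # ~30 days of 5-min samples at 100 Gbps
samples[:48] = [900.0] * 48         # one 4-hour event at 9x normal traffic

def p95(values):
    ordered = sorted(values)
    # Standard burstable-billing convention: discard the top 5% of samples.
    return ordered[int(len(ordered) * 0.95) - 1]

print("p95 committed:", p95(samples))   # 100.0 Gbps -- spike discarded
print("peak needed:  ", max(samples))   # 900.0 Gbps -- must still fit
```

The 48 spike samples fall entirely inside the discarded top 5%, so the contractual number never sees the event, yet every link on the path still has to carry 9x normal traffic for those four hours.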
Disney's Hotstar managed ~60M concurrent livestreams for the Cricket World Cup a year ago. The problem has been solved. Livestreaming sports just has different QoS expectations than on-demand.
I wouldn't say it's a solved problem, how many other companies are pulling off those numbers? Isn't that the current record for concurrent streams? And wasn't it mostly to mobile devices?
The size of the engineering head count is not informative; it really depends on how much is in-house and how much is external. For Hotstar, external would be the parent company (Disney, or Fox before that) or staffing from IT consulting organizations who will not be on the payroll.
For what it is worth, all things being equal there would be a lot more non-engineering staff in Hotstar's 2000 employees than in a streaming company of similar size or user scale. Hotstar operates in a challenging and fragmented market: India has 10+ major languages (and corresponding TV, music and movie markets). Technically there is not much difference from what Netflix or Disney has to do for i18n; operationally, however, each market needs separate sales, distribution and operations.
---
P.S. Yes, Netflix operates in more markets, including India, than anybody else. However, if you actually use Netflix for almost any non-English content, you will know how weak their library and depth in other markets are; their usual model in most of these markets is to have a few big, high-quality (for that market) titles rather than build depth.
P.P.S. Also yes, the Indian market is seeing consolidation in the sense that many streaming releases are multilingual and use major stars from more than one language to broaden their draw (not new, but growing in popularity as distribution becomes cheaper with streaming). However, this is only seen in big banner productions, as tastes are quite different in each market and this can't scale for all run-of-the-mill content.
Amazon had their fair share of livestream failures, and for notably fewer viewers. I don't think they deserve a spot on that list. I briefly worked in streaming media for sports, and while it's not a novel problem, there are so many moving parts and points of failure that it can easily all go badly.
There is no one "Amazon" here, there are at least 3:
* Twitch: Essentially invented live streaming. Fantastic.
* Amazon Interactive Video Service [0]: Essentially "Twitch As A Service", built by Twitch engineers. Fantastic.
* Prime Video. Same exact situation as Netflix: original expertise is all in static content. Lots of growing pains with live video and poor reports. But they've figured it out: now there are regular live streams (NHL and NFL), and other channel providers do live streaming on Prime Video as a distribution platform.
Doesn't twitch almost fall over (other non-massive streams impacted) when anyone gets close to 4-5m concurrent viewers? I remember last time it happened everything started falling over, even for smaller streams. Even if Netflix struggled with the event, streaming other content worked just fine for me.
It's not full stop. There are reasons why they failed, and for many it's useful and entertaining to dissect them. This is not "making excuses" and does not get in the way of you, apparently, prioritizing making a moral judgment.
It will never not annoy and amuse me that illegal options (presumably run by randoms in their spare time) are so much better than the offerings of big companies and their tech ‘talent’.
Illegal options also have a lot fewer resources (revenue, service providers willing to host/facilitate illegal activities, and so on), so it’s a fair comparison in my opinion.
> service providers who are willing host/facilitate illegal activities
At least for NFL pirate streams, it seems they tend to use "burner" tenants from Azure and AWS. Of course they get shut down, but how hard is it to spin up another one?
I have Netflix purchased legally with hard earned money. But because I had issues I looked for illegal streams, and they were bad, crashes, buffering.. you name it. So I went back to Netflix and watched it at 140p quality.
This twitter stream was the most reliable for me. Completely took Netflix out of the equation; just some dude at the event with his phone: https://x.com/i/broadcasts/1mrxmMRmXpQxy
> But the real indicator of how much Sunday’s screw-up ends up hurting Netflix will be the success or failure of its next live program—and the next one, and the one after that, and so on. There’s no longer any room for error. Because, like the newly minted spouses of Love Is Blind, a streaming service can never stop working to justify its subscribers’ love. Now, Netflix has a lot of broken trust to rebuild.
Weird that an organization like Netflix is having problems with this considering their depth of both experience and pockets. I wonder if they didn't expect the number of people who were interested in finding out what the pay-per-view experience is like without spending any extra money. Still, I suppose we can all be thankful Netflix is getting to cut their live event teeth on "alleged rapist vs convicted rapist" instead of something more important.
From my experience, it works if you're not watching it 'live'. But the moment I put my devices to 'live' it perma-breaks: 504 gateway timeout in web developer tools hitting my local CDN. Probably works on some CDNs, doesn't on others. Probably works if you're not 'live'.
edit: literally an nginx gateway timeout screen if you view the response from the CDN... wow
This is probably a naive question but very relevant to what we have here.
In a protocol where an oft-repeated request goes through multiple intermediaries, usually every intermediary will be able to cache the response for common queries (e.g. DNS).
In theory, ISPs would be able to do the same with HTTP, although I am not aware of anyone doing so (since it would rightfully raise concerns of privacy and tampering).
Now TLS (or other encryption) will break this abstraction. Every user, even if they request a live stream, receives a differently encrypted response.
But live stream of a popular boxing match has nothing to do with the "confidentiality" of encryption protocol, only integrity.
Do we have a protocol which allows downstream intermediaries (e.g. ISPs) to cache the content of the stream based on demand, while a digital signature or other attestation is still cryptographically verified by the client?
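The integrity-without-confidentiality split being asked about can be sketched: segments travel in plaintext through any untrusted cache, while the client verifies each one against a small manifest of digests obtained over an authenticated channel. This is a minimal illustration, not any deployed protocol; real designs would also sign the manifest itself (e.g. with Ed25519), a step elided here where we simply assume `manifest` arrived authenticated (say, over TLS from the origin).

```python
import hashlib

# Sketch of "cacheable segments + authenticated manifest".
# Only the tiny manifest needs an authenticated channel; the bulky
# segments can be served by any cache, since tampering is detectable.

def make_manifest(segments: dict) -> dict:
    """Origin side: record a SHA-256 digest for every segment."""
    return {name: hashlib.sha256(data).hexdigest()
            for name, data in segments.items()}

def verify_segment(name: str, data: bytes, manifest: dict) -> bool:
    """Client side: accept a cached segment only if its digest matches."""
    return hashlib.sha256(data).hexdigest() == manifest.get(name)

# Hypothetical segment names and contents, for illustration.
origin = {"seg1.ts": b"\x00\x01video-bytes", "seg2.ts": b"\x02more-video"}
manifest = make_manifest(origin)

# Honest cache: bytes pass verification.
assert verify_segment("seg1.ts", origin["seg1.ts"], manifest)
# Tampering cache: modified bytes are rejected.
assert not verify_segment("seg1.ts", b"evil-bytes", manifest)
print("ok")
```

This is essentially the shape of proposals like signed HTTP exchanges and of NDN's per-packet signatures mentioned in the reply below, just reduced to the hash-check at its core.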
there's Named Data Networking (went by Content-Centric Networking earlier). You request data, not a url, the pipe/path becomes the CDN. If any of your nearest routers have the bytes, your request will go no further.
I don't see it much mentioned the last few years, but the research groups have ongoing publications. There's an old 2006 Van Jacobson video that is a nice intro.
I've been re-watching Silicon Valley the last few weeks and just watched the Nucleus live stream episode 2 days ago, pretty funny seeing it in real life.
I guarantee this is a management issue. Somebody needed to bear down at some point and put the resources into load testing. The engineers told them it probably wouldn't be sufficient.
I assume this came down to some technical manager saying they didn't have the human and server resources for the project to work smoothly and a VP or something saying "well, just do the best you can.. surely it will be at least a little better than last time we tried something live, right?"
I think there should be a $20 million class action lawsuit, which should be settled as automatic refunds for everyone who streamed the fight. And two executives should get fired.
At least.. that's how it would be if there was any justice in the world. But we now know there isn't -- as evidenced by the fact that Jake Paul's head is still firmly attached to his body.
I am curious about their live streaming infrastructure.
I have done live streaming for around 100k concurrent users. I didn't set up the infrastructure myself because it was the CloudFront CDN.
Why is it hard for Netflix? They have already figured out the CDN part, so it should not be a problem even at 1M or 100M, because their CDN infrastructure is already handling the load.
I have only worked with HLS live streaming, where the playlist is constantly changing compared to VOD. Live video chunks work the same as VOD. CloudFront also has a feature, request collapsing, that greatly helps live streaming.
So, my question is: if Netflix has already figured out the CDN, why is their live infrastructure failing?
Note: I am not saying my 100k is the same scale as their 100M. I am curious about which part is the bottleneck.
> Why it is hard for Netflix. They have already figured out CDN part. So it should not be a problem even if it is 1M or 100M. because their CDN infrastructure is already handling the load ... Note: I am not saying my 100k is same scaling as their 100M. I am curious about which part is the bottleneck.
100k concurrents is a completely different game compared to 10 million or 100 million. 100k concurrents might translate to 200 Gbps globally for 1080p, whereas for that same quality, you might be talking 20 Tbps for 10 million streams. 100k concurrents is also a size you could theoretically handle on a small single-digit number of servers, if not for latency.
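The back-of-envelope math here is worth writing out. The ~2 Mbps per 1080p stream is the figure implied by the parent's numbers, not anything Netflix has published:

```python
# Aggregate egress = concurrent viewers x per-stream bitrate.
# 2 Mbps per 1080p stream is an assumed figure from the parent comment.
def aggregate_gbps(viewers: int, mbps_per_stream: float) -> float:
    return viewers * mbps_per_stream / 1000  # Mbps -> Gbps

print(aggregate_gbps(100_000, 2.0))      # 200.0 Gbps
print(aggregate_gbps(10_000_000, 2.0))   # 20000.0 Gbps, i.e. 20 Tbps
```

The scaling is linear, but the provisioning is not: the 100x jump in viewers crosses the threshold from "a rack of servers" to "multi-terabit interconnect commitments."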
> CloudFront also has a feature request collapsing that greatly help live streaming.
I don't know how much request coalescing Netflix does in practice (or how good their implementation is). They haven't needed it historically, since for SVOD, they could rely on cache preplacement off-peak. But for live, you essentially need a pull-through cache for the sake of origin offload. If you're not careful, your origin can be quickly overwhelmed. Or your backbone if you've historically relied too heavily on your caches' effectiveness, or likewise your peering for that same reason.
200Gbps is a small enough volume that you don't really need to provision for that explicitly; 20Tbps or 200Tbps may need months if not years of lead time to land the physical hardware augments, sign additional contracts for space and power, work with partners, etc.
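The request coalescing mentioned above can be sketched in a few lines: concurrent cache misses for the same segment share a single origin fetch instead of stampeding the origin. This is a toy in-process version under the stated assumptions (real CDNs coalesce per edge node, with timeouts and error paths this sketch omits):

```python
import threading

class CoalescingCache:
    """Pull-through cache: one origin fetch per key, no matter how many
    concurrent requesters arrive while that fetch is in flight."""
    def __init__(self, fetch_origin):
        self._fetch = fetch_origin
        self._lock = threading.Lock()
        self._cache = {}
        self._inflight = {}   # key -> Event signalling fetch completion

    def get(self, key):
        with self._lock:
            if key in self._cache:                # fast path: cache hit
                return self._cache[key]
            if key in self._inflight:             # someone else is fetching
                done, leader = self._inflight[key], False
            else:                                 # we become the leader
                done = self._inflight[key] = threading.Event()
                leader = True
        if leader:
            value = self._fetch(key)              # the single origin round-trip
            with self._lock:
                self._cache[key] = value
                del self._inflight[key]
            done.set()
            return value
        done.wait()                               # followers wait, then read cache
        with self._lock:
            return self._cache[key]

# Demo: 50 concurrent requests for one segment -> a single origin hit.
origin_hits = []
def slow_origin(key):
    origin_hits.append(key)
    return f"bytes-of-{key}"

cache = CoalescingCache(slow_origin)
threads = [threading.Thread(target=cache.get, args=("seg1.ts",))
           for _ in range(50)]
for t in threads: t.start()
for t in threads: t.join()
print(len(origin_hits))   # 1
```

Without this, a cold cache at the start of a live event turns every edge node's miss storm into origin load, which is exactly the overwhelmed-origin failure mode described above.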
The first round of the fight just finished, and the issues seem to be resolved, hopefully for good. All this to say what others have noted already, this experience does not evoke a lot of confidence in Netflix's live-streaming infrastructure.
Yes, it was utterly boring, but they made their money. I don't like either Paul brother, so I only watched in hopes shorter, much-older Tyson would make Jake look as foolish as he is.
A friend and I, in separate states, found that it wouldn’t stream from TVs, Roku, etc. but would stream from mobile. And for me, using a mobile hotspot to a laptop; though that implies checking IP address range instead of just user-agent, so that seems unlikely.
Anyway, I wouldn’t be surprised if they were prioritizing mobile traffic because it’s more forgiving of shitty bitrate.
Comments on forums do not provide that data. And if you want to extrapolate self-reports, it's obviously fine (to varying degrees) for the vast majority of people, but that's not the "issue."
These kind of reports are the equivalent of saying "I have power" when you're hundreds of miles away from where a hurricane landed. It's uninteresting, it's likely you have power, and it does literally nothing for people that do not have power.
It doesn't advance the topic anywhere. There are other places to report these issues directly and in aggregate with other people -- HN (and many other forums) are not that place.
You butted into a conversation to tell someone their contribution added no value without adding anything constructive. A comment of “your comment is useless” is pure aggression and is ironically even less useful than the one it’s deriding.
> Tone however, matters in any context.
You are getting upset because someone used a swear word. You’ll find that is just deep-seated classism, and working on that will let you have much more fulfilling interactions.
Tone policing never works. It’s a waste of calories and everyone’s time.
For the same reason that pointing out the sun rose in the east today would be ridiculous but if it happened to rise in the west, or you perceived it to rise in the west, that would be worth sharing.
Being able to livestream a sporting event has been the default for at least a decade, ever since HBO’s original effort to stream a Game of Thrones season opener failed under the MSFT guy they hired, and they fixed it by handing it over to MLBAM.
Maybe that’s what Netflix should do. Buy Disney so they can get access to the MLBAM people (now Disney Streaming because they bought it from MLB).
It probably depends more on the ISP than on Netflix. Engineers over in my ISP’s subreddit are talking about how flows from Netflix jumped by over 450Gb/s and it was no big deal because it wasn’t enough to cause any congestion on any of their backbone links.
On a tangential note, the match totally looked fixed to me - Tyson was barely throwing any punches. I understand age is not on his side, but he looked plenty spry when he was ducking, weaving and dodging. It seemed to me he could have done better in terms of attacking as well.
I would argue Tyson has a shorter reach, Jake was whiffing a lot of superman punches, and all that does is waste energy. Jake might be able to throw punches, but he clearly wasn't interested in taking them. If they stood closer and slugged it out, the fight could have gone either way.
Yeah, the biggest thing to me (and the commentators mentioned this as well) was that his legs looked REALLY wobbly.
All your attacking power comes from your legs and hips, so if his legs weren’t stable he didn’t have much attacking power.
I think he gave it everything he had in rounds 1 and 2. Unfortunately, I just don’t think it was ever going to be enough against a moderately trained 27 year old.
I wrote an analysis on doing this kind of unicast streaming in cable networks a decade ago. For edge networks with reasonable 100gig distribution as their standard, these would see some of the minor buffering issues.
There is a reason that cable doesn’t stream unicast and uses multicast and QAM on a wire. We’ve just about hit the point where this kind of scale unicast streaming is feasible for a live event without introducing a lot of latency. Some edge networks (especially without local cache nodes) just simply would not have enough capacity, whether in the core or peering edge, to do the trick.
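The scale gap between unicast and multicast that the comment alludes to is easy to put numbers on. A back-of-the-envelope sketch; the viewer count and bitrate below are illustrative assumptions, not figures from this event:

```javascript
// Back-of-the-envelope: unicast vs multicast delivery bandwidth.
const viewers = 60e6;    // assumed concurrent viewers
const bitrateMbps = 15;  // assumed per-viewer bitrate (roughly 4K HLS)

// Unicast: every viewer receives their own copy of the stream.
const unicastTbps = (viewers * bitrateMbps) / 1e6; // Mbps -> Tbps

// Multicast / QAM on a wire: one copy per channel, regardless of audience size.
const multicastMbps = bitrateMbps;

console.log(`unicast: ${unicastTbps} Tbps, multicast: ${multicastMbps} Mbps`);
```

Under these assumptions, 60M unicast viewers need on the order of 900 Tbps of aggregate egress, while a broadcast channel carries one 15 Mbps signal no matter how many TVs tune in; that seven-orders-of-magnitude gap is why provisioning for rare black-swan live events is so hard.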
Saw an Arista presentation about the increase in SFP capacity; it's Moore's-law-style stuff. ARM-based kit has a shockingly efficient number of streams-per-watt too.
I can't see traditional DVB/ATSC surviving much beyond 2040, even accounting for the long tail.
You're right that large-scale parallel live streaming has only become feasible in the last few years. The BBC has published some insights into how it had to change its approach to scale to 10 million concurrent viewers in 2021, having had technical issues in the 3 million range in 2018.
Personally I don't think the latency is solved yet -- TV is slow enough (about 10 seconds from camera to TV), but IP streaming tends to add another 20-40 seconds on top of that.
That's no good when you're watching the penalties. Not only will your neighbours be cheering before you as they watch on normal TV, but even if you're both on the same IPTV service you may well have 5 seconds of difference.
The total end-to-end time is important too: with 30 seconds of delay, news push notifications, tweets, etc. on your phone will come in before you see the result.
> Saw an Arista presentation about the increase in SFP capacity, it's Moore law style stuff.
SFP itself isn't much of the issue. SerDes is, and then secondarily the operating power envelope for these things (especially for the kinds of optics that run hot). Many tradeoffs are available.
> I can't see traditional DVB/ATSC surviving much beyond 2040 even accounting for the long tail.
Tend to agree in well-developed infra, but rural and poorly-developed are well served with more traditional broadcast. Just saying “starlink!” 5 times in a dark bathroom won’t fix that part.
> Personally I don't think the latency is solved yet -- TV is slow enough (about 10 seconds from camera to TV), but IP streaming tends to add another 20-40 seconds on top of that.
I don't think it will get better. Probably worse, but with net better service quality. HLS/DASH are designed for doing the bursty networking thing. Among the good reasons for this: mobile works much better in bursts than strict linear streams, segment caching is highly effective, etc.
But I think this makes sense: it's a server-side buffering thing that has to happen. So assuming transmuxing (no transcoding lol) and wire latency are 0, we're hitting the 1-5 seconds for the segment, probably waiting for a fill of 10 seconds to produce the manifest, then buffering client-side another 10 or so. Throw in more cache boxes and it'll tick up more. It is quite high, but aside from bookies, I don't know how much people will actually care vs complain.
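The latency budget described above can be added up explicitly. A sketch using the comment's own figures; the exact segment length, manifest depth, and client buffer are the stated assumptions, with transmux and wire latency taken as zero:

```javascript
// Rough glass-to-glass latency budget for segmented (HLS/DASH) delivery,
// using the segment / manifest-fill / client-buffer figures discussed above.
function streamLatencySeconds({ segmentSec, manifestFillSegments, clientBufferSec }) {
  const segmentDelay = segmentSec;                         // wait for the current segment to close
  const manifestDelay = segmentSec * manifestFillSegments; // segments held server-side before publishing
  return segmentDelay + manifestDelay + clientBufferSec;   // plus what the player buffers locally
}

// e.g. 5s segments, 2 segments held for the manifest, 10s client-side buffer:
const latency = streamLatencySeconds({
  segmentSec: 5,
  manifestFillSegments: 2,
  clientBufferSec: 10,
});
console.log(`${latency}s behind live`); // prints "25s behind live"
```

That lands squarely in the 20-40 second range cited earlier in the thread, before any extra cache tiers tick it up further.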
I think they must be noticing the issues, because I've noticed they've been dropping the stream quality quite substantially... It's a clever trick, but kind of cheap to do so, because who wants to watch pixelated things?
To be brutally honest if it’s a choice between pixelated and constantly buffering, pixelated is way less bad. Constantly buffering is incredibly annoying during live sports. (but this doesn’t negate your main point which is that if people paid to watch they expect decent resolution)
Yes, as I have said again and again on Hacker News in different comments: Netflix went overboard with their microservices and tried to position itself as a technology company when it's not. It has made everything more complex, and that's why any Netflix tech blog is useless; it is not the way to build things correctly.
To understand how to do things correctly, look at something like Pornhub, which handles more scale than Netflix without crying about it.
The other day I was having this discussion with somebody who was saying distributed counter logic is hard, and I was telling them that you don't even need it if Netflix hadn't gone completely mental on microservices and complexity.
Fastly says they do 6M CCV for the Super Bowl (I'm actually surprised they let them do the entire thing and don't mix different CDNs), and I'm not sure they do encoding and manifest serving; they might just cache/deliver chunks. Do you really think Tyson vs the other guy was only 600k CCV? I'd be shocked if Netflix can't handle this.
You would think, but technology always finds a way to screw things up. Cox Communications has had ongoing issues with their video for weeks because of Juniper router upgrades and even the vendor can't fix it. They found this out AFTER they put it in production. Shit happens.
I don't understand why the media is pushing this Jake Paul vs Mike Tyson stuff so hard and why people care about it. Boxing is crude entertainment for low-intelligence people.
I'm tired of all this junk entertainment which only serves to give people second-hand emotions that they can't feel for themselves in real life. It's like, some people can't get sex so they watch porn. People can't fight so they watch boxing. People can't win in real life so they play video games or watch superhero movies.
Many people these days have to live vicariously through random people/entities; watch others live the life they wished they had and then they idolize these people who get to have everything... As if these people were an intimate projection of themselves... When, in fact, they couldn't be more different. It's like rooting for your opponent and thinking you're on the same team; when, in fact, they don't even know that you exist and they couldn't be more different from you.
You're no Marvel superhero no matter how many comic books you own. The heroes you follow have nothing to do with you. Choose different heroes who are more like you. Or better; do something about your life and give yourself a reason to idolize yourself.
Does anyone have any thoughts besides "bad engineering" on what could've gone wrong? It seems like taking on a new endeavor like streaming an event that would possibly draw many hundreds of millions of viewers doesn't make sense. Is there any obvious way that this would just work, or is there obviously a huge mistake deeply rooted in the whole thing. Also, are there any educated guesses on some fine details in the codebase and patterns that could result in this?
I believe HN's algorithm tends to relatively downrank stories with a high comment-to-upvote ratio, because they are more often flamewars on divisive topics.
Another major algorithmic down-ranker is vote wars on comments.
If lots of people are upvoting and downvoting the same comments, that's treated as a signal the topic is contentious and people are likely to start behaving badly.
HN is very clear they prioritize good behavior as the long term goal, and they are as a result comfortable having contentious topics fall in the ranking even if everyone involved in the discussion feels the topic is important.
Mine is glitchy, but if I refresh I get a good stream for a bit, then it gets low-res, then freezes. If I wait for auto-reconnect it takes forever. Hard refresh and I'm good. It's like new streams go to a new server, which then gets overloaded; as if their cluster is crashing and healing in rapid cycles. Sawtooth patterns on their charts.
And then all these sessions lag, or orphan and take up space, with so many reconnections at various points in the stream.
System getting hammered. Can't wait for this writeup.
The arrogant Netflix! They always brag about how technologically superior they are, and they can't handle a simple technological challenge! I didn't have a buffering issue, I had an error page - for hours! Yet, they kept advertising the boxing match to me! What a joke! If you can't stream it, don't advertise it to save face with people like me who don't care about boxing!
Every organization makes mistakes and every organization has outages. Netflix is no different. Instead of bashing them because they are imperfect, you might want to ask what you can learn from this incident. What would you do if your service received more traffic than expected? How would you test your service so you can be confident it will stay up?
Also, I have never seen any Netflix employees who are arrogant or who think they are superior to other people. What I have seen is Netflix's engineering organization frequently describes the technical challenges they face and discusses how they solve them.
I think you’re oversimplifying it. Live event streaming is very different from movie streaming. All those edge cache servers become kinda useless and you start hitting peering bottlenecks.
Edge caches are not useless for live streaming. They're critical. The upstream from those caches has no way of handling each individual user. The stream needs to hit the edge cache, and end users should be served from there.
A typical streaming architecture is multi-tiered caches, source->midtier->edge.
We don't know what happened but it's possible they ran out of capacity on their edge (or anywhere else).
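The source->midtier->edge tiering above can be sketched as a chain of pull-through caches, each level serving from its own store on a hit and otherwise pulling from the next level up. A toy illustration; the tier names and wiring are assumptions, not any real CDN's design:

```javascript
// Tiered pull-through cache: edge -> midtier -> source. Each tier serves
// from its own store on a hit, otherwise pulls from the next tier up
// and keeps a copy on the way back down.
class CacheTier {
  constructor(name, upstream) {
    this.name = name;         // e.g. "edge" or "midtier"
    this.upstream = upstream; // next tier up, or the origin itself
    this.store = new Map();
  }

  async get(key) {
    if (this.store.has(key)) {
      return { value: this.store.get(key), servedBy: this.name }; // local hit
    }
    const result = await this.upstream.get(key); // miss: pull through
    this.store.set(key, result.value);           // fill this tier for next time
    return result; // reports whichever upstream level actually had it
  }
}

// Illustrative wiring (names are assumptions):
const origin = { get: async (key) => ({ value: `content:${key}`, servedBy: "origin" }) };
const midtier = new CacheTier("midtier", origin);
const edge = new CacheTier("edge", midtier);
```

The first request for a segment walks all the way to the origin and populates every tier on the way back; subsequent requests are absorbed at the edge, which is exactly the fan-in that protects the upstream from seeing each individual user.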
BBC had a similar issue in a live stream 5 years ago where events conspired and a CDN "failed open", which effectively DOSsed the entire output via all CDNs
> Even though widely used, this pattern has some significant drawbacks, the best illustration being the major incident that hit the BBC during the 2018 World Cup quarter-final. Our routing component experienced a temporary wobble which had a knock-on effect and caused the CDN to fail to pull one piece of media content from our packager on time. The CDN increased its request load as part of its retry strategy, making the problem worse, and eventually disabled its internal caches, meaning that instead of collapsing player requests, it started forwarding millions of them directly to our packager. It wasn’t designed to serve several terabits of video data every second and was completely overwhelmed. Although we used more than one CDN, they all connected to the same packager servers, which led to us also being unable to serve the other CDNs. A couple of minutes into extra time, all our streams went down, and angry football fans were cursing the BBC across the country.
This feels like a bug in the implementation and not really a drawback of the pattern. "Routing component experienced a temporary wobble" also sounds like bug of sorts.
I worked in this space. All these potential failure modes, and how they're mitigated, are something we paid a fair amount of attention to.
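For context on why the retry strategy mattered so much in the BBC incident quoted above: the standard mitigation for that kind of retry amplification is capped exponential backoff with jitter, so failed pulls spread out instead of hammering the packager in lockstep. A generic sketch, not the BBC's or any CDN's actual code:

```javascript
// Capped exponential backoff with "full jitter": each retry waits a random
// amount up to an exponentially growing cap, so clients don't retry in sync.
function backoffDelayMs(attempt, baseMs = 100, capMs = 30000) {
  const cap = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.random() * cap; // full jitter spreads retries out in time
}

async function fetchWithBackoff(doFetch, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await doFetch();
    } catch (err) {
      if (attempt === maxAttempts - 1) throw err; // give up after the last attempt
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
}
```

The incident describes the opposite behavior: retries that increased request load and caches that failed open, which together turn a transient wobble into a self-sustaining overload loop.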
Hopefully they fix it because they are hosting two Christmas NFL games this year and if you want to really piss people off you have buffering issues during NFL games lol.
After a few buffering timeouts during the first match, the rest of the event had no technical difficulties (in SoCal, so close to one of Netflix's HQs).
Unfortunately, except for the women's match, the fights were pretty lame...4 of the 6 male boxers were out of shape. Paul and Tyson were struggling to stay awake and if you were to tell me that Paul was just as old as Tyson I would have believed it.
Assuming Netflix used its extensive edge cache network to distribute the streams to the ISPs. The software on the caching servers would have been updated to be capable of dealing with receiving and distributing live streamed content, even if maybe the hardware was not optimal for that (throughput vs latency is a classic networking tradeoff).
Now, inside the ISP's network, everything would probably be optimized for the 99.99% use case of the Netflix infra: delivering large bulk data that is not time sensitive. This means very large buffers to shift big gobs of packets in bulk.
As everything along the path is trying to fill up those buffers before shipping to the next router on the path, some endpoints aware this is a live stream start cancelling and asking for more recent frames ...
Why do they want to get into the live business? It doesn't seem to synergize with their infrastructure. Sending the same stream in real time to numerous people just isn't the same task as letting people stream optimized artifacts that are prepositioned at the edge of the network.
Most PPV is what, $50-$70? So subscribing to Netflix for $20 or whatever per month sounds like a bargain for anyone who is interested and not already a customer. Then assume some large percentage doesn’t cancel either because they forgot, or because they started watching a show and then decided to keep paying.
The marginal cost to add a viewer to broadcast sports is zero! That's what I am getting at. I know why someone would want this business, I just don't see what aspect of Netflix's existing business made them think they'd be good at it.
Live is the only thing that won’t be commodified entirely. “Anyone” can pump out stream-when-you-want TV shows. Live events are generally exclusive, unpredictable, and cultural moments.
I watched on an AppleTV and the stream was rock solid.
I don’t know if it’s still the case, but in the past some devices worked better than others during peak times because they used different bandwidth providers. This was the battle between Comcast and Cogent and Netflix.
Remember back in 2014 or so when Netflix users on Comcast were getting slow connections and buffering pauses? It didn't affect people who watched Netflix via Apple TV because Netflix served Apple TV users with a different network.
> In a little known, but public fact, anyone who is on Comcast and using Apple TV to stream Netflix wasn’t having quality problems. The reason for this is that Netflix is using Level 3 and Limelight to stream their content specifically to the Apple TV device. What this shows is that Netflix is the one that decides and controls how they get their content to each device and whether they do it via their own servers or a third party. Netflix decides which third party CDNs to use and when Netflix uses their own CDN, they decide whom to buy transit from, with what capacity, in what locations and how many connections they buy, from the transit provider. Netflix is the one in control of this, not Comcast or any ISP.
> I watched on an AppleTV and the stream was rock solid.
For me it was buffering and low resolution, on the current AppleTV model, hardwired, with a 1Gbps connection from AT&T. Some streaming devices may have handled whatever issues Netflix was having better than others, but this was clearly a bigger problem than just the streaming device.
I thought Netflix’s biggest advantage was the quality/salary of its engineers.
I think that every time I wait for Paramount+ to restart after it’s gone black in picture-in-picture, and yet I’m still on Paramount+ and not Netflix, so maybe that advantage isn’t real.
Sigh, none of the competitors are much better. Disney, who has more than enough cash to throw at streaming, is a near-constant hassle for us (after 3 or more episodes it throws an inscrutable error on PlayStation). I would drop it, but it is the only remaining streaming service and my wife is not willing to drop it (I guess until it gets to one error per episode).
Eh, the beef that I have is that I am already a paying customer. Why does it seem like I am getting subpar service in terms of delivery? I know it is a tired conversation on this forum, but corporations big and small do what they can to mess with the experience to eke out a few more cents from customers. It almost does not matter which industry one looks at; it is the same story, the only difference typically being how egregious things get[1].
I think this was true at some point, but I’ve been disappointed in the quality of the OSS Netflix tools recently. I think before k8s and a plethora of other tools matured, they were way ahead of the curve.
I specifically found the Netflix suite for Spring very lacking, and found message oriented architectures on something like NATS a lot easier to work with.
It was so bad. So so bad. Like don’t use your customers as guinea pigs for live streaming. So lame. They need a new head of content delivery. You can’t charge customers like that and market a massive event and your tech is worse than what we had from live broadcast tv.
I thought they did DSA interviews at Netflix; what happened? I had to watch the fight on someone streaming to X from their phone at the event, and it was better than watching on Netflix... if you could watch at all. Extremely embarrassing!
My theory is they've so heavily optimized for static content and distributing content on edge nodes that they were probably poorly setup for live-streaming.
This, I feel bad for their engineers who were told every bonus would be a matter of how low can they get the cost-per-GB of transferred data, leading to the glorious Netflix-in-a-box (https://openconnect.netflix.com/deploymentguide.pdf) and then management casually asks for a live show with a wildly different set of guarantee requirements.
I never hear about it anymore. Is that because everyone wants to watch something different at their own time? Or is it actually working just fine now in the background? I see under the "Deployment" section it mentions IPTV in hotel rooms.
Dumbing it down a bit: imagine if anyone in your neighborhood could broadcast video and take up N% of the bandwidth to all of the routers in the neighborhood. Imagine this on your campus or at your office. This works for cable TV, since there are only about 200 channels; otherwise you're just going to slurp up all of the bandwidth even if no one is watching the TV.
Sure, you get these black swan events that everyone wants to watch, but they're just that: really infrequent. So instead you have to provision capacity on the internet to do big events like this. The upside is that a billion people are able to stream point-to-point to another billion people, instead of 30 companies who can send to everyone.
One similar crash I remember very well was CNN on 9/11; I tried to connect from France, but it was down the whole day.
Since then I am very used to it, because our institutional web sites traditionally crash when there is a deadline (typically taxes or school registrations).
As for this one: my son is studying in Europe (I am also in Europe), and he called me, desperate, at 5 am or so to check whether he was the only one with the problem (I am the 24/7 family support for anything plugged in). After having liberally insulted Netflix, he remembered he had confirmed with his grandparents that he would be helping them at 10 :)
They should have partnered with every major CDN and load balanced across all of them. It’s ironic how we used to be better at broadcasting live events way back in the day versus today.
I watched the event last night and didn't get any buffering issues, but I did notice frequent drop in video quality when watching the live feed. If I backed the video up a bit, the video quality suddenly went back up to 4k.
I had some technical experience with live video streaming over 15 years ago. It was a nightmare back then. I guess live video is still difficult in 2024. But congrats to Jake Paul and boxing fans. It was a great event. And breaking the internet just adds more hype for the next one.
I wonder how localized the issues were. I watched the Taylor/Serrano fight and the Paul/Tyson fight without issue, and the picture quality was the best I've ever seen for live sports. I was blown away by how good it was. Nowhere near what I'm getting with streaming NFL. This is what I want the future of live sports to look like. Though the commentary was so-so.
I’m in the Pacific Northwest. I wonder if we got lucky on this or just some areas got unlucky.
You're lucky you only had some buffering issues. This was not the case for many people; I don't know the percentage, but many people on Reddit were complaining.
If you're going to have intense algorithm interviews, pay top dollar to hire only senior engineers, build high-intensity, extreme distributed systems, and staff SRE engineers, we had best see insanely good results and a high ROI out of it.
All of the conditions were perfect for Netflix, and yet the platform entirely flopped.
Is this what chaos engineering is all about that Netflix was marketing heavily to engineers? Was the livestream supposed to go down as Netflix removed servers randomly?
It seemed to be some capacity issue with the CDNs. When I stopped and restarted the stream it worked again. Perhaps they do not use real time multi-cdn switching.
What a massive blow to NFLX. They have been in the streaming game for years (survived COVID-19) and this silly exhibition match is what does them in?
I didn’t watch it live (boxing has lost its allure for me) but vicariously lived through it via my social feed on Bluesky/Mastodon.
Billions of dollars at their disposal and they just can’t get it right. Probably laid off the highly paid engineers and teams that made their shit work.
It's not a "massive blow" at all. Consumers will only vaguely remember this in a month. Netflix got a lot of new signups and got to test out their streaming infrastructure to figure out what needs work.
The fight itself was lame which worked in their favor. No one really cared about not being able to see every second of the "action". It's not like it was an NBA game that came down to the last second.
I see in the comments multiple people talking about how "cable" companies that have migrated to IPTV have solved this problem.
I'd disagree.
I'm on IPTV and any major sporting event (World Series, Super Bowl, etc) is horrible buffering when I try to watch on my 4K IPTV (streaming) channel. I always have to downgrade to the HD channel and I still occasionally experience buffering.
Technical issues happen, but I wish they would've put up a YouTube stream or something (or at least asked YouTube to stop taking down the indie streams that were popping up). It seems like basically their duty to the boxers and the fans to do everything in their power to let the match be seen live, even if it means eating crow and using another platform.
Honestly, you didn't miss much. Every (real) boxing fan thought of this as a disgrace and a shame when it was announced. Putting a 58-year-old Tyson against a crackhead filled with steroids (Jake Paul)? In either case it would have been a shame on Jake Paul for even getting in the ring with such an old boxer.
In boxing you are old by 32, or maybe 35 for heavyweight, and everything goes downhill very, very fast.
OK, but the last time they tried a livestream (a reality show reunion), it also fell over. I suppose to their credit, my stream never outright died yesterday, it just went to potato quality.
This wasn’t a “real” fight in the ring. It was clearly a hype/money fight only. The late 20 year old boxer has a massive following (or hate following) with younger age groups; and Mike Tyson brings the older age groups out. Mike has earned somewhat of a legendary status.
Leading up to the fight, there were many staged interactions meant to rile up the audience and generate hype and drive subscription revenue, and subsequently make ad spots a premium ($$$).
Unfortunately, American television/entertainment is so fake. I can’t even be bothered to engage or invest time in it anymore.
I'm sure the architecture and scale of Netflix's operations is truly impressive, but stories like this make me further appreciate the elegant scalability of analogue terrestrial TV, and to a similar extent, digital terrestrial TV and satellite.
We all know Netflix was built for static content, but it's still hilarious that they have thousands of engineers making $500K-1M in total comp and they couldn't live stream a basic broadcast. You probably could have just run this on AWS with a CDK configuration and a quota increase from Amazon.
On X.com someone had a stream that was stable to at least 5 million simultaneous viewers, but then (as I expected) someone at Netflix got them to pull the plug on it. So I would expect this fight to have say, 50 million + watching? Maybe as many as 150-250 million worldwide, given this is Tyson's last fight.
Woke up at 4 am (EU here) to tune in for the main event. Bought Netflix just for this. The women's fight went well: no buffering, 4K.
As it approached the time for Paul vs Tyson, it started to first drop to 140p and then constantly buffer. I restarted my Chromecast a few times, tried from a laptop, and finally caught a stream on my mobile phone via the mobile network rather than my wifi.
The TV Netflix app kept blaming my internet, which kept coming back as “fast”.
Ended up watching the utterly disappointing senior-abuse live stream on my mobile phone in 360p quality.
Gonna cancel Netflix and never pay for it again, nor watch hyped-up boxing matches.
Internet live streaming is harder than cable/satellite live TV streaming over "dumb" set-top boxes. They should not have used the internet for this, honestly. A TV signal can go to millions live.
All these engineering blog posts, distributed systems and these complex micro-services clearly didn't help with this issue.
Netflix is clearly not designed nor prepared for scalable multi-region live-streaming, no matter the amount of 'senior' engineers they throw at the problem.
All your corporate culture, comp structure, interview process, etc. is just so much meta if you can’t deliver. They showed they can’t deliver. Huge letdown.
I thought it's only the best of the best of the best working at Netflix ... or maybe we can just put this myth to sleep that Netflix even knows what it's doing. The suggestions are shit, the UX is shit, apparently even the back end sucks.
There was some blog post on HN the other day where someone said they don't do chaos monkey anymore... Even then, how do you chaos test a novel event ahead of time?
I would have just made it simple: delay the live stream a few seconds and encode it into the same bucket where users are already playing static movies. Just have the player only allow starting at the time everyone else is at.
I ended up turning my TV off and watching from my phone because of the buffering/freezing. The audio would continue to play and the screen would be frozen with a loading percentage that never changed.
I have Spectrum (600 Mbps) for ISP and Verizon for mobile.
Did anyone else see different behaviour with different clients? My TV failed on 25% loaded, my laptop loaded but played for a minute or two before getting stuck buffering, and my iphone played the whole fight fine. All on the same wifi network.
From my limited understanding, the NFL heavily depends on the Netflix Open Connect platform to stream media to edge locations, which is different from live streaming. Probably they over-pushed the HD content.
This livestream broke the internet, no joke. youtube was barely loading and a bunch of other sites too. 130M is a conservative number given all the pirate streams.
I'm watching the event as I'm writing this. I've been needing to exit the player and resume it constantly. Pretty surprising that Netflix hasn't weeded out these bugs.
I couldn’t watch a show a couple days ago. Long time customer, and first time I’ve considered cancelling. Broke the basic contract of I give $ and Netflix give show.
I did some VPN hopping and connecting to an endpoint in Dallas has allowed me to start watching again. Not live though, that throws me back into buffering hell.
Well, he is almost 60, and the average retirement age for pro boxers is in the mid-30s. He is well past his prime in a physically demanding sport that is very hard on participants.
I clicked on this thread to type that exact thing, holy smokes.
You're referring to Hooli's streaming of UFC fight that goes awry and Gavin Belson totally loses it, lol. Great scene and totally relevant to what's happening with Netflix rn.
I have to assume this is some snarky way of saying "violating copyright is Bad, m'kay".
Because taken at face value it's false. Any technical challenges involved in distributing a stream cannot possibly be affected by the legal status of the bits being pushed across the network.
It means that someone who spends 100% of their money on distribution is going to have an easier time than someone who pays for content and distribution.
> I'm an engineering manager at a Fortune 500 company. The dumbest engineer on our team left for Netflix. He got a pay raise too.
Apparently he was smart enough to get away from the Fortune 500 company he worked at, reporting to yourself, and "got a pay raise too."
> Our engineers are fucking morons. And this guy was the dumbest of the bunch.
See above.
> If you think Netflix hires top tier talent, you don't know Netflix.
Maybe you don't know the talent within your own organization. Which is entirely understandable given your proclamation:
Our engineers are fucking morons.
Then again, maybe this person who left your organization is accurately described as such, which really says more about the Fortune 500 company that employed him and presumably continues to employ you.
IOW, either the guy left to get out from under an EM who says he is a "fucking moron" or he actually is a "fucking moron" and you failed as a manager to elevate his skills/performance to a satisfactory level.
Managers aren't teachers. They can spend some time mentoring and teaching but there's a limit to that. I've worked with someone who could not write good code and no manager could change that.
Most people I've worked with aren't like that of course (there's really only one that stands out), so maybe you've just been lucky enough to avoid them.
I do find it unlikely that all of his engineers are morons, but on the other hand I haven't worked for a typical fortune 500 company - maybe that's where all the mediocre programmers end up.
White-Knighting for 'fucking morons' is not a good look though. You'll end up in a world where packets of peanuts have a label on saying 'may contain nuts'.
I think acting as if peanuts are actually nuts for purposes of communication is much more defensible than acting as if tomatoes are vegetables; in short, you are dying on a hill that was paved over long ago.
I agree most people will conflate them, but someone who's allergic to peanuts but not tree nuts (or vice versa), i.e. the people the labels are intended for, are going to care about the difference.
> or he actually is a "fucking moron" and you failed as a manager to elevate his skills/performance to a satisfactory level.
Sometimes managers don't have the authority to fire somebody and are forced to keep their subordinates. Yes, good managers can polish gold, but polishing poop still results in poop.
I was consulting at a place where there was a very bad programmer whose code looked sort of like this:

    const arrayIneed = [];
    const arrayIdontNeed = firstArray.map(item => {
        if (item.hasProp) {
            arrayIneed.push(item);
        }
    });
    return arrayIneed;

The above is very much a cleaned-up and elegant version of what he would actually push to the repo.
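For contrast, the straightforward version of that pattern is a single filter call. A minimal sketch, reusing the hypothetical firstArray/hasProp names from the snippet above:

```javascript
// The map/push dance above is a filter in disguise: no throwaway array,
// no side effects inside map, no ignored return value.
// firstArray and hasProp are the hypothetical names from the snippet above.
const firstArray = [
  { id: 1, hasProp: true },
  { id: 2, hasProp: false },
  { id: 3, hasProp: true },
];

// Keep exactly the items where hasProp is truthy.
const arrayIneed = firstArray.filter(item => item.hasProp);
```

map is for transforming each element into a new value; when the return value is thrown away and the callback only mutates outer state, filter (or forEach, if side effects are really wanted) says what is actually meant.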
he left for a competitor in the same industry, this was at the second biggest company for the industry in Denmark and he left for the biggest company - presumably he got a pay raise.
After he was gone, when I was refactoring some code of his (which in the end just meant throwing it all out and rewriting from scratch), I asked the manager why he had been kept on so long, and the manager said there were some layoffs coming up and he would have been out with those but because of the way things worked it didn't make sense to let him go earlier.
> the manager said there were some layoffs coming up and he would have been out with those but because of the way things worked it didn't make sense to let him go earlier.
Incentives are fucked across the board right now.
Move on a low performer today and you'll have an uphill battle for a backfill at all. If you get one, many companies are "level-normalizing" (read: level-- for all backfills). Or perhaps your management thinks the job could be done overseas cheaper, or you get pushed to turn it into a set of tasks so you can farm it out to contractors.
So you keep at least some shitty devs to hold their positions, and as ballast to throw overboard when your bosses say "5% flat cut, give me your names". We all do it. If we get back to ZIRP I'll get rid of the actively bad devs when I won't risk losing their position entirely. Until then, it's all about squeezing what limited value they have out and keeping them away from anything important.
Hmm. Engineering managers should be setting the team culture and determining the best criteria for extending an offer to a candidate. If there's a problem with the hiring process, I'd look for the closest source that could or should be fixing it.
I hope to never have a manager who is mentally stack ranking me and my coworkers in terms of perceived dumbness instead of in terms of any positive trait.
Almost everyone I know manager or not is usually ranking everyone they work with on various attributes.
In fact it would be incredibly weird to ask a close friend who at their work kicks ass and who sucks and have them respond back, "I've never really thought about how good any of my coworkers were at their jobs"
Ranking dumbness is ranking intelligence, which is a positive trait; dumbness is just a metric for how often intelligence fails.
Example - the manager who started this sub-thread may be a pretty smart guy and able to accurately rate the intelligence of the engineers at his organization - but he had a minor momentary failing of intelligence to post on HN calling those engineers fucking morons.
You've got to rank how often the intelligence fails in someone to be able to figure out how reliable their intelligence is.
I'm not a manager and I don't stack rank people, but I am 100% capable of knowing when one of my co-workers or predecessors is a fucking moron.
The trick is to use my massive brain to root cause several significant outages, discover that most of them originate in code written by the same employee, and notice that said employee liked to write things like
    // @ts-nocheck
    // obligatory disabling of typescript: static typing is hard, so why bother with it?
    async function upsertWidget() {
        try {
            // await api.doSomeOtherThings({ ... })
            // 20-line block of commented-out useless code
            // pretend this went on much longer
            let result = await api.createWidget({ a, b, c })
            if (!result.ok) {
                result = await api.createWidget({ a, b }) // retries for days! except with different args, how fun
                if (!result.ok) {
                    result = await api.updateWidget({ a, b, c }) // oh wait, this time we're updating
                }
            }
            // notice that api.updateWidget() can fail silently
            // also, the three function calls can each return different data, I sure am glad we disabled typescript
            return result
        } catch (error) {
            return error // I sure do love this pattern of returning errors and then not checking whether the result was an error or the intended object
        }
    }

    async function doSomething() {
        const widget = await upsertWidget()
    }
...except even worse, because instead of createWidget the name was something far less descriptive, the nesting was deeper and involved loops, there were random assignments that made no goddamn sense, and the API calls just went to an unnecessary microservice that was only called from here and which literally just passed the data through to a third party with minor changes. Those minor changes resulted in an internal API that was actually worse than the unmodified third party API.
I am so tired of these people. I am not a 10x rockstar engineer and not without flaws, but they are just so awful and draining, and they never seem to get caught in time to stop them ruining perfectly fine companies. Every try>catch>return is like an icy cat hand from the grave reaching up to knock my coffee off my desk.
In this specific case, the fucking moron in question was the one who designed the code review process and hired the other engineers, and it took place a significant length of time before my involvement.
Which, yes, does raise interesting questions about how someone who can't be trusted to handle errors in an API call ended up in a senior enough position to do that.
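For anyone who hasn't suffered that pattern firsthand: the fix isn't exotic. A minimal sketch of the same upsert flow with errors surfaced instead of swallowed; the api object here is a stub standing in for the hypothetical createWidget/updateWidget calls, so all the shapes are assumptions:

```javascript
// Stub api standing in for the hypothetical createWidget/updateWidget calls
// from the example above. Shapes are illustrative assumptions.
const api = {
  async createWidget({ a, b, c }) {
    // pretend the backend rejects creates that are missing c
    return c === undefined ? { ok: false } : { ok: true, data: { a, b, c } };
  },
  async updateWidget({ a, b, c }) {
    return { ok: true, data: { a, b, c, updated: true } };
  },
};

async function upsertWidget(args) {
  const created = await api.createWidget(args);
  if (created.ok) return created.data;

  // One explicit fallback, not a chain of silent retries with mutated args.
  const updated = await api.updateWidget(args);
  if (updated.ok) return updated.data;

  // Let the caller decide what to do; never return the error as if it were data.
  throw new Error('upsertWidget failed: both create and update were rejected');
}
```

The point isn't the specific retry policy; it's that callers either get a widget or an exception, never an Error object masquerading as a result.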
There's a disincentive to actively block PRs if you don't want your coworkers to think you are a bad colleague / not on their side. So you often see suboptimal code making its way to production. This has a worse effect the more terrible engineers there are.
Except in this case it's clearly affecting at minimum the rest of OP's team.
At that point it's not one person being obnoxious and never approving their team members diffs and more of a team effort to do so.
But at minimum if you have a culture of trying to improve your codebase you'll inevitably set up tests, ci/cd with checks, etc. before any code can be deployed. Which should really take any weight of responsibility out of any one member of the team. Whether trying to put out code or reject said code.
I dunno, I've gone and done a "git blame" to find out who the fucking moron that wrote the code was, only to find out it was me three years ago.
Sure, there's such a thing as stupid code, but without knowing the whole context under which a bit of code was produced, unless it's utterly moronic (which is entirely possible; The Daily WTF has some shiners), it's hard to really judge it. Hindsight, as applied to code, is 20/20.
I agree with the general sentiment ("one instance of bad code might have been me") but not the specific sentiment ("I could easily decide to catch and ignore errors through every bit of code I worked on without knowing why that was bad, and commit other, similar crimes against good taste in the same way").
The difference for me is that this is pervasive. Yes, sometimes I might write code with a bug in error handling at 3am when I'm trying to bring a service back up, but I don't do it consistently across all the code that I touch.
I accept that the scope is hard to understand without having worked on a codebase which a genuine fucking moron has also touched. "Oh strken," you might say, "surely it can't be that bad." Yes, it can. I have never seen anything like this before. It's like the difference between a house that hasn't been cleaned in a week and a hoarder's house. If I tried to explain what hoarding is, well, maybe you'd reply that sometimes you don't do the dishes every day or mop the floor every week, and then I'd have to explain that the kitchen I'm talking about is filled from floor to roof with dirty dishes and discarded wrappers, including meat trays, and smells like a dead possum.
Hey, that possum's name was Roger and I'm really sad that it died. I've been feeding it for weeks! There are definitely bad programmers out there whose code is only suitable for public shaming via The Daily WTF.
I've never seen a team that has somehow managed to hire exclusively morons. Even the shittiest of call center jobs and construction crews have a few people with enough brain cells to tie their shoelaces.
Have you considered that maybe you're being overly harsh about your co-workers? Maybe take the fact that one of them was hired by a top paying employer as a sign that you should improve your own ability to judge skill?
I've seen tons of them! The formula is to create conditions that will make even slightly competent people leave. They hire their moron nephew; he is always 30 minutes late, then they moan when you are 5 minutes late because the parking lot was blocked by their car. He always leaves 2 hours early while you do overtime that they regularly forget to pay for. Your day is filled with your own work PLUS that of your useless coworkers, who only drink coffee while joking about you doing their work. You are not as fast as the last guy! Haha! If something goes wrong the morons collectively blame you, just like the last time. You get a formal warning. Etc., etc. The other normal person they hire is let go after 2 days because they complained, which means they didn't fit the team.
And so on
If he still works there, the moron who left was less of one.
Here it is somewhat normal to "forget" so that you have to ask for it every time. My current employer has thousands of employees. "Forgetting" is good business. If money is tight they have people ask twice. You get a response like: didn't you already report this? Surely someone is working on it?
Can’t speak for every place but that’s not always an option. As a teenager, I worked at Sports Direct where the management would regularly work us after our allotted hours and bar us putting the extra time onto our timesheet. If I recall correctly, the company eventually got pulled for it but the money they’d have saved over years would have outweighed the fine.
The timesheets were on paper so good luck putting your real hours on without your manager, who files it, finding out.
You’re asking children to have full understanding of their rights and how to enforce them. Also, investigations into this started in 2020: over a decade after I left. Do you think nobody had reported this in all that time? Looks like the system wasn’t working as well as you think it does.
Having worked with a bunch of guys who have gone on to "top teams", I no longer believe they have top teams. My favorite was the guy who said the system could scale indefinitely after it literally fell on its ass from too much traffic. He couldn't understand that just because Lambdas by themselves can scale, they are limited by the resources they use, so he just ignored it and insisted that it could. The same guy also kept saying we should change the TPEG standard because he didn't like how it worked. And these companies are seriously pretending they've got the best and brightest. If that's really true, I really need to find another profession.
I've worked for many companies that said they hired the best. And to be honest when I hire I also try to hire good people. I think I could hire better if a) I had an open cheque, b) I was running coolest project in the universe. I did hire for some interesting projects but nothing close to an open cheque. Even under these conditions it's tough to find great people. You can go after people with a proven track record but even that doesn't always guarantee their next project will be as successful.
The reality though is that large companies with thousands of people generally end up having average people. Some company may hire more PhDs. But on average those aren't better software engineers than non-PhDs. Some might hire people who are strong competitive coders, but that also on average isn't really that strong of a signal for strong engineers.
Once you have a mix of average people, on a curve, which is the norm, the question becomes whether you have an environment where the better people can be successful. In many corporate environments this doesn't happen. Better engineers may have obstacles put in front of them, or they can be forced out of the organization. This is natural because for most organizations it is more a political question than a technical one.
Smaller organizations that are very successful (so they can meet my two criteria) and can be highly selective, or are highly desirable, can have better teams. By their nature as smaller organizations those teams can also be effective. As organizations grow, the talent will spread out towards average and the politics/processes/debt/legacy will make those teams less effective.
To be fair, when you need to hire hundreds or thousands of people, you gotta hire average people. The best is a finite resource and not all of the best want to work for FAANG or any megacorp.
I used to want to work at a FAANG-like company when I was just starting out thinking they were going to be full of amazing devs. But over the years, I've seen some of the worst devs go to these companies so that just destroyed that illusion. And the more you hear about the sort of work they do, it just sounds boring compared to startups.
I interviewed at Netflix a few years ago; with several of their engineers. One thing I cannot say is that they are morons.
Their interview process is top notch too, and while I was ultimately rejected, I used their format as the base template for how I started hiring at my company.
I don't have a dog in this fight, but you typically use your A players for hiring/interviews.
It can be both true that Netflix has God tier talent and a bunch of idiots. In fact, that's probably true of most places. I guess the ratio matters more.
This is every dev house I've worked at. For most people (mostly not the ones on HN), coding is a 9-5 job. No ambition. Just lines of code. Go home. I don't know there is anything particularly wrong with that.
You just have to accept most staff at any corporation are simply just average. There has to be an average for there to be better and worse.
If your "dumbest engineer" got a job and a hefty raise going to Netflix, it means he was a very capable engineer who was playing the part of a moron at this Fortune 500 company because he was reporting to a manager who called him and the entire team morons, and he didn't feel the need to go above and beyond for that manager.
Also, highly likely that it was the manager that was the moron and not everyone around him.
In the healthcare industry in Hungary: one worker does the same job for 700 USD a month and another for 1100 USD; the only difference is formal education and years worked in the industry. You can perform much better (by actually caring about the patients in those 12 hours you work) but you will get paid the same amount regardless. Of course, if you have 3 kids (whether they are adults or not) then you do not pay taxes (or much less than someone who does not have kids or only has 2).
They obviously have some really good engineers, but many low-tier ones as well. No idea how long they stay there, though.
I'm watching the fight now and have experienced the buffering issues. Bit embarrassing for a company that fundamentally only does a single thing, which is this. Also, yeah, 900k TC and whatnot but irl you get this. Mediocre.
Or in other words: in one case the "stream" is stored on a hard drive not far away from you, only competing for bandwidth on the last leg to you. In the other case the "stream" is coming over the Internet to you and everyone else at the same time.
It mostly makes sense to me. From their bombastic blogs to github projects full of overwrought Enterprise java design patterns. The only thing great about Netflix is it pays a lot more.
What are the chances that your entire engineering team is entirely composed of low performers or people with bad attitude or whatever you designate as "fucking morons"?
It's more likely that you are bad at managing, growing and motivating your team.
Even if it was true, to refer to your team in this way makes you look like you are not ready for management.
Your duty is to get the most out of the team, and that mindset won't help you.
Don’t agree. Sometimes you can observe the world around you, and it’s not pretty. Are they not allowed to observe the truth as they see it? What if they are right?
Because the idea that all the engineers that work at his large company are morons is absurd. Anyone in that situation that believes that and even more, states it, is just making their own character flaws apparent.
It’s hyperbole, like a teacher complaining to others, “my kids were all crazed animals today.”
I’ve worked with engineers where I had to wonder how they found their computer every morning. I can easily see how a few of those would make you bitter and angry.
> Our engineers are fucking morons. And this guy was the dumbest of the bunch.
Very indicative of a toxic culture you seem to have been pulled in to and likely have contributed to by this point given your language and broad generalizations.
Describing a wide group of people you're also responsible for as "fucking morons" says more about you than them.
This is why managers get a bad rap. What proportion think like this? Hopefully not a large one, but I do worry. Ultimately, if the team sucks, it's because of the management. They're the ones with the greatest power to change things for the better.
I'm going to avoid leaving a zero-effort response like "actually you're the problem" like half of the replies and contribute:
Why do you call your engineers morons? Is it a lack of intelligence, a lack of wisdom, a lack of experience, inability to meet deadlines, reading comprehension, or something else?
I wonder if Netflix is just hiring for different criteria (e.g. you want people who will make thoughtful decisions while they want people who have memorized all the leetcode problems).
I think this is a result of most software "engineering" having become a self-licking ice cream cone. Besides mere scaling, the techniques and infrastructure should be mostly squared away.
Yes, it's all complicated, but I don't think we should excuse ourselves when we objectively fail at what we do. I'm not saying that Netflix developers are bad people, but that it doesn't matter how hard of a job it is; it was their job and what they did was inadequate to say the least.
What does this have to do with the topic being discussed?
Because it is about people speculating on events that seem connected to their own experience, but in actuality aren’t, because they don’t understand the breadth of the distribution of the abstraction they are discussing.
This happens when your terms are underspecified: someone says “Netflix’s servers are struggling under load”, and while people in similar efforts know that's basically just equivalent to “something is wrong”, and the whole conversation is esoteric to most people outside a few specialized teams, these other people jump to conclusions and start having conversations based on their own experience with what is (to them) related (and usually fashionable, because that is how most smaller players figure out how to do things).
In short, people with glib answers tend to rely on oversimplified models that don’t reflect reality.
Even before LLMs were trendy, at the time of covid 19, a lot of people surprisingly became "experts" on the matter of virology and genetics on social networks.
Even for enthusiasts, going from no programming experience to understanding how Netflix handles video streaming at scale would take more than 5 years.
Completely agreed. There are also former employees who have very educated opinions about what is likely going on, but between NDAs and whatnot there is only so much they are willing to say. It is frustrating for those in the know, but there are lines they can't or won't cross.
Whenever an HN thread covers subjects where I have direct professional experience I have to bite my tongue while people who have no clue can be as assertive and confidently incorrect as their ego allows them to be.
Some people can just let others be wrong and just stay silent, but some people can't help themselves. So if you say something really wrong, like this was caused by Netflix moving to Azure, they should have stayed on AWS! someone will come along to correct you. If you're looking for the right answer, post the wrong one, alongside some provoking statement (Windows is better than Linux because this works there), and you'll get the right answer faster than if you'd asked your question directly.
https://xkcd.com/386/
> Some people can just let others be wrong and just stay silent, but some people can't help themselves.
As one of those who can't help themselves: the way you phrase it feels a bit too cynical. I've always interpreted it as people wanting to help without risking offering something that's wrong themselves. Which is basically how falsifiable science works. It's so much easier to refute the assertion that birds generate lift with tiny backpacks with turboprops attached than it is to explain the finer details of avian flight mechanics. I couldn't describe above a superficial level how flapping works, but I can confidently refute the idea of a turboprop backpack. (Everyone knows birds gave up the turboprop design during the great kerosene shortage of 1128.)
It depends on the medium and the cost of looking like an idiot. On the Internet where some tosser is going to call you names anyway? Saying dumb shit to nerdsnipe someone else to do hours of research and write an essay on it for you, at the expense of them calling you an idiot, is cheap, and easier than doing all that work yourself. Meanwhile, at work, I'm the one getting nerd sniped into doing a bunch of extra work.
nods knowingly
Right? A common complaint by outsiders is that Netflix uses microservices. I'd love to hear exactly how a monolith application is guaranteed to perform better, with details. What is the magic difference that would have ensured the live stream would have been successful?
I am one of the ones who complain about their microservices architecture quite a lot.
This comes from first-hand experience: I talked to several of their directors when I was consulted on how to make certain systems of theirs better.
It's not just a matter of guarantees, it's a matter of complexity.
Like right now Google search is dying and there's nothing that they can do to fix it because they have given up control.
The same thing happened with Netflix where they wanted to push too hard to be a tech company and have their tech blogs filled with interesting things.
On the back end they went too deep on the microservices complexity. And on the front end for a long time they suffered with their whole RxJS problem.
So it's not an objective matter of what's better. It's more of a cultural problem at Netflix. Plus the fact that they want to be associated with "FAANG" and yet their product is not really technology based.
Google search is dying because of business reasons, not technical ones. The ads branch is actively cannibalizing search quality to make people perform more searches and view more ads.
"Microservices" have nothing to do with it.
You can explain these problems with simple business metrics that technologists like to ignore. Right before the recent Twitter acquisition, the various bits of info that came to the limelight included the "minor detail" that they had more than doubled their headcount and associated expenses, but had not doubled either their revenue or profits. Technology complexity went up, the business went backwards. Thousands of programmers doesn't always translate to more value!
Netflix regularly puts out blog articles proudly proclaiming that they process exabytes of logs per microsecond or whatever it is that their microservices Rube Goldberg machine spits out these days, patting themselves on the back for a heroic job well done.
Meanwhile, I've been able to go on the same rant year after year that they're still unable to publish more than five subtitle languages per region. These are 40 KB files! They had an employee argue with me about this in another forum, saying that the distribution of these files is "harder than I thought".
It's not hard!
They're solving the wrong problems. The problems they're solving are fun for engineers, but pointless for the business or their customers.
From a customer perspective Netflix is either treading water or noticeably getting worse. Their catalog is smaller than it was. They've lost licensing deals for movies and series that I want to watch. The series they're producing themselves are not things I want to watch any more. They removed content ratings, so I can't even pick something that is good without using my phone to look up each title manually!
Microservices solve none of these issues (or make it worse), yet this is all we hear about when Netflix comes up in technology discussions. I've only ever read one article that is actually relevant to their core business of streaming video, which was a blog about using kTLS in BSD to stream directly from the SSD to the NIC and bypassing the CPU. Even that is questionable! They do this to enable HTTPS... which they don't need! They could have just used a cryptographic signature on their static content, which the clients can verify with the same level of assurance as HTTPS. Many other large content distribution networks do this.
It's 100% certain that someone could pretend to be Elon, fire 200-500 staff from the various Netflix microservices teams and then hire just one junior tech to figure out how to distribute subtitles... and that would materially improve customer retention while cutting costs with no downside whatsoever.
> Right before the recent Twitter acquisition, the various bits of info that came to the limelight included the "minor detail" that they had more than doubled their headcount and associated expenses, but had not doubled either their revenue or profits.
Every tech company massively inflated their headcount during the leadup to the Twitter acquisition because money was free.
I interviewed at Meta in 2021 and asked an EM what he would do if given a magic wand to fix one problem at the company. His response: "I would instantly hire 10,000 more engineers."
Elon famously did the opposite and now revenue is down 80%.
>"I would instantly hire 10,000 more engineers."
From my experience, this answer usually betrays someone who doesn’t fully understand the system and problems of the business. The easy answer when overwhelmed is “we need more people.” To use a manufacturing analogy, you can cover up a lot of quality issues with increased throughput, but it makes for an awfully inefficient system.
Twitter revenue collapsed because of politics and the public removal of moderators, not because of how many engineers were employed.
Money was not "free".
Borrowing costs went to nearly zero. That's not the same thing. You have to repay the money, you just don't have to repay it with interest.
I would have assumed people generally know this, but everybody (and I do mean everybody) talks like they don't know this. I would like to assume that "money is free" is just a shorthand, buuuut... again... these arguments! People like that EM talk like it was literally free money raining from the sky that could be spent (gone!) without it ever having to be repaid.
If you watched any of the long-form interviews Musk gave immediately after the acquisition, he made the point that if he hadn't bailed out Twitter, it had maybe 3 months of runway left before imploding.
Doubling headcount without a clear vision of how that would double revenues is madness. It is doubly so in orgs like Twitter or Netflix where their IT was already over-complicated.
It's too difficult for me to clearly and succinctly explain all the myriad ways in which a sudden inrush of noobs -- outnumbering the old guard -- can royally screw up something that is already at the edge of human capability due to complexity. There is just no way in which it would help matters. I could list the fundamental problems with that notion for hours.
Companies weren’t issuing debt to pay for headcount. The reason market interest rates matter is that when interest rates are low, your company stock doesn’t have to have high returns to get investment. When these conditions exist, companies feel safer hiring people to invest in growth instead of saving to provide high shareholder returns.
I highly recommend everyone take a university-level financial instruments course. The math isn’t super hard, and it does a very good job of explaining how rational investors behave.
So you’re saying the investors are happy to see their money set on fire?
Surely they expect at a minimum that their capital investment would make them dividends (increased revenue), and also that the money wasn’t simply set on fire with nothing to show for it and no way to repay it.
If I’m wrong then Twitter - and similar companies - are little better than Ponzi schemes, with investors relying on the money of the greater fool to recover their money.
> So you’re saying the investors are happy to see their money set on fire?
Ah, HN, where you try to explain how things work, and you get ignorant sarcasm in return.
> Surely they expect at a minimum that their capital investment would make them dividends (increased revenue), and also that the money wasn’t simply set on fire with nothing to show for it and no way to repay it.
Yes, of course. But when safe investments (e.g., Treasuries) are paying out close to zero, investors are going to tolerate lower returns than they do when Treasuries are paying out 3% or more.
It's basic arithmetic: you take the guaranteed rate, add a risk premium, and that's what investors expect from riskier investments. This is well-covered in the class I recommended.
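That arithmetic is simple enough to write down; a sketch (the rates are illustrative, not real market data):

```python
def required_return(risk_free_rate: float, risk_premium: float) -> float:
    """Minimum return a rational investor demands from a risky asset."""
    return risk_free_rate + risk_premium

# When Treasuries pay ~0%, a 5% risk premium means a ~5% hurdle rate.
low_rate_hurdle = required_return(0.001, 0.05)
# When Treasuries pay 3%, the same premium pushes the hurdle to 8%.
high_rate_hurdle = required_return(0.03, 0.05)
```

Same risk premium, but a very different bar for what counts as an acceptable investment; that is the whole mechanism being described.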
Also, not every investor thinks in terms of consistent return. A pensioner may have a need for a guaranteed 3% annual return to keep pace with inflation. A VC, on the other hand, is often content to have zero returns for years followed by a 100x payout through an IPO.
People here don't understand basic concepts like risk-adjusted returns, flight to quality, searching for yield, etc.
> A VC, on the other hand, is often content to have zero returns for years followed by a 100x payout through an IPO
I know how all this works, but 100x payout is for the small initial investments, not after 10 years of operating at multi-billion-dollar scales.
Small amounts of money are set on fire all of the time, chasing this kind of high-risk return.
Nonetheless, there's an expectation of a return, even if only in aggregate across many small startups.
What I was observing (from the outside, at a distance) was that Twitter was still being run like a startup despite being in an effectively monopoly position already and a "mature" company. Similarly, Amazon could set money on fire while they were the growing underdog. If they doubled their headcount today without doubling either revenue or profits, the idiots responsible for that would be summarily fired.
I get that Silicon Valley and their startup culture does a few things in an unusual way, but that doesn't make US dollars not be US dollars and magically turn into monopoly money that rains from the sky just because interest rates are low.
Yeah, try dealing with many frontends with mixed HTTP and HTTPS; it's a nightmare and won't always work. Additionally, you want security on content delivery for revenue-protection reasons. The way you've massively oversimplified the BSD work suggests that you perhaps didn't understand what they did, or why hardware offload is a good thing.
Subtitles are also complicated because you have to deal with different media player frameworks on the 40+ different players you support. Getting those players, which you may not own, to recognise multiple sub tracks can be a PITA.
Things look simple to a junior developer, but those experienced in building streaming platforms at scale know there are dragons when you get into the implementation. Sometimes developers and architects do over-complicate things, but smart leaders avoid writing code, so it's an assumption to say things are being made over-complicated.
> you perhaps didn't understand what they did
I read and understood their entire technical whitepaper. I get the what, I'm just saying that the why might not make as much sense as you might assume.
> +40 different players you deal with
They own the clients. They wrote the apps themselves. This is Netflix code reading data from Netflix servers. Even if there are third-party clients (wat!?), that doesn't explain why none of Netflix's home-grown clients support more than 5 subtitle languages.
> Getting those players, which you may not own, to recognise multiple sub tracks can be a PITA.
This is a core part of the service, which everyone else has figured out. Apple TV for example has dozens of subtitle languages.[1]
With all due respect: Read what you just wrote. You're saying that an organisation that has the engineering prowess to stream at 200 Gbps per edge box and also handles terabytes of diagnostic log ingestion per hour can't somehow engineer the distribution of 40 KB text files!?
I can't even begin to outline the myriad ways in which these excuses are patent nonsense.
These are children playing with the fun toys, totally ignoring like... 1/3rd of the viewing experience. As far as the users are concerned, there's nothing else of consequence other than the video, audio, and text that they see on the screen.
"Nah, don't worry about the last one, that only affects non-English speakers or the deaf, we only care about DEI for internal hires, not customers."
[1] Just to clarify: I'm asking for there to be an option to select one language at a time from all available languages, not showing multiple languages at once, which is a tiny bit harder. But... apparently not that hard, because I have two different free, open-source video players on my PC that can do this so I can have my spouse get "full" subtitles in a foreign language while I see the "auto" English subtitles pop up in a different colour when appropriate. With Netflix I have to keep toggling between her language and my language every time some foreign non-English thing is said. Netflix is worth $362B, apparently, but hasn't figured out something put together by poor Eastern European hobbyists in their spare time.
See, you're confused because you think that the media player is owned by Netflix.
The browser gives you a certain level of control on computers, although you have to deal with the oddities of Safari, but when you go to smart TVs it's the wild west. Netflix does provide their tested framework to TV vendors but it's still not easy, because media playback often requires hardware acceleration, but the rendering framework isn't standard.
Developing for set-top boxes, multiple generations of phones, and smart TVs comes with all sorts of oddities. You think it's easy because you haven't done it.
> they want to be associated with "Faang" and yet their product is not really technology based.
You lost me. Netflix built a massive CDN, a recommendation engine, did dynamic transcoding of video, and a bunch of other things, at scale, quite some years before everyone else. They may have enshittified in the last five years, but I don't see any reason why they don't have a genuinely legitimate claim to being a founder member of the FAANG club.
I have a much harder time believing that companies with AI in their name or domain are doing any kind of AI, by contrast.
Same thing was done a little later by HBO, Disney and a plethora of others, which points to the task not really being uber-difficult.
Speaking as a consumer, Netflix’s solution is objectively better than its competitors'. It handles network blips better, it’s more responsive to use, and it has far fewer random bugs you need to work around.
You can argue whether or not that edge translates into more revenue, but the edge is objectively there.
Agree. Hulu, HBO/Max, and Disney Plus each do most of these:
- frequently decide that episodes I've watched are either completely unwatched (with random fully watched eps of the show mixed in).
- assume that seemingly every time I leave at the start of the end credits, I must have intended to come back and watch them.
- rebuild the entire interface (progressively, slowly) when I've left the tab unfocused for too long. Instead of letting me continue where I was, they show it for less than a second, then rebuild the world.
- keep resetting the closed-caption setting to "none", regardless of choosing "always" or "on instant replay"; worse, they sometimes still have the correct setting in the interface, but have disabled captions anyway.
Netflix has forgotten playback position or episode completion only once since they started streaming. They politely suggest when to reload the page (via a tiny footer banner), but even that might not appear for months. They usually know where the end credits really start, and count that as completion. They don't seem to mess with captions.
Hard to speak "objectively" as a consumer who has their own regional biases and knows none of the sausage underneath.
Maybe you're in a rural area and Netflix scaled gracefully. Maybe you're deep in SF and Netflix simply outspent to give minimal disruption to a population hub. These could both be true but don't speak to what performs better overall.
Based on my experience as a client, they’re not as reliable, Disney in particular. I thought it was a solved problem but it’s not apparently.
Pornhub has better, more reliable technology than Netflix, yet you don't see their tech blog very often do you?
PH has to stream for 20 seconds on average while boxing match is a lot longer :-)
Sounds like a consumer issue ;)
fair.
I have always wondered how they deliver their content and what goes on behind the scenes, and nobody on tech Twitter or even YouTubers talk about Pornhub's infra for some reason. A lot of the innovation in tech has roots in people wanting to see high quality tiddies on the internet.
I worked on PH infra for a few years -
* The frontend itself runs on bare metal
* A lot of the backend was built out as microservices, running on top of Mesos and then later K8s
* CDN was in-house. The long tail is surprisingly long, but the front page videos were cached globally
The unifying theme behind PH is that they host everything on their own equipment, no AWS/GCP involved.
They want to be associated with FAANG? What does the N stand for?
Forced assoc.
[dead]
It's not guaranteed, but much fewer points of failure.
> It's not guaranteed, but much fewer points of failure.
Can you explain where this is relevant to buffering issues?
Also, you are very wrong regarding failure modes. The larger the service, the more failure modes it has. Moreover, in monoliths if a failure mode can take down/degrade the whole service, all other features are taken down/degraded. Is having a single failure mode that brings down the whole service what you call fewer points of failure?
I can't, since I don't know Netflix's architecture - I was responding to "I'd love to hear exactly how a monolith application is guaranteed to perform better, with details."
> I was responding to "I'd love to hear exactly how a monolith application is guaranteed to perform better, with details."
I asked nothing about Netflix. My question was directed at your remark regarding monoliths vs microservices.
Now, can you answer the question?
Monoliths might be better in some cases, microservices in others. Performance can obviously differ based on implementation.
I doubt a "microservice" has anything to do with delivering the video frames. There are specific kinds of infrastructure tech that are specifically designed to serve live video to large amounts of clients. If they are in fact using a "microservice" to deliver video frames, then I'd ask them to have their heads examined. Microservices are typically used to do mundane short-lived tasks, not deliver video.
There’s very likely a dedicated service for delivering frames.
That service would technically be a “microservice,” even if it is a large one.
Why is that “very likely”?
I’m genuinely curious about the reasoning behind that statement. It’s very possible that you are using a different set of assumptions or definitions than I am.
I say that because, for performance reasons, you’d never want to wait on potentially several hops to stream media and because the act of streaming could very well be a good enough domain boundary.
I'm fairly well-versed in basic IP networks, but the above sounds like word salad to me.
That’s okay, you probably just haven’t worked with high performance services or micro services before.
Network requests (sometimes called hops) take a significant amount of time. You don’t want your streaming service to take a significant amount of time.
In microservices land, you generally try making services based on some “domain” (metaphorical, not like a literal domain name) which defines the responsibility of any given service. Defining these domains is more art than science and depends on the business needs and each team.
Video streaming might be one of those domains for Netflix.
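The "hops are expensive" point can be made concrete with back-of-envelope numbers (the per-hop cost is an assumption, not a measurement from any real deployment):

```python
PER_HOP_MS = 2.0  # assumed intra-datacenter overhead per internal service hop

def added_latency_ms(hops: int, per_hop_ms: float = PER_HOP_MS) -> float:
    """Extra latency from routing one request through N internal services."""
    return hops * per_hop_ms

# A frame request that traverses 5 services pays ~10 ms before any video
# bytes move, which is why a streaming path is usually kept to one hop.
print(added_latency_ms(5))  # 10.0
```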
Honestly, I've seen my fair share of 7-layer SOAP stacks. Not sure if any of your unsubstantiated handwaving is making any sense.
Tell me specifics.
The only time I worked on a project that had a live television launch, it absolutely tipped over within like 2 minutes, and people on HN and Reddit were making fun of it. And I know how hard everyone worked, and how competent they were, so I sympathize with the people in these cases. While the internet was teeing off with easy jokes, engineers were swarming on a problem that was just not resolving, PMs were pacing up and down the hallway, people were getting yelled at by leadership, etc. It's like taking all the stress and complexity of a product launch and multiplying it by 100. And the thing I'm talking about was just a website, not even a live video stream.
Some breaks are just too difficult to predict. For example, I work in ecommerce and we had a page break because the content team pushed too many items into an array, that caused a back-end service to throw errors. Because we were the middle-service, taking from the CMS and making the request to back-end, not sure how we could have seen that issue coming in advance (and no one knew there was a limit).
> Some breaks are just too difficult to predict.
Absolutely. I think a great filter for developers is determining how well they understand this. Over-simplification of problems and certainty about one’s ability to build reliable services at scale is a massive red flag to me.
I have to say some of the hardest challenges I’ve encountered were in e-commerce, too.
It’s a lot harder and more interesting than I think many people realize. I learned so much working on those projects.
In one case, the system relied on SQLite and god damn did things go sideways as the company grew its customer base. That was the fastest database migration project I’ve ever been on, haha.
I often think it could have worked today. SQLite has made huge leaps in the areas we were struggling. I’m not sure it would have been a forever solution (the company is massive now), but it would have bought us some much-needed time. It’s funny how that stuff changes. A lot of my takeaways about SQLite 10 years ago don’t apply quite the same anymore. I use it for things now that I never would have back then.
I'm not saying it's easy, but start by assuming that there's a limit and that any request can throw errors? (Proceed accordingly.)
All requests expect errors. How a developer handles them... well...
And for limit checking, how often do you write array limit handlers? And what if the BE contract doesn't specify one? Additionally, it will need a regression unit test, because who knows when the next developer will remove that limit check.
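For what it's worth, a defensive guard in the middle tier might look like this sketch (the cap of 100, the exception, and the function name are all hypothetical, since the real contract was undocumented):

```python
MAX_ITEMS = 100  # hypothetical cap; the backend's real limit was undocumented

class PayloadTooLarge(ValueError):
    """Raised before the backend gets a chance to 500 on oversized input."""

def forward_to_backend(items: list) -> list:
    """Validate CMS content in the middle tier before calling the backend."""
    if len(items) > MAX_ITEMS:
        raise PayloadTooLarge(f"{len(items)} items exceeds cap of {MAX_ITEMS}")
    return items  # stand-in for the actual backend request
```

The regression test the comment asks for is then a single assertion that an oversized list raises, so the next developer can't silently delete the check.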
Those are the times when you identify who is there to help and who is there to be performative.
Those performative people are worse than useless. They take up critical bandwidth and add no real value.
An effective operational culture has methods for removing those people from the conversations that matter. Unfortunately that earns you a reputation for being “cutthroat” or “lacking empathy.”
Both of those are real things, but it’s the C players who claim they are being unfairly treated, when in fact their limelight-seeking behavior is the problem.
If all that sounds harsh, like the kitchen on The Bear, well…that’s kinda how it is sometimes. Not everyone thrives in that environment, and arguably the ones who do are a little “off.”
what was the ultimate cause/fix of issues in your case? a database thing?
Insufficient testing
While that may be the case, the things like this I've experienced have been more along the lines of incompetent management.
In one case I was doing an upgrade on an IPTV distribution network for a cable provider (15+ years ago at this point). This particular segment of subscribers totalled more than 100k accounts. I did validation of the hardware and software rev installed on the routers in question prior to my trip to the data center (2+ hour drive). I informed management that the currently running version on the router wasn't compatible with the hardware rev of the card I was upgrading to. I was told that it would in fact work, and that we had that same combination of hw/sw running elsewhere. I couldn't find it when I went to look at other sites. I mentioned it in an email prior to leaving; I was told to go anyway.
Long story short, the card didn't work, had to back it out. The HA failover didn't work on the downgrade and took down all of those subscribers as the total outage caused a cascading issue with some other gear in this facility. All in all it was during off-peak time of day, but it was a waste of time and customer sat.
You cannot leave us hanging like that. What was the issue?
Shoulda used Varnish.
> people were getting yelled at by leadership
this is where you get up and leave
You are basically saying, everybody who criticizes Netflix now has no clue.
That’s a bold claim given that people with inside knowledge could post here without disclosing they are insiders.
Is that some kind of No True Scotsman?
I’m just pointing out that there are Netflix engineers reading all these words.
For every thread like this, there are likely people who are readers but cannot be writers, even though they know a lot. That means the active posters exclude that group, by definition.
These threads often have interesting and insightful comments, so that’s cool.
> You are basically saying, everybody who criticizes Netflix now has no clue.
GP clearly meant some people not everybody. You are the one making bold claims.
At the scale that Netflix just dealt with? Yeah I honestly think this is a case where less than 5000 people in the world are really qualified to comment.
Not clear what scale they were attempting, but yes delivering a live stream to 10m+ users on the public internet with a reasonable end to end latency (under 30 seconds glass to viewer) is not a trivial problem, and it’s not something Netflix do a lot.
It’s a very different problem to distributing video on demand which is Netflix’s core business.
3) the people supplying 1) and 2) with tools (hard- or software)
We (yep) don't know the exact details, but we do get sent snapshots of full configs and deployments to debug things... we might not see exact load patterns, but it's enough to know. And of course we can't tell, due to NDAs.
You are so right about that. Though I'm sure many of the Netflix folks are still doing their after-action analysis in prep for the Dec 25 NFL games.
now take this realization and apply it to any news article or forum post you read and think about how uninformed they actually are.
If NFL decides to keep Netflix for that, that is. The bandwidth for that fight was rookie numbers, and after that fiasco, why would the NFL not break their contract and choose someone with a proven track record doing bigger live events, like the World Cup?
Because Netflix pays them either way, I would imagine. Breaking a contract on a sure thing to the tune of tens (hundreds?) of millions of dollars for a maybe is a large business risk.
Reputational damage is going to be far more Netflix than the NFL if they totally club it.
That and this fight is going to likely be an order of magnitude more viewers than the Christmas NFL games if the media estimates on viewership were remotely accurate. You’re talking Super Bowl type numbers vs a regular season NFL game. The problems start happening at the margin of capacity most of the time.
But "reputational damage" doesn't affect profits. Nobody is canceling Netflix because they had issues watching the fight, just like nobody will cancel if the NFL experience sucks on Netflix. They will bitch and moan on Twitter, but it's essentially just talk.
I'm sure 2) can post. But it won't be popular, so you'll need to dig to find it.
Most people are consumers, and at the end of the day, their ability to consume a (boring) match was disrupted. If this was PPV (I don't think it was), they paid extra and didn't get the quality of product they expected. I'm not surprised they dominate the conversation.
I am in group 2, and I absolutely will get argued with by people who think they know better.
I'm also not going to criticise my peers because they could recognise me and I might want to work with them one day.
And nonetheless, it freezes up.
You don’t belong to either group. What does this make you?
You may have belonged to one of those groups in the past, or maybe you will someday. I certainly have. Many of the more seasoned folks on HN have.
Stuff goes wrong, random internet people jump on the opportunity to speculate and say wildly off-the-mark comments, and the engineers trying to keep the ship from sinking have to sit quietly for fear of making the PR backlash worse.
Most or all of your replies are to people who hallucinate things you didn’t say. Your patience is inspiring.
I was interviewing a dev candidate some years ago and they were totally lost trying to traverse a tree on the whiteboard. I kept helping them get unblocked, because my philosophy is that anyone can get stuck once, but if I'm supposed to decide whether to hire you, I should get the most/best data I can.
Another person was observing the interview, for training purposes, and afterwards said to me: “Do you have kids? You have so much patience!”
If I weren’t retired, I would totally apply to work for you
[dead]
> who offer insights based on how stuff works at a small scale, or better yet, pronouncements rooted in “first principles.”
And looking through the comments, this is just wrong.
[flagged]
For an event like this, there already exists an architecture that can handle boundless scale: torrents.
If you code it to utilize high-bandwidth users' upload capacity, the service becomes more available as more users are watching -- not less available.
It becomes less expensive with scale, more available, more stable.
To be more specific: if you encode the video in blocks, with each new block's hash being broadcast across the network, then, just managing the overhead of block ordering, it should be pretty easy to stream video at boundless scale using a DHT.
Could even give high-bandwidth users a credit based upon how much bandwidth they share.
With a network like what Netflix already has, the seed-boxes would guarantee stability. There would be very little delay for realtime streams, I'd imagine 5 seconds tops. This sort of architecture would handle planet-scale streams for breakfast on top of the already existing mechanism.
But then again, I don't get paid $500k+ at a large corp to serve planet scale content, so what do I know.
The protocol for a torrent is that random parts of a file get seeded to random people requesting a file, and that the clients which act as seeds are able to store arbitrary amounts of data to then forward to other clients in the swarm. Do the properties about scaling still hold when it's a bunch of people all requesting real time data which has to be in-order? Do the distributed Rokus, Apple TVs, Fire TVs and other smart TVs all have the headroom in compute and storage to be able to simultaneously decode video and keep old video data in RAM and manage network connections with upload to other TVs in their swarm - and will uploading data to other TVs in the swarm not negatively impact their own download speeds?
Yes, the properties about scaling do hold even with near-real-time streams. [1]
The problems with using it as part of a distributed service have more to do with asymmetric connections: using all of the limited upload bandwidth causes downloads to slow. Along with firewalls.
But the biggest issue: privacy. If I'm part of the swarm, maybe that means I'm watching it?
[1]: Chainsaw: P2P streaming without trees, https://link.springer.com/chapter/10.1007/11558989_12
Use your imagination for just a moment.
The torrent is an example of the system I am describing, not the same system. Torrents cannot work for live streams because the entire content is not hashable yet, so already you have to rethink how it's done. I am talking about adding a p2p layer on top of the existing streaming protocol.
The current streaming model would prioritize broadcasting to high-bandwidth users first. There should be millions of those in a world-scale stream.
Even a fraction of these millions would be enough to reduce Netflix's streaming costs by an order of magnitude. But maybe Netflix isn't interested in saving billions?
With more viewers, the availability of content increases, which reduces load on the centralized servers. This is the property of the system I am talking about, so think backwards from that.
With a livestream, you want the youngest block to take priority. You would use the DHT to manage clients and to manage stale blocks for users catching up.
The youngest block would be broadcast on the p2p network and anyone who is "live" would be prioritizing access to that block.
Torrent clients as they are now handle this case, in reverse; they can prioritize blocks closer to the current timestamp to create an uninterrupted stream.
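The "youngest block first" rule described above could be sketched like this (a toy illustration of the priority ordering, not a working protocol):

```python
def block_priority(block_index: int, newest_index: int, is_live: bool) -> int:
    """Lower value = fetch sooner. Live viewers chase the newest block;
    catch-up viewers fetch in plain order."""
    if is_live:
        return newest_index - block_index  # 0 means "the block just broadcast"
    return block_index

# A live viewer choosing among blocks 97..100 when 100 was just announced:
wanted = sorted(range(97, 101), key=lambda b: block_priority(b, 100, True))
print(wanted)  # [100, 99, 98, 97]
```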
The system I am talking about would likely function at any scale, which is an improvement from Netflix's system, which we know will fail -- because it did.
Torrents are awful for live events.
1. Everyone only cares about the most recent "block". By the time a "user" has fully downloaded a block from Netflix's seedbox, the block is stale, so why would any other user choose to download from a peer rather from netflix directly?
2. If all the users would prefer to download from netflix directly rather than a p2p user, then you already have a somewhat centralized solution, and you gain nothing from torrents.
1. Because Netflix is at capacity? Or because the peer is closer and faster than the original?
If Netflix is at capacity and you have to wait for a peer, then you have simply reinvented the buffering problem. In other words:
1. I exclusively download from a peer and my stream is measurably behind
2. I switch to a peer when Netflix is at capacity and then I have to wait for the peer to download from Netflix, and then for me to download from the peer. This will cause the same buffering issue that Netflix is currently being lambasted for.
This solution doesn’t solve the problem Netflix has
"Buffering problem" can have very different QOL manifestations, so:
1. You still get a better viewing experience without interruptions. Besides, your "measurably behind" can be an imperceptible fraction of a second?
2. Similar thing - shorter queues - the switch can happen faster due to the extra capacity
So yes, it does solve the practical problem, though not the theoretical one
If Netflix were working correctly and could handle the load, you'd absolutely be correct.
But it does seem the capacity of a hybrid system of Netflix servers plus P2P would be strictly greater than either alone? It's not an XOR.
And note that in this case of "live" streaming, it still has a few seconds of buffer, which gives a bandwidth-delay product of a few MB. That's plenty to have non-stale blocks and do torrent-style sharing.
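The arithmetic behind that "few MB" figure, assuming a typical HD bitrate:

```python
BITRATE_BPS = 5_000_000   # assumed ~HD stream bitrate (5 Mbps)
BUFFER_SECONDS = 10       # a "live" stream still runs several seconds behind

# Bandwidth-delay product: bytes of video in flight between source and screen.
buffer_bytes = BITRATE_BPS / 8 * BUFFER_SECONDS
print(f"{buffer_bytes / 1e6:.2f} MB of in-flight video")  # 6.25 MB
```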
If switching to a peer causes increased buffering (which it will, because you still have to wait for the peer to download from Netflix) then you will still have the original problem Netflix is suffering from.
If the solution to users complaining about buffering is to build a system with more inherent buffering then you are back at square one.
I think it might be helpful to look at Netflix's current system as already being a distributed video delivery system in which they control the best seeds. Adding more seeds may help, but if Netflix is underprovisioned from the start, you will have users who cannot access the streams.
Yes, and then some idiot with an axe to grind against Logan Paul starts DDoSing people in the Netflix swarm, kicking them out of the livestream. This is always a problem because torrents, by design, are privacy-hostile. That's how the MAFIAA[1] figured out you were torrenting movies in 2004 and how they sent your ISP a takedown notice.
Hell, in the US, this setup might actually be illegal because of the VPPA[0]. The only reason why it's not illegal for the MAFIAA to catch you torrenting is because of a fun legal principle where criminals are not allowed to avail themselves of the law to protect their crimes. (i.e. you can't sue over a drug deal gone wrong)
[0] Video Privacy Protection Act, a privacy law passed which makes it illegal to ask video providers for a list of who watched what, specifically because a reporter went on a fishing expedition with video data.
[1] Music and Film Industry Association of America, a hypothetical merger of the MPAA and RIAA from a 2000s era satire article
Tyson was fighting Jake Paul, not Logan Paul. That’s Jake’s brother.
I don’t pay my ISP each month to be part of a streaming sites infrastructure. I pay the streaming site each month to use theirs.
If you use Comcast's modem/wifi router, you are part of their service infrastructure (Xfinity WiFi Home Hotspot).
Yes, it's on by default, but you can turn this off if you want to. https://www.xfinity.com/support/articles/disable-xfinity-wif...
And you'll pay less if you become a part
Sure. If there’s anything publicly traded companies are known for, it’s passing savings onto their customers instead of their shareholders.
The knowledge about that universal practice can be easily acquired.
Then, instead of people complaining about buffering issues, you'd get people complaining about how the greedy capitalists at Netflix made poor Joe Shmoe use all of his data cap, because they made him upload lots of data to other users and couldn't be bothered to do it themselves.
The way to deal with this is to constantly do live events, and actually build organizational muscle. Not these massive one-off events in an area the tech team has no experience in.
I have this argument a lot in tech.
We should always be doing (the thing we want to do)
Some examples that always get me in trouble (or at least into big heated conversations):
1. Always be building: It does not matter if code was not changed, or there has been no PRs or whatever, build it. Something in your org or infra has likely changed. My argument is "I would rather have a build failure on software that is already released, than software I need to release".
2. Always be releasing: As before it does not matter if nothing changed, push out a release. Stress the system and make it go through the motions. I can't tell you how many times I have seen things fail to deploy simply because they have not attempted to do so in some long period of time.
There are more, I just don't have time to go into them. The point is: if you did it, and ever need to do it again in the future, then you need to continuously do it.
Doing dry runs regularly makes sense, but whether actually shipping it makes sense seems context-dependent. It depends on how much you can minimize the side effects of shipping a release.
Consider publishing a new version of a library: you'd be bumping the version number all the time and invalidating caches, causing downstream rebuilds, for little reason. Or if clients are lazy about updating, any two clients would be unlikely to have the same version.
Or consider the case when shipping results in a software update: millions of customer client boxes wasting bandwidth downloading new releases and restarting for no reason.
Even for a web app, you are probably invalidating caches, resulting in slow page loads.
With enough work, you could probably minimize these side effects, so that releasing a new version that doesn't actually change anything is a non-event. But if you don't invalidate the caches, you're not really doing a full rebuild.
So it seems like there's a tension between doing more end-to-end testing and performance? Implementing a bunch of cache levels and then not using it seems counterproductive.
These are the very arguments I get all the time.
1) I want to invalidate caches, I want to know that these systems work. I want to know that my software properly handles this situation.
2) If I have lazy clients, I want to know. And I want to motivate them to update sooner, or figure out how to force-update them. I don't want to skip updating because some people are slow. I want the norm to be that things are updating, so when there is a real reason to update, like a zero-day, I have some confidence that the updates will work and the lazy clients will not be an issue.
I am not talking about fake or dry runs that go through some portion of motions, I want every aspect of the process to be real.
Performance means nothing if your stuff is down. And any perceived performance gained by not doing proper hygiene is just tweaking the numbers to look better than they really are.
I think it often makes sense to do full releases frequently, but not continuously. For example, Chrome is on an approximately four week schedule, which makes sense for them. Other projects have faster cadences. There is a point of diminishing returns, though, and you seem to be ignoring the downsides.
I think once a week is good. Maybe once every two weeks.
It's very hard to do a representative dry run when the most likely potential points of failure are highly load-dependent.
You can try and predict everything that'll happen in production, but if you have nothing to extrapolate from, e.g. because this is your very first large live event, the chances of getting that right are almost zero.
And you can't easily import that knowledge either, because your system might have very different points of failure than the ones external experts might be used to.
They could have done a dry run. They could have spun up a million virtual machines somewhere, and tested their video delivery for 30 minutes. Even my small team spins up 10,000 EC2 instances on the regular. Netflix has the money to do much more. I'm sure there are a dozen ways they could have stress-tested this beforehand. It's not like someone sprang this on them last week and they had to scramble to put together a system to do it.
How representative is an EC2 instance in a datacenter simulating user behavior really, though?
These would likely have completely different network connectivity and usage patterns, especially if they don't have historical data distributions to draw from because this was their first big live event.
>How representative is an EC2 instance in a datacenter simulating user behavior really, though?
Systemic issues causing widespread buffering aren't "user behavior". It's a problem with how Netflix is trying to distribute video. Sure, some connections aren't up to the task, and that isn't something Netflix can really control, unless they are looking to improve how their player falls back to lower-bitrate video, which could also be tested.
>because this was their first big live event.
That's the point of testing. They should have already had a "big live event" that nobody paid for during automated testing. Instead they seem to have trusted that their very smart and very highly paid developers wouldn't embarrass them based on nothing more than expectations, but they failed. They could have done more rigorous "live" testing before rolling this out to the public.
1) You don't know if they did or did not do this kind of testing. I don't see any proof either way here. You're assuming they didn't.
2) You're assuming whatever issue happened would have been caught by testing on generic EC2 instances in AWS. In the end these streams were going to users on tons of different platforms in lots of different network environments, most of which look nothing like an EC2 instance. Maybe there was something weird with their networking stack on TCL Roku TVs that ended up making network connections reset rapidly chewing up a lot of network resources which led to other issues. What's the EC2 instance type API name for a 55" TCL Roku TV from six years ago on a congested 2.4GHz Wireless N link?
I don't know what happened in their errors. I do know I don't have enough information to say what tests they did or did not run.
I manage ~3000 customized websites based on the same template code. Sometimes we make changes to the template code that could affect the customizations - it is practically impossible to predict what might cause a problem due to the nature of the customizations. We'll take before and after screenshots of every page on every site, so it can get into the 100s of thousands of screenshots. We'll then run a diff on the screenshots to see what changed, reviewing the screenshots with the most significant changes. Then we'll address the problems we find and deploy the fixed release.
When we do these large screenshot operations, the EC2 instances are running for maybe 15 or 20 minutes total. It's not exactly cheap, but losing clients because we broke their site is something we want to avoid. The sites are hosted on a 3rd-party service, and we're rate-limited by IP address, so to get this done in a reasonable amount of time we need to spin up 10,000 EC2 instances to distribute the work. We have our own software to manage the EC2 instances. It's honestly pretty simple, but effective.
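A toy sketch of the screenshot-diff ranking described above, with screenshots stood in for by small 2D pixel arrays (the real pipeline presumably diffs actual image files; all names and data here are invented):

```python
def diff_fraction(before, after):
    """Fraction of pixels that changed between two same-sized screenshots,
    represented here as 2D lists of pixel values."""
    total = changed = 0
    for row_b, row_a in zip(before, after):
        for pb, pa in zip(row_b, row_a):
            total += 1
            changed += (pb != pa)
    return changed / total if total else 0.0

# Rank pages so humans review the biggest visual changes first.
pages = {
    "home":    ([[0, 0], [0, 0]], [[0, 0], [0, 0]]),   # unchanged
    "pricing": ([[0, 0], [0, 0]], [[1, 1], [0, 0]]),   # half the pixels changed
}
ranked = sorted(pages, key=lambda p: diff_fraction(*pages[p]), reverse=True)
# ranked puts "pricing" before "home"
```

Sorting by diff magnitude is what turns hundreds of thousands of screenshots into a reviewable queue.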
Maybe they did; we don't know that they did not. But the problem is that real-world traffic will always be totally different, varied, and dynamic in many unexpected ways, and a certain link might come under a certain load, causing a ripple effect.
I like all of these considerations, although I also imagine for every context there is some frequency at which it is worthwhile to invalidate the caches to ensure that all parts of the system are still functioning as expected (including the rebuilding of the caches).
This.
I can’t tell you the number of times things worked only because the cache was hot, and a restart or cache invalidation would actually have caused an outage.
Caches must be invalidated at a regular interval. Any system that does not do this is heading for some bad days.
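A minimal sketch of that idea: a cache that forcibly expires entries on a TTL, so the rebuild path gets exercised regularly instead of only during an incident (class and parameter names are made up for illustration):

```python
import time

class TTLCache:
    """Cache that expires entries after ttl seconds, guaranteeing the
    expensive rebuild path runs at a regular interval."""
    def __init__(self, ttl: float):
        self.ttl = ttl
        self._data = {}  # key -> (value, stored_at)

    def get(self, key, rebuild):
        value, stored_at = self._data.get(key, (None, float("-inf")))
        if time.monotonic() - stored_at > self.ttl:
            value = rebuild(key)          # cold path, exercised every ttl
            self._data[key] = (value, time.monotonic())
        return value

cache = TTLCache(ttl=0.05)
calls = []
fetch = lambda k: calls.append(k) or f"value-of-{k}"
cache.get("a", fetch); cache.get("a", fetch)   # second hit served from cache
time.sleep(0.06)
cache.get("a", fetch)                          # TTL elapsed: rebuild exercised
assert calls == ["a", "a"]
```

The design choice is the point: if rebuilds only ever happen during outages, you find out whether they work at the worst possible time.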
What I’m seeing in large organizations is that tracking dependencies within a team’s scope goes better than tracking dependencies between teams. So many developers punt on tracking dependencies on other teams’ artifacts if the organization doesn’t already have a formal system for establishing contracts between teams along those dependency routes, one that automatically handles the state and notifications when changes announcing an intended state change are put through the system. Usually some haphazard dependency representation gets embedded into some software and developers call it a day, expecting the software to auto-magically solve a socio-technical logistical problem, without realizing that the state transitions of the dependencies are not represented, so the software could never deliver what they assume.
This is great, but what possible counterargument is there? We should prolong indefinitely a spooky ambiguity about whether the system works or not?
Easy: Short term risk versus long term risk. If I deploy with minimal changes today, I'm taking a non-zero short-term risk for zero short-term gain.
While I too am generally a long-term sort of engineer, it's important to understand that this is a valid argument on its own terms, so you don't try to counter it with just "piffle, that's stupid". It's not stupid. It can be shortsighted, it leads to a slippery slope where every day you make that decision it is harder to release next time, and there's a lot of corpses at the bottom of that slope, but it isn't stupid. Sometimes it is even correct, for instance, if the system's getting deprecated away anyhow why take any risk?
And there is some opportunity cost, too. No matter how slick the release, it isn't ever free. Even if it's all 100% automated it's still going to barf sometimes and require attention that not making a new release would not have. You could be doing something else with that time.
In some environments, deploying to production has a massive bureaucracy tax. Paperwork, approvals, limited windows in time, can’t do them during normal business hours, etc.
Those taxes were often imposed because of past engineering errors. For example: don't deploy during business hours, because a past deployment took down production for a day.
A great engineering team will identify a tax they dislike and work to remove it. Using the same example, that means improving the success rate of deployments so you have the data (the success record) to take to leadership to change the policy and remove the tax.
Finite compute, people, and opportunity cost.
It is just a reframing of build vs maintain.
The counterargument is obvious for anyone who has been on call or otherwise responsible for system stability. It's very easy to become risk-averse in any realm.
Doesn't ensuring stuff actually works tangibly lower risk?
Yes, because it lowers the chance of compound risk. The longer you go without stressing the system, the more likely you are to have a double failure, increasing your outage duration.
Simply put: you don’t want to delay finding out something is broken; you want to know the second it breaks.
In the case I am suggesting, a failed release will often be deploying the same functionality, so many failure modes will result in zero outage. Not all failure modes will result in an outage.
When the software is expected to behave differently after the deployment, more systems can end up being part of the outage, such as when the new systems can’t do something or the old systems can’t do something.
Not exactly, but it's worth the experiment in trying things anyway. Say you currently have a release once every few months, an ambitious goal would be to get to weekly releases. Continuous enough by comparison. But 'lower risk' is probably not the leading argument for the change, especially if the quarterly cycle has worked well enough, and the transition itself increases risk for a while. In order for a committed attempt to not devolve into a total dumpster fire, various other practices will need to be added, removed, or changed. (For example, devs might learn the concept of feature flags.) The individuals, which include management, might not be able to pull it off.
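As a toy illustration of the feature-flag idea mentioned above, which decouples deploying code from enabling its behavior (the flag and function names are hypothetical):

```python
import os

# Minimal feature-flag sketch: the new code path ships "dark" in every
# release and is enabled per-environment, without a new deployment.
FLAGS = {"new_checkout": os.environ.get("FF_NEW_CHECKOUT", "off") == "on"}

def checkout(cart):
    if FLAGS["new_checkout"]:
        return f"new flow for {len(cart)} items"
    return f"old flow for {len(cart)} items"

assert checkout(["a", "b"]) == "old flow for 2 items"  # flag off by default
```

This is one of the practice changes that makes weekly releases survivable: release cadence stops being coupled to feature readiness.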
The common and flawed counterargument is “when we deploy, outages happen.” You’ll hear this constantly at companies with bad habits.
Deploying is expensive for some models. That could involve customer facing written release notes, etc. Sometimes the software has to be certified by a govt authority.
Additionally, refactor circle jerks are terrible for back-porting subsequent bug fixes that need to be cherry picked to stable branches.
A lot of the world isn’t doing CD, and constant releases are super expensive.
There's two other ways I've seen it phrased:
"Test what you fly, and fly what you test" (Supposedly from aviation)
"There should be one joint, and it should be greased regularly" (Referring to cryptosystems I think, but it's the same principle. Things like TLS will ossify if they aren't exercised. QUIC has provisions to prevent this.)
> 1. Always be building: It does not matter if code was not changed...
> 2. Always be releasing...
A good argument for this is security. Whatever libraries/dependencies you have, unpin the versions, and have good unit tests. Security vulnerabilities that are getting fixed upstream must be released. You cannot fix and remove those vulnerabilities unless you are doing regular releases. This in turn also implies having good unit tests, so you can do these builds and releases with a lower probability of releasing something broken. It also implies strong monitoring and metrics, so you can be the first to know when something breaks.
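One small, assumed-illustrative way to act on this: scan a requirements file for exact pins that would block picking up upstream security patches on rebuild (the regex only handles simple cases, and the file contents are invented):

```python
import re

def pinned_deps(requirements: str) -> list[str]:
    """Return dependencies pinned to an exact version (==), which freeze
    you out of automatically picking up upstream security fixes."""
    pins = []
    for line in requirements.splitlines():
        line = line.split("#")[0].strip()   # drop comments
        m = re.match(r"([A-Za-z0-9_.-]+)\s*==", line)
        if m:
            pins.append(m.group(1))
    return pins

reqs = """\
requests>=2.31      # range: rebuilds pick up patch releases
urllib3==1.26.5     # exact pin: frozen, misses security fixes
flask
"""
assert pinned_deps(reqs) == ["urllib3"]
```

As a sibling comment notes, this cuts both ways: unpinned ranges only lower risk when your test suite is good enough to catch upstream regressions.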
> Whatever libraries/dependencies you have, unpin the versions, and have good unit tests.
Nitpick: unit tests by definition should not be exercising dependencies outside the unit boundary. What you want are solid integration and system tests for that.
Unless the upstream dependency happens to maintain stable branches, constantly pulling in the latest branches increases your risk of vulnerabilities more than getting the discovered bug patches
There should be a caveat that this kind of decision should be based on experience and not treated as a rule that juniors might blindly follow. We all know how "fail fast and early" turned out (or whatever the exact phrase was).
This is golden advice, honestly. "If you don't use it, you lose it" applied to software development.
They've been doing live events since 2023. But it's hard to be prepared for something that's never been done by anyone before — a superbowl scale event, entirely viewed over the internet. The superbowl gets to offload to cable and over the air. Interestingly, I didn't have any problems with my stream. So it sounds like the bandwidth problems might be localized, perhaps by data center or ISP.
Maybe they considered this event as a rehearsal for the upcoming NFL streams which I am guessing might have a wider audience
Yes, I agree the fight had a great deal of interest, but the NFL is their real goal.
I suspect a lot of it could be related to ISP bandwidth. I streamed it on my phone without issue. Another friend put their TV on their phone’s WiFi which also worked. Could be partly that phone hotspots lower video bandwidth by default.
I suspect it’s a bit of both Netflix issues and ISPs over subscribing bandwidth.
My suspicion is the same as yours, that this may have been caused by local ISPs being overwhelmed, but it could be a million other things too. I had network issues. I live in a heavily populated suburban area. I have family who live 1000+ miles away in a slightly less populated suburban area, they had no issues at all.
I would guess the majority of the streamed bandwidth was sourced from boxes like these in ISP's points of presences around the globe: https://openconnect.netflix.com/en/
So I agree the problems could have been localized to unique (region, ISP) combinations.
The ISP hypothesis doesn't make sense to me. I could not stream the live event from Netflix. But I could watch any other show on netflix or youtube or hulu at the same time.
Some ISPs have on-site Netflix Open Connect racks. The advantage of this is that they get a high-priority quality of service data stream into the rack, which then serves the cached content to the ISP customers. If your ISP doesn't have a big enough Netflix rack and it gets saturated, then you're getting your streams at the whim of congestion on the open internet. A live stream is a few seconds of video downloaded, and it has to make it over the congestion of the internet in a few seconds and then repeat. If a single one of these repeats hits congestion and gets delayed, you see the buffering spinning wheel. Other shows, on the other hand, can show the cached Netflix splash animation for 10 seconds while they request 20 minutes of cache until they get it. So, dropped packets don't matter much. Even if the internet is seeing congestion every couple of minutes, delaying your packets, it won't matter as non-live content is very flexible and patient about when it receives the next 20-minute chunk. I'm not an ISP or Netflix engineer, so don't take these as exact numbers. I'm just explaining how the "bandwidth problems might be localized" hypothesis can make sense from my general understanding.
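The live-versus-buffered argument above can be sketched numerically. This toy model (all numbers invented, not Netflix's actual parameters) counts stalls for a player that drains one second of buffer per second and refills except during a congested patch:

```python
def stalls(buffer_s: float, congested_intervals, duration_s: int) -> int:
    """Count playback stalls: each second drains 1s of buffer; during
    congested seconds no new data arrives, otherwise the buffer refills."""
    level, count = buffer_s, 0
    for t in range(duration_s):
        level -= 1.0                             # playback consumes one second
        if t not in congested_intervals:
            level = min(buffer_s, level + 2.0)   # refill faster than realtime
        if level <= 0:
            count += 1                           # viewer sees the spinner
            level = buffer_s                     # rebuffer from scratch
    return count

congestion = set(range(100, 120))  # one 20-second congested patch
live = stalls(buffer_s=5, congested_intervals=congestion, duration_s=300)
vod = stalls(buffer_s=1200, congested_intervals=congestion, duration_s=300)
assert live >= 1 and vod == 0
```

The same congestion event stalls the small-buffer "live" player several times while the deep-buffer "VOD" player rides it out, which is the intuition behind the localized-buffering hypothesis.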
Yeah, I think people are incorrectly assuming that everyone had the same experience with the stream. I watched the whole thing and only had a few instances of buffering and quality degradation. Not more than 30 seconds total during the stream.
Even if it was only 30% of people had a problem that's still millions of unhappy users. Not great for a time sensitive event.
Also, from lurking in various threads on the topic Netflix's in app messages added to people's irritation by suggesting that they check their WiFi/internet was working. Presumably that's the default error message but perhaps that could have been adjusted in advance somehow.
The point is that if the problem was different depending on the user, it will be in a distribution layer, not in the encoding or production layer
That eliminates a whole raft of problems.
One of the times I reloaded the page I got a raw envoy error message!
I had issues here and there, but there were workarounds. Then, towards the end, the quality either auto-negotiated or was forced down to accommodate the massive pull.
Agreed. This is a management failure, full stop. Unbelievable that they'd expect engineering to handle a single Livestream event of this magnitude.
> ...the tech team has no experience in
Unless Netflix eng decides to release a public postmortem, we can only speculate. In my time organizing small-time live streams, we always had up to 3 parallel "backup" streams (Vimeo, Cloudflare, Livestream). At Netflix's scale, I doubt they could simply summon any of these providers in, but I guess Akamai / Cloudflare would have been up for it.
The WWE is moving their programming to Netflix next year. If I were them, I'd be horrified at what I saw.
Sometimes this just isn't feasible for cost reasons.
A company I used to work for ran a few Super Bowl ads. The level of traffic you get during a Super Bowl ad is immense, and it all comes at you in 30 seconds, before going back to a steady-state value just as quickly. The scale pattern is like nothing else I've ever seen.
Super Bowl ads famously cost seven million dollars. These are things we simply can't repeat year over year, even if we believed it'd generate the same bump in recognition each time.
I think Netflix have a fair bit of organisational muscle, perhaps the fight was considered not as large of an event as the NFL streams would be in the future.
Also, "no experience in", really? You have no idea if that's actually the case.
That’s difficult to reproduce at scale; there are only so many “Super Bowl” events in a calendar year.
Wow, building talent from within? I thought that went out of fashion. I think companies are too impatient to develop their employees.
Everyone here is talking like this is something unique Netflix had to deal with. Hotstar live-streamed the India vs Pakistan cricket match with zero issues, with the all-time-high live viewership in the history of live telecasts. Why would viewers paying $20/month want to think about their technical issues? They dropped the ball, pure and simple. The tech already exists for this; it’s been done before, even by ESPN. Nothing new here.
The Independent reports 35m viewers of that cricket match [0].
Rolling Stone reported 120m for Tyson and Paul on Netflix [1].
These are very different numbers. 120m is Super Bowl territory. Could Hotstar handle 3-4 of those cricket matches at the same time without issue?
[0] https://www.the-independent.com/sport/cricket/india-pakistan...
[1] https://www.rollingstone.com/culture/culture-news/jake-paul-...
Too late for me to edit, but Netflix is now reporting 60m, half of what Rolling Stone said.
https://x.com/netflix/status/1857906492235723244?s=46
India vs Australia is the one of interest; it scored cricket’s highest concurrent audience ever, 59 million.
https://www.icc-cricket.com/news/biggest-cricket-world-cup-e...
The majority of Super Bowl viewers watch it on cable. Streaming gets fewer than 10M concurrents.
Do people even have cable TV anymore? I have internet from my "cable" company but I don't have the "cable" connected to anything but the modem. Everything I watch is streamed. The only thing connected to my TV is a Roku.
> Do people even have cable TV anymore?
Six seconds on the Google shows 58 million households in the United States. So, roughly 145,000,000 people.
You make the tech bubble mistake of believing that high speed internet is as ubiquitous as coax.
>Six seconds on the Google
I see 68.7 million people, not households. There's my 6 seconds.
Maybe 10 minutes would give me a better truth.
>You make the tech bubble mistake of believing that high speed internet is as ubiquitous as coax.
Yes and no. Given that the top US cities contain about 8% of the population, you can cover a surprising share of a large country with a surprisingly small amount of area coverage. So it's not as straightforward as "people in SF are in a bubble".
I assume you know the answer to your questions is: of course they do. However, I’m in the same boat as you. The joke’s on us, I guess.
I guess my question is: why? "Cable boxes" are uniformly awful to use in my experience. The UI is clunky, they take up space and it's another remote and another tangle of wires to try to hide. What advantage do they offer in 2024?
https://www.cabletv.com/blog/why-people-still-pay-for-cable says it’s inertia, live sports and ease of use.
Don’t act so surprised: streaming is a pain in the ass to figure out. People have been trained to tolerate a 3-second UI lag for every button press (seemingly all cable boxes are godawfully shitty like this; it must be the server-side UI rendering design?)
BUT! You can record your game and the cable TV DVR is dead reliable and with high quality. There is no fear of competing for Wi-Fi bandwidth with your apartment or driveway neighbors, and the DVR still works even if cable is out. And as long as you haven’t deleted the recording it won’t go away for some stupid f’ing reason.
Finally, the cable TV DVR will let you fast-forward through commercials. Or you can pause live TV to break for the bathroom and make a snack, so you build up a little buffer, and now you are fast-forwarding commercials on nearly-live TV. You can’t fast-forward commercials with most mainstream streaming anymore. Who broadcasts your big games? Big players like Paramount+ won’t let you skip commercials anymore. The experience is now arguably worse. Once you settle in, the forward-30-sec and back-30-sec buttons work rather smoothly (that’s one part of cable TV boxes that has sub-half-second latency).
Your concern about extra remotes and extra boxes and hiding wires is a vanity most don’t care about. They are grateful for how compact big-screen TVs are these days compared to the CRTs or projection TVs of the past. They probably have their kids’ game console and a DVD/BluRay player on the same TV stand anyway.
Apparently movies purchased on Roku are now on Vudu. I hope that people who bought movies on Roku were able to figure it out. This is how technology sucks. Movies purchased with my cable provider’s Video On Demand are still with me, slow as shit as navigating to them is.
I last regularly used a DirecTV DVR. There were a surprising number of times where it wouldn't let me fast forward through ads. Not only that, sometimes it would connect out to the internet to download new targeted forced ads on stuff that was recorded a while ago.
The horrors! I stand corrected. :(
> What advantage do they offer in 2024?
They exist in places where the internet infrastructure is not adequate for constant multiple streams.
Content is king. And there's lots of content on cable that is not on streaming. Just consider local and regional news and sports.
Many residential buildings, like the one in which I live, include cable TV with the rent. Why add more clutter and expense for streaming?
There are plenty of other reasons. Your position seems to be "stop liking things I don't like."
> What advantage do they offer in 2024?
Live news and sports is huge for a ton of people.
You have access to all the shows from the major networks. You don’t need to subscribe to Peacock and Paramount and Hulu and the TBS app and Discovery+ and…
Better yet, they’re all combined in one interface as opposed to all trying to be the only thing that you use.
Also, especially if you grew up with it, there is absolutely a simplicity in linear TV. Everyone was used to a DVR. And yeah the interface sucks, but it sucked for everyone already anyway so they’re used to it. Don’t know what you wanna watch? Turn on a channel you watch and just see what’s on. No looking at 400 things to pick between.
I’ve seen people switch off and have serious trouble because it’s such a different way of watching TV from what they were used to. They end up using something like Hulu Live or YouTube TV to try and get the experience they’re used to back.
This. I’m exactly in this YouTube TV camp, and most of the time I just miss the simplicity of old cable. Having to find things to watch is, for me, an awful experience. Then when I do want to watch something, trying to figure out which app it’s actually on is awful. I think we subscribed to a dozen different things; it’s so damn fragmented. Even in the early days of Netflix, I was a holdout who kept going to Blockbuster, because the UI of visually scanning a wall/shelf of DVDs was far superior to the Netflix version of the same, IMO.
This is definitely turning into my version of an old-man rant. “Back in my day…” The main benefit of it all is that I actually just don’t watch as much as I once did. The friction is too high. Or the commitment is too high; I don’t usually want to jump into some 10-episode series.
While I haven’t gone back to linear TV, I totally get it.
I don’t subscribe to anything that doesn’t work with my Apple TV. Netflix for example won’t integrate with it the way Hulu does. So whatever show I’m watching on Netflix? Wouldn’t show up in my show list on my Apple TV. I forget it exists.
So I don’t subscribe to it. Or anything else like that. You are NOT more important than me, service I pay for.
The only two exceptions are YouTube (which obviously works differently) and Plex, for the few things that I already owned on DVD or can’t get on any service.
It works well enough for me. But I still find myself missing a linear TV now and then.
I've certainly listened to some fascinating documentaries on BBC Radio 4 on subjects which it would never have occurred to me to seek out. There's definitely some advantages to linear broadcast.
I don't have TV, but I watched the Euro football matches at my mom's, because guess what: watching sports streams at 480p is no fun, and it frequently breaks. The internet wasn't meant for live broadcasting to a large audience.
All my stuff is combined in one interface with Kodi. It is nice to have a TV which fully, 100%, respects me!
From my experience? The ability to punch in a channel number (or not even that) and get something playing, instantly, without the need to make a choice.
For many people, often those with backgrounds that make them unlikely to frequent HN, the experience they're looking for is "1. get home, 2. open beer, 3. turn TV on, 4. watch."
The default state of a streaming app is to ask you what you want to watch, and then show you exactly the thing you selected. The default state of traditional TV is to show you something, and let you switch to something else if you can't stand the thing you're watching right now or have something specific in mind. Surprisingly, many people prefer the latter over the former.
The same applies to radio versus streaming, many family members of mine don't use streaming, because all it takes to turn on the radio is turning the key in the ignition, which they have to do anyway.
Until it switches entirely to Netflix one day
Going by their failure this time around, that's not happening anytime soon.
The Game Awards claims 118 million livestream views [1], and it went off without a hitch.
I watched it for the game trailers, actually shocked that it's also superbowl viewership territory.
https://variety.com/2023/digital/news/game-awards-2023-break...
"The festivities were streamed live from the Peacock Theater in L.A. across more than 30 platforms including YouTube, Twitch, Facebook, TikTok Live, X (Twitter), Steam, WeChat, Bilibili, Huya, DouYu, Xiaohongshu and Instagram Live"
So no one service had to take the huge brunt. Much of it was divided I guess.
But that's exactly the point: Netflix didn't do this in a vacuum, they did it within Netflix.
It might just have been easier to start from scratch, maybe using an external partner experienced in live streaming, but the chances of that decision happening in a tech-heavy company such as Netflix that seems to pride itself on being an industry leader are close to zero.
> with zero issues
Depending on whom you ask, the bitrate used by the stream was significantly lower than what is considered acceptable from free livestreaming services, which, admittedly, stream to much, much smaller audiences.
Without splitting hairs, livestreaming was never their forte, and going live with degradation elsewhere is not a great look for our distributed-computing champ.
Netflix is good only at streaming ready-made content, not live streaming. But:
1. Netflix is a 300B company; this isn't a resources issue.
2. This isn't the first time they have done live streaming at this scale, either. They already have prior failure experience; you expect the 2nd time to be better, if not perfect.
3. There was plenty of time between the first massive live stream and the second, meaning plenty of time to learn and iterate.
The problem is that provisioning vast capacity for peak viewership is expensive and requires long-term commitment. Some providers won't give you more connectivity to their network unless you sign a 12 month deal where you prepay that.
Peak traffic is very expensive to run, because you're building capacity that will sit empty/unused when the event ends. Who'd pay for that? That's why it's tricky, and that's why Akamai charges these insane prices for live streaming.
An open secret is that the network layer is usually not redundant in your datacenter, even if it's promised. To have a redundant network you'd need to double your investment, and it'll sit idle at 50% max capacity. For the ~2hr of downtime per year when you restart the high-capacity routers, that's not cost-efficient for most clients.
Then sign a contract with Akamai, who has been in business for 25 years? You outsource if you aren’t planning to do something very often.
There is no middle ground where you commit a mediocre amount of resources, end up with downtime and a mediocre experience, and then go “but we saved money.”
Well, they didn't want to spend the money or more likely their own technical team/boss promised that they can do it themselves.
They indeed have a great CDN network, but it's not very good for this particular type of traffic. May be they will know/fix/buy next time...
When Apple moved off Akamai for their Keynote live streaming, ( I remember they also used Limestone or EdgeCast ) they had some percentage of audience using Akamai and some on their own CDN. I think it took them three years before they completely moved off Akamai. Not sure if that is still case as that was more than 10 years ago.
But like you stated, they dont want to spend money and their technical people couldn't deliver on time. This isn't a technical issue a lot of people on HN and Twitter wants to discuss about. It is a management issue.
What's your point? If they couldn't manage to secure the resources necessary, they shouldn't have agreed to livestream it. As a customer, I don't care AT ALL if it's difficult.
As a customer, you're right that you don't care. As a company, you care about the cost of the product you sell. Companies don't care about "the customer" per se; they care about their profit margin, and "free market" competition pushes them to lower the price and keep the service level good. Exclusive rights to stream the fight? Well, it had better be working, but we won't overprovision and pay more than the expected return.
Buying the exclusive rights for 20Mil and putting 30Mil into streaming it wouldn't be a very smart choice. Fuckups happen, and this might be a mistake that costs them more in lost reputation than they expected to win.
Yea, the issue here isn't just that they're having issues, it's that they're having the same issues they've had before.
They have the NFL next month on Christmas day. So that'll be a big streaming session but I think it'll be nothing compared to this. Even Twitter was having problems handling the live pirate streams there.
> Even Twitter was having problems
Is that a surprise? They're not who I would think of first as a gold standard for high viewership live streams.
Well, considering it was multiple small streams, I would expect them to keep those up, not have their entire streaming service have issues.
Apple was clearly larger than Google when it came out with Apple Maps, and it was issue-laden for a long time. It is not a resource issue, but a tech-development-maturity issue.
>They already have prior failure experience
What was the previous fail?
Yeah didn't they crash on love is blind or one of their reality shows recently-ish?
You can't solve your way out of a complex problem that you created and which wasn't needed in the first place. The entire microservices thing was overly complex with zero benefits.
I spoke to multiple Netflix senior technicians about this.
They said that's the whole shtick.
That's a ridiculous statement. PrimeVideo is the leader in terms of sports events streaming over internet and it is composed of hundreds of microservices.
Live streaming is just much harder than streaming, and it takes years of work and a huge headcount to get something good.
Prime famously undid some amount of their microservices recently because it couldn’t keep up, and was hideously expensive.
It was a single team for a very specific use case.
To be clear when I said that PrimeVideo is composed of hundreds of microservices, I actually meant that it's composed of hundreds of services, themselves composed, more often than not, of multiple microservices.
Depending on your definition of a microservice, my team alone owns dozens.
This comment shows how a very random blog about a very small part of a product can dominate all conversation about it. Prime Video famously did not undo anything. Out of 100+ teams, one team undid one service. But somehow similar comments are common on HN. I am making no judgement on microservices or not, just on this particular comment.
People just do not appreciate how many gotchas can pop up doing anything live. Sure, Netflix might have a great CDN that works great for their canned content and I could see how they might have assumed that's the hardest part.
Live has changed over the years from large satellite dishes beaming to a geosat and back down to the broadcast center($$$$$), to microwave to a more local broadcast center($$$$), to running dedicated fiber long haul back to a broadcast center($$$), to having a kit with multiple cell providers pushing a signal back to a broadcast center($$), to having a direct internet connection to a server accepting a live http stream($).
I'd be curious to know what their live plan was and what their redundant plan was.
You are making excuses for a multibillion dollar company that has been in this game for many years. Maybe the first to market in streaming.
This isn’t NFLX’s first rodeo in live streaming. Have seen a handful of events pop up in their apps.
There is no excuse. All of the resources and talent at their disposal, and they looked absolutely amateurish. Poor optics.
I would be amazed if they are able to secure another exclusive contract like this in the future.
Sorry for the off topic but what’s this thing that I only come across in Hacker News about referring to a company by their stock exchange name (APPL, MSFT, etc) outside of a stock context? It seems really weird to me.
In-group signaling for people who like playing or thinking about the stock market. Similar to how people who make travel a big part of their identity refer to cities by their airport code.
Or maybe they are pilots?
"I flew into LHR on Monday" - frequent flyer
"I flew into EGLL on Monday" - pilot
not really. It’s just something I adopted as part of a job I had for fintech 3-4 yrs ago
I don’t do it as some sort of “signaling” for “fintech bros” or anything like that
I have always assumed that a focus on stock tickers is the natural result when your primary user base is a group of people hyper focused on “total compensation” and stock grants. The name hackernews is merely a playful reference to the history of the site. Like the name “Patriot Act.”
BigTechMercenaryNews ?
As a counterpoint to some other replies, I do this sometimes, not thinking at all about stocks but instead as a standardized abbreviation of sorts. Ms for example can mean tons of things from a title to multiple sclerosis to milliseconds. MSFT is clear and half the length.
`Ms` would be megaseconds ;)
Which is why we started calling it M$ in the 2000s, emphasizing its overriding goal of making ca$h off its users
I just assume it's people who spend too much time thinking about the stock context.
It's the core purpose of the organization. The ticker represents the most fundamental drive of the org. Seems appropriate and honest.
I used to gather fringe signals for shitty hedge funds, “fintech” from public communities such as HN.
I think I subconsciously adopted it since it made my job easier. Sort of how I use YYYYMMDD format in almost everything from programming to daily communication.
merriam-webster.com/dictionary/brevity
Writing APPL instead of Apple doesn’t get you any fewer keystrokes.
Technically it’s one fewer keystroke (and it’s AAPL).
It’s a lot fewer keystrokes for MS (Morgan Stanley), GS (Goldman Sachs) and MSFT (Microsoft) than it is for AAPL, but it’s a force of habit for some. Once you’re used to referring to firms by their ticker symbols, you do it all the time.
E.g. an ex trader friend still says “spot” instead of “point” when referring to decimal points, even if talking in other contexts like software versions.
Tangent: Not fewer like F (Ford) or H (Hyatt Hotels). Unfortunately we don't have a full alphabet, missing {I, N, P, Q, Y}.
I guess if we allowed the first two letter ticker symbol for the missing singles, we could send messages by mentioning a bunch of company names.
Eg "Buy: Dominion Energy, Agilent. Hold: Nano Labs. Sell: Genpact." would refer to our esteemed moderator, and "Hyatt Hotels is pleased to announce special corporate rates for Nano Labs bookings" to this site itself.
[maybe it would be better to use the companies where the corporate name and ticker letter don't match? Like US Steel for X and AT&T for T?]
AAPL is only fewer keystrokes than Apple if you’re on a physical keyboard and hold the shift key, which makes it hardly more convenient. If you use caps lock, presumably you’ll press it again.
On a phone, at least in iOS, you have to double tap the shift key.
Technically, you forgot capitalization, so it's more keystrokes
Apple is also capitalized, so it's still fewer. Unless you hit "Shift" separately for each character in AAPL?
Ugh. Similar (huge) pet peeve about people who say "n.B." instead of "note".
But it's "note well", not just "note"
Not sure if you’re joking, but if not: there’s no practical difference in what is being asked of the reader. Nobody predicates any decision on whether they were asked to note something vs note it well.
It’s pointless jargon.
also, the symbol for apple is AAPL.
A company that readily admits it burns out SWEs and SREs in exchange for the big bucks.
Just what the fuck are these people doing?
If I were a major investor in them I'd be pissed.
You probably are a major investor, incidentally, through your pension fund(s) and retirement savings. It is hard to avoid Netflix if you are doing any kind of broad-based index investing.
Um, aksually…
I was pointing out how dumb a multibillion dollar company is for getting this so wrong. Broadcasting live events is underestimated by everyone who has never done it, yet the hubris of a major tech company thinking it knows better is biting them in the ass.
As many other people have commented, so many other very large events dwarfing this one have been pulled off with no hiccups visible to the viewers. I have amazing stories of major hiccups during MLB World Series broadcasts that viewers had no idea were happening, but "inside baseball" people knew. To the point that the head of the network caught something during the broadcast and called the director in the truck saying someone was either going to be fired or get a raise, yet the audience would never have noticed either way. They didn't get fired, btw.
This is the whole point of chaos engineering that was invented at Netflix, which tests the resiliency of these systems.
I guess we now know the limits of what "at scale" is for Netflix's live-streaming solution. They shouldn't be failing at scale on a huge stage like this.
I look forward to reading the post mortem about this.
Everyone keeps mentioning at scale. I seriously doubt this was an "at scale" problem. I have a strong suspicion this was a failure at the origination point being able to push a stable signal. That is not an "at scale" issue, but hubris: "we can do it better/cheaper than standard broadcasting practices."
As counterpoint, I observed 2-3 drops in bitrate, but an otherwise fine experience. So the problem seems to have been in dissemination, not at the origin.
Yeah, I was switching between my phone and desktop to watch the stream and I had a seamless experience on both devices the entire time. I’m not sure why so many people are assuming this was a universal experience.
I highly doubt this. Netflix has a system of OCAs that are loaded with hard disks, are installed in ISPs’ networks, and serve the majority of those ISPs’ customers.
Given that many people had no problems with the stream, it is unlikely to have been an origin problem, but more likely the mechanism to fan out quickly to OCAs. Normally latency to an OCA doesn’t matter when you’re replicating new catalogs in advance, but live streaming makes a bunch of code that previously “didn’t need to be fast” get promoted to the hot path.
I am not sure that it is an issue with the origination point. In fact I just thought it was my ISP because my daughter's boyfriend was watching and doing facetime with her and my video was dropping but his was not. I have 2gb fiber and we regularly stream five TVs without any issue, so it should not have been a bandwidth issue.
I've tried to watch an old Seinfeld episode during this event. It was freezing every few minutes even at downgraded bitrate. A video that should be on my local CDN node.
If it was a problem at origin, why did it get better/worse as viewership fell/rose?
Perhaps it was, or perhaps it was not.
I was watching a pirated, live retransmission of the event on Twitch (in Portuguese), and there was zero buffering on my end.
Is multicast a thing on the commercial internet? Seems like that could help.
If commercial = public, then no, you can not use multicast for this. It is heavily used within some enterprise networks though; if you go to a gym with lots of TVs, they are all likely on multicast.
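For anyone curious what "being on multicast" looks like at the socket level, joining a group is a single socket option. A minimal, loopback-only Python sketch (group, port, and payload are arbitrary choices for the demo; real IPTV deployments rely on IGMP-snooping switches and router support):

```python
import socket

GROUP, PORT = "224.1.1.1", 5007   # arbitrary multicast group/port for illustration
LOOPBACK = "127.0.0.1"            # keep the whole demo on one host

# Receiver: joining the group emits an IGMP membership report, which is
# how the network learns where to forward copies of the stream.
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
rx.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
rx.bind(("", PORT))
mreq = socket.inet_aton(GROUP) + socket.inet_aton(LOOPBACK)  # struct ip_mreq
rx.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
rx.settimeout(5)

# Sender: emits ONE datagram; the network delivers a copy to every joined
# receiver. That single-copy property is the entire bandwidth argument.
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
tx.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF, socket.inet_aton(LOOPBACK))
tx.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_LOOP, 1)
tx.sendto(b"one copy of the video segment", (GROUP, PORT))

data, _ = rx.recvfrom(1024)
print(data.decode())
```

The catch, as noted above, is that every router between sender and receiver has to cooperate, which is why this works inside one ISP or enterprise but not across the public internet.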
It is weird because this was a solved problem.
Every major network can broadcast the Super Bowl without issue.
And while Netflix claims it streamed to 280 million, that’s if every single subscriber viewed it.
Actual numbers put it in the 120 million range. Which is in line with the Super Bowl.
Maybe Netflix needs to ask CBS or ABC how to broadcast
Do you live stream the superbowl? Me and everyone I know watch it over antenna broadcast tv. I think it is easier to have millions of tvs catch airwaves vs millions of point to point https video streams.
If you watch it over cable, you're live streaming it. Let's face it, that's where the vast majority of viewers see it. Few people view OTA even if the quality is better.
Live sports do not broadcast the event directly to a streamer. They push it to their broadcast centers. It then gets distributed from there to whatever avenues it needs to go. Trying to push a live IP stream directly from the remote live venue rarely works as expected. That's precisely why the broadcasters/networks do not do it that way
> If you watch it over cable, you're live streaming it.
Those are multicast feeds.
> Trying to push a live IP stream directly from the remote live venue rarely works as expected.
In my experience it almost always works as expected. We have highly specialized codecs and equipment for this. The stream is actively managed with feedback from the receiver so parameters can be adjusted for best performance on the fly. Redundant connections and multiple backhauls are all handled automatically.
> That's precisely why the broadcasters/networks do not do it that way
We use fixed point links and satellite where possible because we own the whole pipe. It's less coordination and effort to setup and you can hit venues and remotes where fixed infrastructure is difficult or impossible to install.
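The "actively managed with feedback from the receiver" part is essentially a small control loop. A hedged Python sketch of the shape of that loop (thresholds and step sizes are invented for illustration; real SRT/RIST-class encoders are far more sophisticated):

```python
# Invented thresholds and step sizes; real contribution encoders are far
# more sophisticated, but the shape of the feedback loop is the same.
def adjust_bitrate(current_kbps, loss_pct, floor=2000, ceil=20000):
    if loss_pct > 1.0:
        return max(floor, int(current_kbps * 0.75))  # back off hard on loss
    if loss_pct < 0.1:
        return min(ceil, int(current_kbps * 1.05))   # creep back up slowly
    return current_kbps                              # hold steady otherwise

rate = 10_000  # starting encode rate in kbps
for loss in [0.0, 0.0, 3.2, 2.5, 0.05]:  # hypothetical per-second loss reports
    rate = adjust_bitrate(rate, loss)
print(rate)
```

The asymmetry (cut fast, recover slowly) is deliberate: a few seconds at a lower bitrate is invisible to viewers, while a stall is not.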
I chose to interpret it charitably and assume OP was saying it's not pushed from venue direct to viewer.
> We use fixed point links and satellite where possible because we own the whole pipe.
Over long distance I get better reliability out of a decent internet provision than in many fixed point to point links, and certainly when comparing at a price point. The downside of the internet is you can't guarantee path separation - even if today you're routing via two different paths, tomorrow the routes might change and you end up with everything going via the same data centre or even same cable.
> If you watch it over cable, you're live streaming it.
Which is probably done over the cableco's private network (not the public Internet) with a special VLAN used for television (as opposed to general web access). They're probably using multicast.
Is cable video over IP now? Last time I looked (which was forever ago), even switched video was atsc with a bit of messaging for the cable box to ask what channel to tune to, and to keep the stream alive. TV over teleco systems seems to be highly multicast, so kind of similar, headend only has to send the content once, in a single bitrate.
Not really the same as an IP service live stream where the distribution point is sending out one copy per viewer and participating in bitrate adaptation.
AFAIK, Netflix hasn't publicly described how they do live events, but I think it's safe to assume they have some amount of onsite production that outputs the master feed for archiving, plus live transcoding for the different bitrate targets (that part may be onsite, at a broadcast center, or something cloudy), and then it goes to a distribution network. I'd imagine their broadcast center or onsite processing feeds a limited number of highly connected nodes that feed most of their CDN nodes; maybe more layers. And then clients stream from the CDN nodes. Nobody would stream an event like this direct from the event; you've got to have something to increase capacity.
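To make the "limited number of highly connected nodes" point concrete, a back-of-envelope sketch with invented numbers (Netflix has published none of these figures):

```python
# All numbers are invented for illustration; the point is the multiplication.
origin_feeds = 8           # assumed: origin pushes to 8 tier-1 relay nodes
relay_fanout = 50          # assumed: each relay feeds 50 CDN nodes
viewers_per_node = 20_000  # assumed concurrent-stream capacity per CDN node

cdn_nodes = origin_feeds * relay_fanout
capacity = cdn_nodes * viewers_per_node
print(f"origin serves {origin_feeds} connections; "
      f"{cdn_nodes} CDN nodes reach ~{capacity:,} concurrent viewers")
```

Two layers of fan-out turn 8 origin connections into millions of viewer streams, which is why nobody serves an audience this size directly from the event.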
> Is cable video over IP now?
Over the US and Canada it mostly is, though how advanced the transition is is very regional.
The plan is to drop both analog signal and digital (QAM) to reclaim the frequencies and use them for DOCSIS internet.
Newer set top boxes from Comcast (Xfinity) run over the internet connection (in a tagged VLAN on a private network, and they communicate over a hidden wifi).
>I think it is easier to have millions of tvs catch airwaves vs millions of point to point https video streams.
Exactly! It was a solved problem.
The Superbowl isn't even the biggest. World Cup finals bring in billions of viewers.
When Netflix started, it was the first in the space and breaking ground, which is how it became a "tech" company that happens to stream media. However, it has been 15 years, and since then the cloud providers have basically built "Netflix as a service". I suspect most of the big streamers are using that instead of building their own in-house thing and going through all the growing pains Netflix is.
You know they were commoditized when "Build Netflix" became a system-design interview question.
Solves differently though, right? Cable broadcasts are not the same as a streaming video over the internet, right?
Is the goal “show the fight” or “use this technology”?
I guarantee the people trying to watch the fight cared more about watching the fight than how the fight was watched.
You’re talking about the contribution from the venue to the broadcast centre, which is increasingly not a full program but being mixed remotely.
That’s a very different area to transmission of live to end users.
What are you talking about? The signal coming from a live event is the full package. The output of “the truck” has multiple outs, including the full mix with all graphics, some with only the mix minus any branding, etc. While the isos get recorded in the truck, they are not pushed out to the broadcast center.
All of the “mixing” as you call it is done in the truck. If you’ve never seen it, it is quite impressive. In one part of the truck are the director and the technical director. The director is the one calling things like “ready camera 1”, “take 1”, etc.; the TD is the one on the switcher pushing the actual buttons on the console to make it happen. Next to them is the graphics team prepping all of the stats made available to the TD to key in. In another area is the slomo/replay team, taking the feeds from all of the cameras into recorders that allow the operators to pull out the selects and make them available for the director/TD to cut to. Typically in the back of the truck is the audio mixer who mixes all of the mics around the event in real time. All of that creates the signal you see on your screen. It leaves the back of the truck and heads out to wherever the broadcaster has better control.
Not nowadays; there is more and more remote production for larger and larger events, and it’s coming on rapidly. Directors are increasingly sitting in centralised control rooms rather than in a scanner.
BT Sport are interesting: they spin up graphics, replay, etc. in an AWS environment a couple of hours before. I was impressed by their UEFA Youth League coverage a couple of years ago, and they aren’t slowing down.
https://www.limitlessbroadcast.tv/portfolio/uefa-youth-leagu...
https://www.svgeurope.org/blog/headlines/stratospheric-revol...
Obviously not every broadcast, or even most, are remote now, but it’s an ever increasing number.
I don’t know how the US industry works, I suspect the heavy union presence I’ve seen at places like NAB will slow it, but in Europe remote production is increasingly the future.
> People just do not appreciate how many gotchas can pop up doing anything live.
Sure thing, but also, how many resources do you think Netflix threw at this event? If organizations like FOSDEM and CCC can do live events (although with way smaller viewership) across the globe without major hiccups on (relatively) tiny budgets and smaller infrastructure overall, how could Netflix not?
Scale changes everything, I don't think it's fair to shrug this off
Last I checked, p2p solves a lot of the scaling issues.
Haven't Asian live sports been using p2p already, two decades ago?
(What is the biggest PeerTube livestream so far?)
This is true, but scale comes after production. Once you have the video encoded on a server with a stable connection the hard part is over. What netflix failed to do is spread the files to enough servers around the globe to handle the load. I'm surprised they were unable(?) to use their network of edge servers to handle the live stream. Just run the stream with a 10 second delay and in that time push the stream segments to the edge server
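The delay-then-fan-out idea in the comment above can be sketched in a few lines. Segment names, the edge list, and the window size are all made up for illustration; real systems would push over HTTP or an internal replication protocol:

```python
from collections import deque

DELAY_SEGMENTS = 5            # ~10 s of broadcast delay at 2 s per segment
edges = ["edge-us-east", "edge-eu-west", "edge-ap-south"]  # hypothetical nodes

pushed = []                   # record of what each edge has cached

def push_to_edges(segment):
    # stand-in for an HTTP PUT / internal replication call per edge node
    for edge in edges:
        pushed.append((edge, segment))

playable = deque()            # the live window clients are allowed to request

for n in range(8):            # encoder emits seg0.ts, seg1.ts, ...
    seg = f"seg{n}.ts"
    push_to_edges(seg)        # replicate during the broadcast delay
    playable.append(seg)
    if len(playable) > DELAY_SEGMENTS:
        playable.popleft()    # oldest segment ages out of the live window

# Clients play DELAY_SEGMENTS behind the encoder, so every segment in the
# playlist has already been copied to every edge before anyone asks for it.
print(list(playable))
```

The broadcast delay buys exactly the replication headroom the comment describes: by the time a client requests a segment, the fan-out has already happened.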
This right here is where I'd expect the failure to occur. This isn't Joey Beercan running OBS using their home internet connectivity.
This is a major broadcast. I'd expect a full on broadcast truck/trailer. If they were attempting to broadcast this with the ($) option directly to a server from onsite, then I would demand my money back. Broadcasting a live IP signal just falls on its face so many times it's only the cheap bastard option. Get the video signal as a video signal away from the live location to a facility with stable redundant networking.
This is the kind of thinking someone only familiar with computers/software/networking would think of rather than someone in broadcasting. It's nice to think about disrupting, but this is the kind of failure that disruptors never think about. Broadcasters have been there done that with ensuring live broadcasts don't go down because an internet connection wasn't able to keep up.
Lumen has their Vyvvyx product/service which uses fiber for broadcast television.
I’ve been using Vyvx since it was called Global Crossing/Genesis; it was fairly unique when it started, but point-to-point IP distribution of programs has been the norm for at least 15 years. Still have backup paths on major events on a different technology; you’d be surprised how common a dual failure on two paths can be. For example, on output from the Euro football this summer my main paths were on a couple of leased lines with -7, but I still had a backup on some local internet into a different city, just in case there was a meltdown of the main provider’s network (it’s happened before with ipath, automation is great until it isn’t).
> Once you have the video encoded on a server with a stable connection the hard part is over.
The hard part is over, and people new to the problem think they are almost done, but then the next part turns out to be 100x harder.
Lots of people can encode a video.
Yeah, I agree with this, especially the everything part. Netflix isn't exactly a scrappy FOSS/hackers organization or similar.
The CCC video crew has its fair share of geeks from broadcasting corporations and studio houses. Their combined institutional knowledge about live events and streaming distribution is probably in the same ballpark as that of giant global TV networks.
They also have the benefit of having practiced their craft at the CCC events for more than a decade. Twice a year. (Their summer event is smaller but still fairly well known. Links to talks show up on HN every now and then.)
Funky anecdote: the video crew at Assembly have more broadcasting and live AV gear for their annual event than most medium-sized studios.
> Their combined institutional knowledge about live events and streaming distribution
Now if they could just get audio levels and compression figured out.
> If organizations like FOSDEM and CCC can do live events (although with way smaller viewership) […]
Or, for that matter, Youtube (Live) and Twitch.
> how much resources do you think Netflix threw on this event?
Based on the results, I hope it was a small team working 20% time on the idea. If you tell me they threw everything they had at it to this result, then that's even more embarrassing for them.
It wasn't even just buffering issues, the feed would just stop and never start again until I paused it and then clicked "watch live" with the remote.
It was really bad. My Dad has always been a fan of boxing so I came over to watch the whole thing with him.
He has his giant inflatable screen and a projector that we hooked up in the front lawn to watch it, but everything kept buffering. We figured it was the Wi-Fi, so he packed everything up and went inside, only to find the same thing happening on ethernet.
He was really looking forward to watching it on the projector and Netflix disappointed him.
> My Dad has always been a fan of boxing
What did your Dad think about the 'boxing'?
Commercial boxing has always been like WWE or MMA with a thin veneer of actual sport to it, i.e. it is just entertainment[1].
To rephrase your question, then: what does someone think of the entertainment on display?
I don't think it was good entertainment.
None of the hallmarks of a good show were present: it wasn't close, nor was it bloody, nor was there anything unexpected like, say, a KO; everything went pretty much as expected. It wasn't a nice watch at all, no skill or talent was on display; all Paul had to do was use his speed to backpedal from the slow, weak punches of a visibly older Tyson with a bum knee and land some points occasionally to win.
--
[1] There is a deeper argument here: is any spectator sport just entertainment, or is it truly about skill, talent, and competition? Boxing, however, including the events promoted by the traditional four major associations, falls clearly on the entertainment side to me, compared to, say, another sport like the NFL.
Was this necessary? The comment was on a tech forum about the tech issues, do we really need to reprosecute the argument that it wasn’t real boxing here too? There are plenty of other places for those so painfully inclined to do so
I appreciated the answer by manquer. HN is tech-focused but not tech-only, right?
[dead]
Cable TV (or even OTA antenna in the right service area) is simply a superior live product compared to anything streaming.
The Masters app is the only thing that comes close imo.
Cable TV + DVR + high speed internet for torrenting is still an unmatched entertainment setup. Streaming landscape is a mess.
It's too bad the cable companies abused their position and lost any market goodwill. Copper connection direct to every home in America is a huge advantage to have fumbled.
The interesting thing is that a lot of TV infrastructure is now running over IP networks. If I were to order a TV connection for my home I'd get an IPTV box to connect to my broadband router via Ethernet, and it'd simply tell the upstream router to send a copy of a multicast stream my way.
Reliable and redundant multicast streaming is pretty much a solved problem, but it does require everyone along the way to participate. Not a problem if you're an ISP offering TV, definitely a problem if you're Netflix trying to convince every single provider to set it up for some one-off boxing match.
I have wondered if better multicast support will happen just for cost savings, as the amount of live content increases.
So far, no one seems particularly motivated.
The Masters app is truly incredible, I don't know if it gets enough praise.
What's so great about it?
This. I'm honestly going to cancel my streaming shit. They remove and mess with it so much. Like right now HBO Max or whatever removes my recent watches after 90 days. Why?
Apple TV MLB games look incredible compared to live cable tv.
On a few forum sites I'm on, people are just giving up. Looking forward to the post-mortem on how they weren't ready for this (with just a tiny bit of schadenfreude because they've interviewed and rejected me twice).
They sabotaging OP just for a reverse schadenfreude play
AB84 streamed it live from a box at the arena to ~5M viewers on Twitter. I was watching it on Netflix, I didn't have any problems, but I also put his live stream up for the hell of it. He didn't have any issues that I saw.
> He didn't have any issues that I saw.
He’s definitely got issues..
we're calling antonio brown AB84 now? What happened to Mr. BC?
It’s not everyone. It works fine for me, though I did have to reload the page when I skipped past the women's match to the Barrios Ramos fight and it was stuck buffering at 99%.
You skipped the best part.
Can you share which forums
/r/netflix and /sp/
Chiefsplanet.com, unstuckpolitics.com, my buddies on group text :)
The post-mortem will be interesting indeed.
I wonder if there will be any long term reputational repercussions for Netflix because of this. Amongst SWEs, Netflix is known for hiring the best people, and their streaming service normally seems very solid. Other streaming services have definitely caught up a bit and are much more reliable than in the early days, but my impression has always been that Netflix is a step above the rest technically.
This sure doesn't help with that impression, and it hasn't just been a momentary glitch but hours of instability. And the Netflix status page saying "Netflix is up! We are not currently experiencing an interruption to our streaming service." doesn't help either...
Not the same demographic but their last large attempt at live was through a Love is blind reunion. It was the same thing, millions of people logging in, epic failure, nothing worked.
They never tried to do a live reunion again. I suppose they should have, to get the experience. Because now they are hitting the same problems with a much bigger-stakes event.
yup wanted to say that live stream stuttering has happened before on Netflix - I don't think the reputation is deserved.
From a livestreaming standpoint, netflix is 0/x - for many large events such as love is blind, etc.
From a livestreaming standpoint, look to broadcast news, sports / Olympics broadcasters, etc and you'll see technology, equipment, bandwidth, planning, and professionalism at 1000x of netflix.
Heck, for publicly traded companies' quarterly earnings livestreams, they book direct satellite time in addition to fiber to make sure they don't rely only on terrestrial networks, which can fail. From a business standpoint, failure during a quarterly meeting stream can mean the destruction of a company (by making shareholders mad that they can't see and vote during the meeting, pushing them toward internal change), so the stakes are much higher than live entertainment streaming.
Netflix is good at many things, livestreaming is not one of those things.
Even some of the old guard can do this. The Olympics worked pretty well (despite the awkward UI), and that was Peacock/NBC.
Perhaps Netflix still needs a dozen more microservices to get this right...
All valid points though each of those examples seemingly only has a fraction of the viewers of the netflix events, right?
for livestreams, individual events like the Olympics probably have a surge audience 10x that of Netflix events.
Netflix events are small potatoes compared to other livestream stalwarts.
Imagine having to stream a cricket match internationally to the UK / India / Australia with a combined audience that crushes the Super Bowl, or a football match to all of Europe, or even something like livestreaming F1 racing, which has orders of magnitude more audience than a boxing match and also has 10x the number of cameras (at 8K+ resolution) across a large physical staging arena (the size of the track/course) in real time, in addition to streaming directly from the cockpits of cars racing at 200mph+.
Livestream focused outfits do this all day, everyday.
Netflix doesn't even come close to scratching the "beginner" level of these kinds of live events.
It's a matter of competencies. We wouldn't expect Netflix to be able to serve burgers like McDonald's does - Livestreaming is a completely different discipline and it's hubris on Netflix's part to assume just because they're good at sending video across the internet they can competently do livestreaming.
this is false, the tom brady roast was live streamed
yes, love is blind failed, but was definitely not the most recent attempt. they did some other golf thing too, iirc
tom brady is largely a guy popular in the USA whereas Mike Tyson is globally famous. It follows that this fight would attract a larger audience.
the point i’m making is that the netflix live streaming timeline didn’t go
chris rock -> love is blind -> mike tyson
they have had other, successful executions in between. the comment i was replying to had cherry picked failures and i’m trying to git rebase them onto main.
Is anyone surprised? I don't see how their infrastructure can handle this when it was designed for non-realtime precaching of prerecorded content.
From what I've heard, Netflix has really diluted the culture that people know of from the Patty McCord days.
In particular, they have been revising their compensation structure to issue RSUs, add in a bunch of annoying review process, add in a bunch of leveling and titles, begin hiring down market (e.g. non-sr employees), etc.
In addition to doing this, shuffling headcount, budgets, and title quotas around has in general made the company a lot more bureaucratic.
I think, as streaming matured as a solution space, this (what is equivalent to cost-cutting) was inevitable.
If Netflix were running the same team/culture as it was 10 years ago, I'd like to say they would have been able to pull off live streaming.
Were they not able to hire enough top-skilled people? If not, why not?
Or did they have a lot of needs that they decided didn't require top-skilled people?
Or was this a beancounter thing, of someone deciding that the company was paying more money on staffing than they needed to, without understanding it?
Combination of 2 and 3. The business changed. Streaming was more or less a solved problem for Netflix. They needed money for content, not expensive engineers. Ted is co-ceo… you can see where the priority is.
My observation is that Netflix is one of those places that brags about how they do so much with so little employees.
Few. Little employees would be...small.
So the issue is that Netflix gets its performance from colocating caches of movies in ISP datacenters, and a live broadcast doesn't work with that. It's not just about the sheer numbers of viewers, it's that a live model totally undermines their entire infrastructure advantage.
See: https://openconnect.netflix.com/en/
Correct, this is not Netflix’ regular cup of tea, and it’s a very different problem to solve. They can probably use their edge caches, but it’s challenging.
How does YouTube do this? Netflix is a drop in the ocean compared to them.
My wild assed guess is the differences in the edge nodes.
Netflix's edge nodes are optimized for streaming already encoded videos to end users. They have to transcode some number of formats from the source and send them all to the edge nodes to flow out. It's harder to manage a ton of different streams flowing out to the edge nodes cleanly.
I would guess YouTube, being built on Google's infrastructure, has powerful enough edge nodes that they stream one video stream to each edge location and the edges transcode for the clients. Only one stream from source to edge to worry about, which is much simpler to support and reason about.
But that's just my wild assed guess.
> I would guess YouTube, being built on Google's infrastructure, has powerful enough edge nodes that they stream one video stream to each edge location and the edges transcode for the clients.
Ha, no, our edge nodes don't have anywhere near enough spare CPU to do transcoding on the fly.
We have our own issues with livestreaming, but our system's developed differently over the past 15 years compared to Netflix's. While they've historically focused on intelligent pre-placement of data (which of course doesn't work for livestreaming), such an approach was never feasible for YT with the sheer size of our catalog (thanks to user-generated content).
Netflix is still new to the space, and there isn't a good substitute for real-world experience for understanding how your systems behave under wildly different traffic patterns. Give them some time.
It also helps that youtube serves shit tier quality videos more gracefully. Everyone is used to the step down to pixel-world on youtube to the point where they don’t complain much.
And a decent share of those users are on the free tier, so they aren't paying for it. That alone buys you some level of forgiveness. At least I am not paying anything for this experience.
I stream hours of 4k60 from youtube every day for free.
I get maybe 1 minute total of buffering per week, if that.
Seems uncharitable to complain about that.
Live streams have different buffering logic to video on demand. Customers watching sports will get very upset if there is a long buffer, but for a VOD playback you don't care how big the buffer is. Segment sizes are short for live and long for VOD because you need to adapt faster and keep buffers small for Live, but longer download segments are better for buffering.
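A minimal sketch of the tradeoff described above. The segment durations and buffer depths here are illustrative assumptions, not anyone's actual player configuration:

```python
# Illustrative only: glass-to-glass delay scales with segment length
# times how many segments the player insists on buffering before playing.
def startup_delay(segment_seconds, segments_buffered):
    """Approximate how far behind the live edge playback starts."""
    return segment_seconds * segments_buffered

# Live-style config: short segments, shallow buffer -> low delay,
# but little headroom to ride out a network hiccup.
live_delay = startup_delay(segment_seconds=2, segments_buffered=3)    # 6 s
# VOD-style config: long segments, deep buffer -> very robust,
# with a startup delay nobody watching on demand notices.
vod_delay = startup_delay(segment_seconds=6, segments_buffered=10)    # 60 s
```

Shorter segments also mean the player can switch quality levels sooner, which is why live configs adapt faster at the cost of fragility.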
Sorry, yeah, for some stupid reason I was not thinking about live streams.
In my experience even YouTubeTV has problems sometimes. I'll have the 1080p (and enhanced mode also I think) quality set and still deal with a lot of compression artifacts.
Not sure how Netflix does it. But this is not very time sensitive, and I would have delayed the stream by 15 to 30 seconds to cache it and then deliver to everyone.
Not sure I fully buy that. The “live” stream is rarely “live”. It’s often a highly cached buffer that’s a few mins from latest. Those in isp caches can still help here.
I wonder how effective it would be to cache live events with a delay. Write to the tail, read from the head.
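A toy sketch of the "write to the tail, read from the head" idea, with an assumed fixed delay measured in whole segments. Nothing here reflects any real player or CDN:

```python
from collections import deque

class DelayedLiveCache:
    """Toy delayed live buffer: ingest appends segments as they arrive
    (the tail); viewers are served a fixed number of segments behind
    the live edge (the head)."""

    def __init__(self, delay_segments):
        self.delay = delay_segments
        self.segments = deque()

    def ingest(self, segment):
        self.segments.append(segment)      # write to the tail

    def serve(self):
        # Read from the head, but only once we hold more than the delay
        # window; otherwise the viewer keeps waiting.
        if len(self.segments) > self.delay:
            return self.segments.popleft()
        return None

cache = DelayedLiveCache(delay_segments=2)
cache.ingest("seg1")
cache.ingest("seg2")
print(cache.serve())   # None: still inside the delay window
cache.ingest("seg3")
print(cache.serve())   # seg1: viewers run two segments behind live
```

The delay window is exactly the slack the CDN gets for distributing each segment before anyone is allowed to watch it.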
that’s totally unacceptable for live sports which people are able to bet on
I have bad news for you. This is how it works already for “live” sports
Correct. Here are some latency numbers from the last SuperBowl: https://www.phenixrts.com/resource/super-bowl-2024
Even the best latency is dozens of seconds behind live action.
Yep. Having actually worked on this sort of stuff I can confirm.
Your ISP doesn't have enough bandwidth to the Internet (generally speaking) for all users to get their feed directly from a central location. And that central location doesn't have enough bandwidth to serve all users even if the ISP could. That said, the delay can be pretty small, e.g. the first user to hit the cache goes upstream, the others basically get the stream as it comes in to the cache. This doesn't make things worse, it makes them better.
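Rough back-of-envelope arithmetic for the bandwidth claim above; the viewer count and bitrate are made-up round numbers:

```python
# Assumed: 1M concurrent viewers inside one ISP watching an ~8 Mbps live feed.
viewers      = 1_000_000
bitrate_mbps = 8

# Without an in-ISP cache, every stream crosses the ISP's upstream links:
upstream_no_cache_gbps = viewers * bitrate_mbps / 1000    # 8000 Gbps
# With a cache, roughly one copy comes in and fans out locally:
upstream_cached_gbps   = bitrate_mbps / 1000              # 0.008 Gbps
```

Even if the real numbers are off by 10x in either direction, the gap between "every viewer upstream" and "one copy upstream" is the whole argument.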
I don't bet so I have no clue, but why is that? Are people able to place bets in the middle of the match or something? I would have assumed bets get locked in when the fight starts
Idk about traditional sports books but on Polymarket you can certainly continue betting at any time until the market resolves.
They close betting some minutes before the fight ends.
I last saw Tyson at +500 while Jake was around -800 on DraftKings somewhere in the 6th round.
a match has multiple rounds doesn't it? Seems logical to bet on individual rounds or events that can occur throughout the match.
This is kind of silly because the delay between actual event happening to showing up on OTA TV or cable TV to showing up on satellite TV can already be tens of seconds.
isn't this why people would listen via radio?
Why should they cater to such an audience in the first place?
I think this could be one of the upsells Netflix could offer.
Premium: get no delay
Normal users: get cache and delay
Or, hear me out here, it's a wild concept, just work.
You know, like every other broadcaster, streaming platform, and company that does live content has been able to do.
Acting like this is a novel, hard problem that needs to be solved and we need to "upsell" it in tiers because Netflix is incompetent and live broadcasting hasn't been around for 80+ years is so fucking stupid.
Every other live platform has a delay of multiple seconds
Live sports require microwave relays for high frequency sports bets
I would be surprised if they don't already do this. The question is how big a buffer to trade off for delay...
I don't think it's true that live doesn't work with caches. No one watching live would care about an O(seconds) delay, which is highly amenable to caching at ISPs and streaming from there to downstream clients. Offhand I'd say that approach supports O(s) delay, but not O(ms).
That model still works for streaming. You have a central source stream only to the distributed edge locations, then have clients only stream from their local edge location. Even if one region is overwhelmed, the rest can still work. Load on the central source is bounded.
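The bounded-load argument in rough numbers; the edge and viewer counts below are illustrative assumptions:

```python
# Illustrative fan-out: the origin only ever talks to edge locations,
# so its load is fixed by the edge count, not the audience size.
viewers = 60_000_000     # same order as the reported audience for the fight
edges   = 5_000          # assumed number of edge/cache locations

origin_streams   = edges             # one copy of the feed per edge
avg_edge_clients = viewers // edges  # ~12,000 clients per edge on average

# Doubling the audience doubles avg_edge_clients but leaves
# origin_streams untouched: that's the bounded-load property.
```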
I'm curious if the root cause is more-variable-than-usual latency.
Sample size 1, but...
I saw a ton of buffering and failure on an embedded Netflix app on a TV, including some infinite freezes.
Moved over to laptop, zero buffering.
I assume the web app runs with a lot bigger buffer than whatever is squeezed into the underpowered TV.
Likely these devices use different media formats and/or quality levels. And yes, it's possible one device buffers more than the other. Infinite freezes sounds like some routing issues or bugs.
When I was watching the behavior on the tv, was wondering if buffering sends some separate, non-business-as-usual requests, and that part of Netflix's delivery architecture was being overloaded.
E.g. "give me this previous chunk" vs "send me the current stream"
Buffering typically just consumes the same live stream until there's enough in the buffer. No difference other than the request rate being potentially higher. At least I can confidently say that for the standard players/video platforms. NetFlix could be doing something different. I'm not sure if they have their own protocols. But I'd be very surprised if the buffering used a completely different mechanism.
Damn that sucks. I wonder if they could have intentionally streamed it 5 min late? I don’t have all the context around the fight though — maybe a competing service would win if Netflix intentionally induced delay?
they were introducing 5 minute delays on some of the clients. I noticed my ipad was always live and the smart tv had a 5 minute delay but you could fast forward to live.
they will learn :)
If Netflix still interviews on HackerRank puzzles, I think this should be a wake-up call. Interviewing on irrelevant logic puzzles is no substitute for systems engineering.
I did a round of Netflix interviews and didn't get an offer (though I passed the technical coding rounds). They absolutely had the best interview process of any company I've interviewed with in my entire career.
They do make you code, but the questions were 1. not on HackerRank or LeetCode, and 2. practical coding questions that didn't require anything more than basic hashmaps/lists/loops/recursion if you wanted. Some string parsing, etc.
They were still hard and you had to code fast, but no tricky algorithms were required. It also felt very collaborative, like you were driving a pair programming session. Highly recommended, even though I didn't get an offer!
For systems design and engineering, absolutely this. I expected the very highest standards and utmost uptime from Netflix, similar to Google and Amazon.
Tells you the uselessness of their engineering blogs.
Was live streaming much of a use case for them before this?
They stream plenty of prerecorded video, often from colocated caches. Live streaming seems like something they aren't yet good at.
If places like Paramount+ can figure it out, Netflix, given their 10+ year head start on streaming and strong engineering culture, should also have been able to. And if you don't like my example, literally every other streaming service has streamed live sports without issue. YT TV, Hulu, Paramount+, Amazon Prime, Peacock, even Apple TV streams live sports.
It may be "new" to them, but they should have been ready.
I won’t argue that they shouldn’t have done better, I’m only pointing out that this is fairly different from their usual product. Amazon, YouTube, and Hulu all have a ton of experience with live streaming by now. Apple has live streamed wwdc for several years.
I did expect that Netflix would have appropriately accounted for demand and scale, though, especially given the hype for this particular event.
Even Amazon doesn't have it perfected for Live.
Live sports and VOD movies/TV are very different beasts.
https://www.cnn.com/2023/03/04/entertainment/chris-rock-netf... (2023-03-06)
Has Netflix ever live streamed something before? People on reddit are reporting that if you back up the play marker by about 3 minutes the lag goes away. They've got a handle on streaming things when they have a day in advance to encode it into different formats and push it to regional CDNs. But I can't recall them ever live streaming something. Definitely nothing this hyped.
Love is blind reunion which had major problems. Chris Rock comedy which only had a few issues. The Netflix Cup which had issues.
Chris Rock comedy special, and the Tom Brady roast. Nothing on this scale, though.
I don't spend much time streaming, but I got a glimpse of the Amazon Prime catalog yesterday, and was surprised at how many titles on the front page were movies I'd actually watch. Reminded me of Netflix a dozen years ago.
Amazon Prime isn't so great. Lots of for rent/purchase content or content with ads these days. And they end up repeating slots of content in all the rows in their UI, so I end up seeing the same suggestions everywhere rather than much that's new (other than first party productions).
To me they're basically padding their front page.
But honestly that's most of the major streaming platforms these days. I recently cancelled Disney Plus for similar reasons. The only reasons I don't cancel prime or Netflix are because I have family members I split the memberships with to share.
I recently found a lil dvd rental place in my city. It’s a non-profit, they also do archivals and stuff.
It’s pretty much a two-story townhouse packed head to toe with DVDs (lots of blu rays!)
You don’t realize how limited the streaming collection is until you’re back in a movie store, looking through thousands and thousands of movies you would never find otherwise.
Since I found it, I’ve started doing movie night every week with my friends. It’s such an absolute blast. We go to the store, each pick out a random movie that looks good (or bad, in a good way) or just different.
All of a sudden, I love movies again!!
That's an excellent option. I think it'd be remiss not to mention local libraries. Of course, your mileage may vary, but the ones I've gone to do seem to have adequate selections. I just don't often make time to go there and browse like I would have at traditional video rental places back in the day.
Heck, mine even have some video games; though from when I've checked they're usually pretty back-reserved.
I was in high school in the early 00s, and going to the movies was such a big part of my life. Now, I never even know what's out.
I suspect life stage is a factor, but it does feel like there are many classes of entertainment (cinema and standup come to mind) that don't resonate like they used to.
Back in the day everyone was watching the same thing. The choices for entertainment were limited to whatever was showing in movie theatres, whatever was on TV and whatever record stores were selling.
Heh Heh Heh. Maybe there will be a resurgence of Blockbuster style retail stores... ;)
I've given Netflix a lot more money than I've gotten value out of. I've had an account for ~15y and only really use it for airplanes unless there's a specific thing I'm excited to watch.
I'm in the same boat where as soon as they make it too hard to share, I'll probably cancel it. I think the main reason their sharing crackdown hasn't been a problem so far is that I use it so seldomly, it thinks the "main" address is my parents, which makes it easy for me to pass the "are you traveling" 2FA on my own phone when I do want to watch something.
> And they end up repeating slots of content in all the rows in their UI, so I end up seeing the same suggestions everywhere rather than much that's new
All of the streaming services do this and I hate it. Netflix is the worst of the bunch, in my experience. I already scrolled past a movie, I don't want to watch it, don't show it to me six more times.
Imagine walking through a Blockbuster where every aisle was the same movies over and over again.
Amazon Prime's front page currently includes a lot of ads for movies that you can rent or "buy". Are you sure the movies you saw weren't those?
This is my biggest issue with Prime Video. You never know what's included and what costs extra.
There's also the "FreeVee" items, which have ads regardless of whether you're a prime subscriber or not. And it feels like a lot of their catalog has been transferred over to FreeVee.
You can toggle to only show free content. Still get ads but theyre obvious.
> Reminded me of Netflix a dozen years ago.
It's been pretty rough the last few years. So many great films and series, not to mention kids programming, removed to make way for mediocre NeTfLiX oRiGiNaLs and Bollywood trash.
is this specifically in India? I never see bollywood stuff in the US but half the catalogue is dubbed/subbed korean dramas
[dead]
Prime Video has to be the worst of all the major streaming services. The video quality is horrible, it's crippled with ads (3 unskippable ads for a 45-minute episode, most recently), and a lot of interesting titles are behind a "partner paywall".
I have prime and my shopping experience is crippled with ads too.
I think it got worse for sellers recently too. If I search for something, like a specific item using its description, sometimes the only result for it shows "sponsored".
It used to show up as sponsored and also unsponsored below.
If this changed, I assume it is bad for the seller. Either they pay for all search results, or their metrics are skewed because all searches were helped by "sponsorship" (and there are no longer unsponsored hits)
I was watching the rings of power and it started with a "Commercial free experience provided by so and so" with a long ad at the start of the episode, and then a third of the way into the episode, at a critical action part, it broke in the middle of the actor's sentence to a 6 minute ad block.
I exited playback and haven't gone back to finish it. I'll wait for it eventually to make it to a Blu-ray release someday.
> Prime Video has to be the worst of all major streaming services
I would put Prime Video at 2nd worst. Absolute worst IME is Paramount+.
Edit: worst for streaming quality
It's also super annoying to try to watch on a computer compared to Netflix or YouTube
Netflix pivoted to be a platform to waste as much of your time as possible vs entertain.
The Amazon originals are way better imo. They do the dark pattern crap with paid content, as one would expect from Amazon.
Every Amazon show looks the same, yellow washed or something; and they should spend more money on costumes - they get beat by low budget cosplay.
Fallout was pretty good. Very loyal to the game.
> but my impression still has always been that Netflix is a step above the rest technically.
I always assumed youtube was top dog for performance and stability. I can’t remember the last time I had issues with them and don’t they handle basically more traffic than any other video service?
Maybe a client issue, but i've got a low-end smart tv which handles netflix fine, but youtube is unwatchable due to buffering and failed cuts to adverts
Maybe that's it. I pay for Premium though, so I don't have the advert issue (so not exactly apples to apples).
I think Netflix will have even more software engineers looking to work there once they notice that even for average-quality work they can get paid 3 times their current pay.
Netflix won't take a hit here.
Most people pay Netflix to watch movies and tv shows, not sports. If I hadn't checked Hacker News today, I wouldn't even know they streamed sports, let alone that they had issues with it. Even now that I do, it doesn't affect how I see their core offering, which is their library of on-demand content.
Netflix's infrastructure is clearly built for static content, not live events, so it's no shock they aren't as polished in this area. Streaming anything live over the internet is a tough technical challenge compared to traditional cable.
Netflix is trying to expand into live sports. This event wasn’t a one off thing. There is an NFL game they are streaming at the end of the year.
I think they have to refund the fees for a month to anyone who streamed this fight. That's the only thing that seems fair.
It has been pretty useless. At the moment it seems to be working only when running in non-live mode, several minutes behind.
So if there are 1 million trying to stream it, that means they would lose $15 million. So.. they might only give a partial refund.
But people should push for an automatic refund instead of a class action.
Is it really that big a deal if you are watching a few minutes behind?
I've watched ball games on streaming networks where I can also hear a local radio broadcast, and the stream is always delayed compared to the radio, sometimes by quite a lot. But you'd never know it if you were just watching the stream.
>Is it really that big a deal if you are watching a few minutes behind?
i don't bet on sports. but from friends who do: yes, it's a really really big deal.
I have seen this type of comment a few times here. Why does it matter to a bettor if the stream is delayed by 1-5 mins? How would they even know?
I remember a few years ago reading about a scam at the Australian Open Tennis where there were people inside the stadium who were betting on individual points as they happened.
I guess they could bet before the betting streams caught up.
It seems ridiculous to me that you can bet on individual points, but here we are.
The issue is that most people are trying to watch live which is what it's advertised as. And until they figure out that they need to watch X minutes behind, it is unwatchable. Many will not figure that out.
So for the first hour it was just total frustration until I stopped trying to go back to live mode.
Potentially more so in this brave new world of increased sports betting.
https://en.wikipedia.org/wiki/The_Sting
Offshore combined streaming and betting houses will be cleaning up the rubes.
Internet streams are not real-time even in the best case. There is always a few seconds of delay, often quite a bit more than that depending on number of hops and link speeds, congestion, etc.
well it's live sports, watching live is the big deal. Also people are gambling on the outcomes so watching a few minutes behind is a big deal
I think what I will remember about this fight is not the (small) streaming issues I encountered so much as the poor quality of the fight itself. For me that was the reputational loss. Netflix was touting “NFL is coming to Netflix”. This fight did not really make me want to watch that.
I don't care about boxing or UFC or the grade-A douchebags that are the Paul brothers, but I tuned in just because I had the time and a Netflix subscription.
It was actually great that the fight itself was so boring because it justifies never having to spend time / money on that kind of bullshit. It was a farce. A very bright, loud, sparkly, and expensive (for some people) farce.
The value I got from it was the knowledge that missing out on that kind of thing isn't really missing out on anything at all.
I used to work for a live streaming platform once. We always joked that VOD (Netflix) was "easy" compared to live.
Not really a joke, though? VOD has obvious methods to cheat a bit. Redundancy abounds and you can even network shape for costs. Could probably get even better compression for clear reasons.
Live, not so much. One source that you have to fan out from, and absolutely no way to get cheap redundancy. Right?
Yes, of course but it was still a cope because we never saw more than maybe 3M concurrent viewers at a time
I don't think it'll be long-term. Most people will forget about this really quickly. It's not like there will be many people saying "Oh, you don't want to sign up for Netflix, the Tyson fight wasn't well streamed" in even 6 months nevermind 10 years.
>but my impression still has always been that Netflix is a step above the rest technically.
Maybe if we're not counting Youtube as 'streaming', but in my mind no one holds a candle to YT quality in (live)streaming.
Is cable and broadcast better for live TV? No scaling issues. Doesn't matter how many people tune in.
It's fundamentally different, for sure.
Most third-party internet-based streaming solutions are overlaid on top of a point-to-point network, while broadcast is one-to-many, and even cable tends to use multicast within the cable provider's network.
You have potentially different problems, e.g. limited bandwidth / spectrum. If, say, there are multiple games going on at the same time, you can only watch whichever feed the broadcaster decides to air. And, of course, regardless of the technology in use, there are matters of acquiring rights for various events. One benefit of internet-based streaming is that one service can acquire the rights and be able to reach everyone, whereas an individual cable provider might only reach its direct subscribers.
On cable (terrestrial is entirely different), even bandwidth or spectrum is less of a limit for broadcasting multiple games. The hard part is the rest of the production: cameras, live directing, live commentary. Adding new channels is less challenging than actually producing content at the expected level.
Interestingly, TV nowadays is delivered through IP anyway.
Seems like it
Based on this I'm wondering whether it was straight up they did not expect it to be this popular?
> Some Cricket graphs of our #Netflix cache for the #PaulVsTyson fight. It has a 40 Gbps connection and it held steady almost 100% saturated the entire time.
https://fosstodon.org/@atoponce/113491103342509883
I don't think Netflix was even designed to handle extreme multi-region live-streaming at scale, as evidenced by this event with tens of millions watching simultaneously.
YouTube, Twitch, Amazon Prime, Hulu, etc. have all demonstrated they can stream live to tens of millions simultaneously without major issues. This was Netflix's chance to prove the same, and they largely failed.
There are no excuses or juniors to blame this time. Quite the show of inexperience from the 'senior' engineers at Netflix, unable to handle live-streaming at this scale; they may lose contracts over the worldwide downtime during this high-impact event.
Very embarrassing for a multi-billion dollar publicly traded company.
They must have tried to do this on the cheap, thinking they could dynamically scale on the fly. Big mistake.
This is a total supposition without any proof.
What more proof do you need other than the fact that streams went down worldwide on a highly anticipated event from a public company?
I wouldn't be surprised if lots of engineers at Netflix are now writing up a lengthy post mortem of this.
And this is from the company that created the discipline of chaos engineering for resilience.
It is clear they underinvested and took their eye off the ball with this.
This is bad, like very very bad.
The assumption that it was related to insufficient investment isn’t supported by any evidence. Flawed technical decisions can be made by the most expensive engineers too.
The evidence is that the stream went down.
We will see why it went down and to what extent they underinvested in their post mortem.
That’s not evidence for the assertion you made.
Other potential and future entertainment partners of Netflix, e.g. WWE, will certainly share my view, as they will be questioning Netflix's capability after the major streaming issues we both saw.
This isn't Netflix's first time they had this live streaming problem.
People will see this as underinvestment on Netflix's part, and they will consider going with a different streaming partner.
Their CDN is colo and doesn't run on AWS.
There's a difference between live broadcasts and serving up content that's sitting on a server I guess?
In my country every time there's a big football match the people who try to watch it on the internet face issues.
Yea, it’s a bad look. But I switched to watching some other Netflix video and it seemed fine. Just this event had some early issues. Looks fine now though.
Streamed glitch free for me both on my phone and Xbox. The fight wasn’t so great though, but still a fun event. Jake Paul is a money machine right now.
same. could be a bandwidth issue at the CDN/ISP level in certain regions?
> their streaming service normally seems very solid
Not trying to downplay their complexity, but last I heard Netflix splits shows into small data chunks and just serves them as static files.
Live streaming is a different beast
That's how streaming (usually) works. The main URL is a playlist of transport stream files and it just downloads them in the background as you go.
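For illustration, a live HLS-style media playlist looks roughly like this (segment names and numbers are made up). The player re-fetches the playlist, and the server appends new segments at the tail as old ones drop off the front:

```
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:1042
#EXTINF:6.0,
segment1042.ts
#EXTINF:6.0,
segment1043.ts
#EXTINF:6.0,
segment1044.ts
```

A VOD playlist is the same idea, but it's complete up front and ends with an `#EXT-X-ENDLIST` tag, so the whole thing can be pushed to CDNs in advance.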
Static files have been pretty much the standard streaming protocol for both VOD and live for the last 15 years. Before that, it was Adobe Flash (RTMP).
With the way that they are designed, you can even use a regular CDN.
You can push these files to all the edges before you release the content, which protects your origin. With a livestream, all your edge servers are grabbing content from the origin unless you have another tier of regional servers to alleviate the load.
Sure but that’s why your edge servers do request collapsing. And there are full blown CDN companies that will write an enterprise contract with you that can do this stuff with ease. Akamai is like 25 years old now.
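A minimal sketch of request collapsing (the same idea as Go's singleflight package); this illustrates the general technique, not how any particular CDN implements it. Concurrent cache misses for the same segment share one upstream fetch instead of stampeding the origin:

```python
import threading

class Collapser:
    """Sketch of request collapsing: the first cache miss for a key does
    the upstream fetch; concurrent misses for the same key wait for that
    result instead of each hitting the origin."""

    def __init__(self, fetch):
        self.fetch = fetch              # the expensive upstream call
        self.lock = threading.Lock()
        self.inflight = {}              # key -> Event set once the result is in
        self.results = {}

    def get(self, key):
        with self.lock:
            event = self.inflight.get(key)
            leader = event is None
            if leader:                  # first requester becomes the leader
                event = threading.Event()
                self.inflight[key] = event
        if leader:
            self.results[key] = self.fetch(key)   # only the leader hits upstream
            event.set()
        else:
            event.wait()                          # piggyback on the leader
        return self.results[key]

origin_hits = []
def fetch(key):
    origin_hits.append(key)             # count how often the origin is touched
    return f"bytes-of-{key}"

c = Collapser(fetch)
threads = [threading.Thread(target=c.get, args=("seg42.ts",)) for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()
print(origin_hits)   # the origin saw exactly one request: ['seg42.ts']
```

A production version would also evict inflight/result entries, but the core point stands: origin load is per-segment, not per-viewer.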
Scale has increased but the techniques were figured out 20 years ago. There is not much left to invent in this space at the current moment so screwing up more than once is a bit unacceptable.
There's an upcoming NFL game on Netflix next month. They need to get their shit together.
Two games actually, both on Christmas Day. A day when most people are at home or the home of family or friends, and they are both pretty good late-season matchups (Chiefs-Steelers and Ravens-Texans) so I imagine viewership will be high.
If they botch the NFL games, it will surely hurt their reputation.
For me, netflix constantly forgets the last episode/spot I was at in a TV show. Beyond frustrating.
Yeah, the funny part is that Hulu, Amazon Prime, and Peacock have all demonstrated the ability to handle an event of this caliber with no issue. Netflix now may never get another opportunity like this again.
AFAIK those three farm out to CDNs with tons of edges who know what they're doing.
I have a feeling Netflix said 'how hard could this be?' and is finding out right now.
Sure they will. They’ll just set up the next event and outside of some tech folks no one will remember this.
I mean, I guarantee you every boxing fan is never going to trust Netflix again for an event like this.
What are they going to do? Just not watch the next fight?
0 people give a shit about boxing fans. It’s not up to them.
Chances are Jake might fight Conor McGregor. Sure, Conor is not as famous as Tyson, but that would also draw a large number of viewers.
It may vary by ISP. It’s been fine for me.
I think you are correct. Ziply Fiber said they were seeing 2.1 times their normal peak [1].
But also people were saying they weren't having any issues streaming on Ziply.
[1] https://www.reddit.com/r/ZiplyFiber/comments/1gsenik/netflix...
[dead]
In 2012 Youtube did the Red Bull stratos live stream with 8m concurrent users. We're 12 years later, Netflix fucked up.
To me the difference is that in 2012, you had companies focusing on delivering a quality product, whether it made money or not. Today, the economic environment has shifted a lot and companies are trying to increase profits while cutting costs. The result is inevitably a decline in quality. I'm sure that Netflix could deliver a flawless live stream to millions of viewers, but the question is can they do it while making a profit that Wall Street is happy with. Apparently not.
The funny thing is I was just reading something on HN like three days ago about how light years ahead Netflix tech was compared to other streaming providers. This is the first thing I thought of when I saw the reports that the fight was messing up.
> In 2012 Youtube did the Red Bull stratos live stream with 8m concurrent users
8m vs 60m. And not in 4K. Not a great choice for comparison.
Source for Netflix pulling 60m streams to watch Tyson/Paul?
https://about.netflix.com/en/news/60-million-households-tune...
But is there a way that Netflix might have learned from all of Youtube's past mistakes?
The only reasonable way to scale something like this up is probably to... scale it up.
Sure, there are probably some generic lessons, but I bet that the pain points in Netflix's architecture (historically grown over more than a decade and optimized towards highly cacheable content) are very different from Youtube, which has ramped up live content gradually over as many years.
2012 live video was what, 480p?
That’s about what Paul Tyson was looking like at times.
No? In 2012 most already had fiber net here. Are you young or poor?
At least in the US "most" most definitely did not have fiber, at best maybe FTTC.
living in a major US city in 2012, we had ancient DSL
The average quality of talent has gone way down compared to 2012 though.
E.g. the median engineer, excluding entry level/interns, at YouTube in 2012 was a literal genius at their niche or quite close to it.
Netflix simply can’t hire literal geniuses with mid six figure compensation packages in 2024 dollars anymore… though that may change with a more severe contraction.
It's incomprehensible to me that Netflix, one of the most highly skilled engineering teams in the world - completely sh*t the bed last night and provided a nearly unwatchable experience that was not even in the same league as pre-internet live broadcast from 30 years ago.
My bet is that a technical manager told his executive (multiple times) that he needed more resources and engineering time to make live work properly, and they just told him to make do because they didn't want to spend the money.
It could come down to something as stupid as:
Executive: "we handled [on demand show ABCD] on day one, that was XX million"
Engineering: "live is really different"
Executive: (arguing about why it shouldn't be that different and should not need a lot of new infrastructure)
Engineering: (can't really argue with his boss about this anymore after having repeated the same conversation 3 or 4 times) -- tells the team: "We are not getting new servers or time for a new project. We have to just make do with what we have. You guys are brilliant, I know you can do it!"
hopefully that technical manager has a paper trail and that executive has someone to answer to above them. in cases like this, i always throw together a doc and ask for sign offs.
Hahaha this reads very close to what actually happened
hopefully their stock takes a big hit monday - these types only understand one thing
Are you serious? You think they don't care about this except if the stock 'takes a big hit'??
Caring about things and tangible consequences are two different things. I don't necessarily agree with the GP comment but one should expect a higher quality retort.
I had buffering issues, but then backed off and let a bit of it buffer up (maybe 1 or 2 minutes?) and then it was fine for the entire Tyson-Paul match. There was no reason I needed it live vs. a 1 or 2 minute delay.
If you had money riding on it because of legalized sports gambling in your state and sports regulators clearing it to be a thing to bet on, then it was probably pretty psychologically important whether or not there was a 1 or 2 minute delay on you seeing the results of the game.
What’s amazing is that they have had several streaming flops before and are still unable to fix it
This topic is really just fun for me to read based on where I work and my role.
Live is a lot harder than on-demand, especially when you can't estimate demand (which I'm sure was hard to do here). People are definitely not understanding that. Then there's the fact that Netflix is well regarded for their engineering, if not quite to the point of snobbery.
What is actually interesting to me is that they went for an event like this which is very hard to predict as one of their first major forays into live, instead of something that's a lot easier to predict like a baseball game / NFL game.
I have to wonder if part of the NFL allowing Netflix to do the Christmas games was them proving out they could handle live streams at least a month before. The NFL seems to be quite particular (in a good way) about the quality of the delivery of their content so I wouldn't put it past them.
Netflix’s engineering snobbery is so exhausting. Seeing them unable to fix this problem after several previous streaming failures is a bit rich.
To me it speaks to how most of the top tech companies of the 2010s have degraded as of late. I see it all the time with Google hiring some of the lower performing engineers on my teams because they crushed Leetcode.
> The NFL seems to be quite particular (in a good way) about the quality of the delivery of their content
Alas, my experience with the NFL in the UK does not reflect that. DAZN have the rights to stream NFL games here, and there are aspects of their service that are very poor. My major, long-standing issue has been the editing of their full game “ad-free” replays - it is common for chunks of play to be cut out, including touchdowns and field goals. Repeated complaints to DAZN haven’t resulted in any improvements. I can’t help but think that if the NFL was serious about the quality of their offering, they’d be knocking heads together at DAZN to fix this.
I don't think they think this is a problem actually. Content edited replays are actually very popular with sports fans who are time shifting. Time shifting is also an afterthought for the NFL / MLB / NHL from what I can tell. I live in Seattle but grew up in the midwest so time shift a ton of sports and it's always been horrific.
I'm comparing more to Thursday Night Football and the quality of the encoding than anything. Delivery glitches are a separate issue that I think they care about less.
NFL: 90+ minutes after the game on NFL Gameday, and it auto-plays the most recent video for that team, which is always the post-game interview. So you load it up, go to your team, and it auto-plays the "we won" or "it was a tough loss" clip - like, why the f*ck am I paying for a DVR solution when you do that? NFL Sunday Ticket: you can watch the games sometime Monday after the fact, but not the Sunday night games. Good thing I paid well below half price for it with a discount.
NHL: constantly shifting networks each year with worse solutions, and not letting you get to half the previous games after a week. Totally useless for deferred viewing unless you only want to watch the game a day or more after. Fubo: you have to 'record' the game, and sometimes it's on a slightly different network and doesn't record. And their blackout system is the worst of all - who cares about your mediocre local team, sorry you can't watch Chiefs/Bills because they overlapped by some amount.
MLB: always broken at the top of the year, constantly changing the interface. You often get stuck watching the commercial break, which is not actually commercials but the same "Ohtani/Judge highlight video from 2 years ago" and a "stat" about the sluggers that is almost an entire season out of date. The app resets when switching from the live CDN to the on-demand one once the game ends, which often restarts the game and jumps you 6 innings forward, or makes the game unavailable for 30 minutes.
And EFF you if you want to watch playoffs on any of them.
Why is live a lot harder?
Aside from latency (which isn't much of a problem unless you are competing with TV or some other distribution system), it seems easier than on-demand, since you send the same data to everyone and don't need to handle having a potentially huge library in all datacenters (you have to distribute the data, but that's just like having an extra few users per server).
My guess is that the problem was simply that the number of people viewing Netflix at once in the US was much larger than usual and higher than what they could scale to, or alternatively a software bug was triggered.
On-demand is easier precisely because having a huge library in all data centers is relatively cheap. In actuality you just have caches, colocated with ISPs, that pull from your origin servers. Your users are likely all watching different things, so you can easily avoid hot spots by sharding on the content. Once the in-demand content is in the cache, it's relatively easy to serve.
Live content is harder because it can't really be cached, nor, due to TLS, can you really serve everyone the same stream. I think the hardest problem to solve is provisioning. If you are expecting 1 million users and 700,000 of them get routed to a single server, that server will begin to struggle. This can happen in a couple of different ways - for example, an ISP that isn't normally a large consumer suddenly overloads its edge server. Even though your DC can handle the traffic just fine, the links between your DC and the ISP begin to suffer, and since the event is live, it's not like you can just wait until the cache is filled downstream.
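The hot-spot point is easy to see in a toy model. A minimal sketch (the node names, content IDs, and event ID below are all invented for illustration) of why sharding on content ID spreads an on-demand catalog across a cache fleet but concentrates a single live event onto one node:

```python
import hashlib

# Hypothetical cache fleet.
NODES = ["edge-1", "edge-2", "edge-3", "edge-4"]

def node_for(content_id: str) -> str:
    """Pick a cache node by hashing the content ID."""
    digest = hashlib.sha256(content_id.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

# On-demand: a big catalog of titles naturally spreads across the fleet.
on_demand_nodes = {node_for(f"title-{i}") for i in range(1000)}

# Live: one event ID maps every viewer to the same node - a hot spot,
# unless you add another layer of fan-out in front of it.
live_nodes = {node_for("big-live-event") for _ in range(1000)}

print(len(on_demand_nodes))  # load spread across multiple nodes
print(len(live_nodes))       # 1 - every viewer hits the same node
```

Real CDNs obviously do more than hash once (request routing, fan-out trees, anycast), but the asymmetry between a long-tail catalog and one very hot object is the core of it.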
... what do you mean it cannot be cached?
isn't it a tree of cache servers? as origin sends the frames they're cached.
and as load grows the tree has to grow too, and when it cannot resorting to degrading bitrate, and ultimately to load shedding to keep the viewers happy?
and it seems Netflix opted to forego the last one to avoid the bad PR of an error message saying "we are over capacity", and instead went with actually letting it burn, no?
>... what do you mean it cannot be cached?
What I mean by "cached" is that the PoP server can serve content without contacting the origin server. (The PoP can't serve content it does not have.)
>and it seems Netflix opted to forego the last one to avoid a the bad PR of an error message of "we are over capacity" and instead went with actually let it burn, no?
Anything other than 100% uptime is bad PR for Netflix.
Latency is somewhat important for huge sporting events; you don't want every tense moment spoiled by the cheers of your neighbours whose feed is 20 seconds ahead.
With on-demand you can push the episodes out through your entire CDN at your leisure. It doesn't matter if some bottleneck means it takes 2 hours to distribute a 1 hour show worldwide, if you're distributing it the day before. And if you want to test, or find something that needs fixing? You've got plenty of time.
And on-demand viewers can trickle in gradually - so if clients have to contact your DRM servers for a new key every 15 minutes, they won't all be doing it at the same moment.
And if you did have a brief hiccup with your DRM servers - could you rely on the code quality of abandonware Smart TV clients to save you?
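The DRM point above is the classic thundering-herd problem, and the usual mitigation is jitter: smear the refresh times so clients don't all hit the license servers at once. A rough sketch, where the refresh interval, jitter window, and client count are all invented numbers:

```python
import random

REFRESH_SECONDS = 15 * 60   # assumed key-refresh interval
CLIENTS = 100_000           # assumed concurrent clients

def peak_requests_per_second(times):
    """Count the busiest one-second bucket."""
    buckets = {}
    for t in times:
        buckets[int(t)] = buckets.get(int(t), 0) + 1
    return max(buckets.values())

# Naive: every client refreshes on exactly the same boundary.
naive = [REFRESH_SECONDS] * CLIENTS

# Jittered: each client picks a random offset within +/- 60 seconds,
# smearing the spike across a two-minute window.
jittered = [REFRESH_SECONDS + random.uniform(-60, 60) for _ in range(CLIENTS)]

print(peak_requests_per_second(naive))     # one giant 100,000-request spike
print(peak_requests_per_second(jittered))  # under ~1,000 per second
```

Live viewers all joining at the same wall-clock time defeats the jitter you get for free from on-demand viewers trickling in, which is why the refresh schedule itself has to supply it.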
That has been a big problem for football, especially things like the Super Bowl.
People using over the air antennas get it “live“. Getting it from cable or a streaming service meant anywhere between a few seconds and over a minute of delay.
It was absolutely common to have a friend text you about something that just happened when you haven’t even seen it yet.
You can’t even say that $some_service is fast, some of them vary over 60 seconds just between their own users.
https://www.phenixrts.com/resource/super-bowl-2024
Latency between the live TV signal for my neighbours and the BBC iPlayer app I was using to watch the Euro 2024 final literally ruined the main moments for me. It still remains an unsolved issue long into the advent of live streaming.
I'm not an expert in this, but at least familiar with the trade.
I'd imagine with on-demand services you already have the full content, and can therefore use algorithms to compress frames and perform all kinds of neat tricks.
With live streaming, I'd imagine a lot of these algorithms are useless, as there isn't enough delay and time to apply them properly, so you're required to stream every single frame with, at best, some just-in-time compression.
People are always impressed that Netflix can stand up to a new episode of Squid Game being released. And it’s not easy; we’ve seen HBO fail to handle Game of Thrones, for example.
But in either case, you can put that stuff on your CDN days ahead of time. You can choose to preload it in the cache because you know a bunch of people are gonna want it. You also know that not every single individual is going to start at the exact same time.
For live, every single person wants every single byte at the same time and you can’t preload anything. Brutal.
I'm in an adjacent space, so I can imagine some of the difficulties. Basically live streaming is a parallel infrastructure that shares very little with pre-recorded streaming, and there are many failure points.
* Encoding - low latency encoders are quite different than storage encoders. There is a tradeoff to be made in terms of the frequency of key frames vs. overall encoding efficiency. More key frames means that anyone can tune in or recover from a loss more quickly, but it is much less efficient, reducing quality. The encoder and infrastructure should emit transport streams, which are also less efficient but more reliable than container formats like mp4.
* Adaptation - Netflix normally encodes their content as a ladder of various codecs and bitrates. This ensures that people get roughly the maximum quality that their bandwidth will allow without buffering. For a live event, you need the same ladder, and the clients need to switch between rungs invisibly.
* Buffering - for static content, you can easily buffer 30 seconds to a minute of video. This means that small latency or packet loss spikes are handled invisibly at the transport/buffering layer. You can't do this for a live event, since that level of delay would usually be unacceptable for a sporting event. You may only be able to buffer 5-10 seconds. If the stream starts to falter, the client has only a few seconds to detect and shift to a lower rung.
* Transport - Prerecorded media can use a reliable transport like TCP (usually HLS). In contrast, live video would ideally use an unreliable transport like UDP, but with FEC (forward error correction). TCP's reaction to packet loss is to halve the congestion window, which halves bandwidth, forcing you to trash the connection and shift to a lower-bandwidth rung.
* Serving - pre-recorded media can be synchronized to global DCs. Live events have to be streamed reliably and redundantly to a tree of servers. Those servers need to be load balanced, and the clients must implement exponential backoff or you can have cascading failures.
* Timing - Unlike pre-recorded media, any client that has a slightly fast clock will run out of frames and either need to repeat frames and stretch audio, or suffer glitches. If you resolve this on the server side by stretching the media, you will add complication and your stream will slowly get behind the live event.
* DVR - If you allow the users to pause, rewind, catch up, etc., you now have a parallel pre-recorded infrastructure and the client needs to transition between the two.
* DRM - I have no idea how/if this works on a live stream. It would not be ideal that all clients use the same decryption keys and have the same streams with the same metadata. That would make tracing the source of a pirate stream very difficult. Differentiation/watermarking adds substantial complexity, however.
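The adaptation and buffering bullets above can be condensed into a toy rate-selection loop. The bitrate ladder, headroom factor, and buffer threshold below are invented for illustration, not Netflix's actual encoding values:

```python
# Toy adaptive-bitrate selection with the short live buffer described above.
LADDER_KBPS = [800, 1600, 3000, 6000, 12000]  # lowest to highest rung

def pick_rung(measured_kbps: float, buffer_seconds: float) -> int:
    """Choose the highest rung we can sustain, backing off hard when the
    live buffer is nearly empty (a live client has only seconds to react)."""
    # Leave ~25% headroom so small throughput dips don't stall playback.
    usable = measured_kbps * 0.75
    # With under 3s buffered, a live client can't afford a miss:
    # drop straight to the lowest rung and rebuild the buffer.
    if buffer_seconds < 3:
        return LADDER_KBPS[0]
    candidates = [r for r in LADDER_KBPS if r <= usable]
    return candidates[-1] if candidates else LADDER_KBPS[0]

print(pick_rung(10_000, 8))  # healthy buffer, good throughput -> 6000
print(pick_rung(10_000, 2))  # buffer nearly empty -> 800
```

The on-demand version of this logic gets to be far lazier: with 30-60 seconds buffered, a client can ride out a throughput dip without ever switching rungs.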
This Serrano fight is just an insane display of excellence.
If anyone was waiting for the main card to tune in, I recommend tuning in now.
Absolutely excellent fight. 10 full rounds with full effort until the end. Fantastic.
Also, no buffering issues on my end. Have to wonder if it's a regional issue.
What was an amazing fight - that Serrano won. I have no idea how Taylor was scored the winner.
I know nothing about boxing and this fight was just ridiculously impressive. I kept tuning out of the earlier fights. They felt like some sort of filler. I didn’t get the allure. But Taylor v Serrano was just obvious talent that even I could appreciate it.
Serrano was robbed!
[flagged]
Well said. People tuning in late missed out on the real main event. I have a feeling this Tyson fight will be a waste of staying up late and battling Netflix over.
[flagged]
[flagged]
[dead]
[flagged]
[flagged]
[flagged]
[flagged]
[flagged]
[flagged]
[flagged]
[flagged]
That was a savage fight!
naw, taylor head butting the whole fight was dirty and really took the wind out of it
Serrano should have won.
Reading the comments here, I think one thing that's overlooked is that Netflix, which has been on the vanguard of web-tech and has solved many complicated problems in-house, may not have had the culture to internally admit that they needed outside help to tackle this problem.
A combination of hubris and groupthink.
Not invented here syndrome works at first but as time progresses the internally built tools become a liability
What do you think were the dynamics of the engineering team working on this?
I'd think this isn't too crazy to stress test. If you have 300 million users signed up, then your stress test should be 300 million simultaneous streams in HD for 4 hours. I just don't see how Netflix screws this up.
Maybe it was a management incompetence thing? Manager says something like "We only need to support 20 million simultaneous streams" and engineers implement to that spec even if the 20 million number is wildly incorrect.
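Back-of-envelope arithmetic shows why "just stress test every subscriber" is easier said than done. The viewer count and per-stream bitrate below are rough assumptions, not measured figures:

```python
# Rough aggregate-egress estimate; both inputs are assumptions.
viewers = 60_000_000   # roughly the concurrency Netflix later reported
hd_kbps = 5_000        # a plausible HD stream bitrate

aggregate_tbps = viewers * hd_kbps / 1e9  # kbps -> Tbps
print(aggregate_tbps)  # 300.0 Tbps of egress, before retries or overhead
```

Generating that much synthetic load is itself a distributed-systems project, and it still wouldn't exercise the real bottlenecks: the per-ISP links and edge caches where the actual viewers concentrate.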
Has there ever been a 300m concurrent live stream? I thought Disney+ had the record at something like 60m.
There's no way 300 million people watched this, especially if that number is representing every Netflix subscriber. The largest claimed live broadcast across all platforms is last year's Super Bowl with 202 million unique viewers for at least part of it, but that includes CBS, Nickelodeon, and Univision, not just streaming audiences. Its average viewers for the whole game was 123 million, which is second all-time to the Apollo 11 moon landing.
FIFA claimed the 2022 World Cup final reached 1.5 billion people worldwide, but again that seems like it was mostly via broadcast television and cable.
As far as single stream, Disney's Hotstar claimed 59 million for last year's Cricket World Cup, and as far as the YT platform, the Chandrayaan-3 lunar landing hit 8 million.
100 million is a lot of streams, let alone 300. But also note that not every stream reaches a single individual.
And, as far as the 59 million concurrent streams in India, the bitrate was probably very low (I'd wager no more than 720p on average, possibly even 480p in many cases). It's again a very different problem across the board due to regional differences (such as spread of devices, quality of network, even behavioral differences).
480p30 could be as high as 280mbit or as low as 280kbit. Same with other resolutions.
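The spread quoted above is roughly the gap between uncompressed video and a typical lossy encode. Quick arithmetic, where the compressed target is an assumed figure:

```python
# Uncompressed "480p" at 30fps: widescreen 854x480, 8-bit RGB assumed.
width, height, fps, bits_per_pixel = 854, 480, 30, 24

raw_mbps = width * height * fps * bits_per_pixel / 1e6
print(round(raw_mbps))  # 295 - close to the upper bound quoted above

# An assumed H.264-style target bitrate for the same resolution.
compressed_kbps = 1_500
ratio = raw_mbps * 1000 / compressed_kbps
print(round(ratio))  # roughly 200:1 compression
```

Which is why the resolution label alone says almost nothing about quality: the codec and the bitrate budget dominate.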
I mean, yes, but nobody streams RAW video in practice, and I can't imagine any users or service providers who'd be happy with that level of inefficiency. In general, it's safe to assume some reasonable compression (which, yes, is likely lossy).
It's quite possible for one broadcasters 480p30 to be a higher bitrate than another broadcasters 720p60
I remember watching the last season of Game of Thrones on one streaming provider, which topped out about 3.5mbit but claimed it was "1080p".
Give me a 15mbit 640x480 over 3.5mbit of 1920x1080 for that type of material any day.
> It's quite possible for one broadcasters 480p30 to be a higher bitrate than another broadcasters 720p60
Yes, I don't think anyone's disputing that.
> I remember watching the last season of Game of Thrones on one streaming provider, which topped out about 3.5mbit but claimed it was "1080p".
Why the scare quotes? That's a perfectly reasonable bitrate using modern compression like H.265, especially for a TV show that's filmed at 24 fps.
World Cup final, if you add up all streams worldwide?
Not through a single system, the advantage of diversity rather than winner-takes-all.
The world cup final itself (and other major events) is distributed from the host broadcaster to either on site at the IBC or at major exchange points.
When I've done major events of that magnitude there's usually a backup scanner and even a tertiary backup. Obviously feeds get sent via all manner of routes - the international feed, for example, may be handed off at an exchange point, but the reserve is likely available on satellite for people to downlink. If the scanner goes (fire etc.), then at least some camera/sound feeds can be switched direct to these points; on some occasions there's a full backup scanner too.
Short of events that take out the venue itself, I can't think of a plausible scenario which would cause the generation or distribution of the broadcast to break on a global basis.
I don't work for OBS/HBS/etc but I can't imagine they are any worse than other broadcast professionals.
The IT part of this stuff is pretty trivial nowadays, even the complex parts like the 2110 networks in the scanner tend to be commoditised and treated as you'd treat any other single system.
The most technically challenging part is unicast streaming to millions of people at low latency (DASH etc). I wouldn't expect an enormous architectural difference between a system that can broadcast to 10 million or 100 million though.
probably an esports match hosted on bilibili
I love how I can come to HN to instantly find out if it’s Netflix or my WiFi.
Wifi wifi, or wifi as in your ISP Internet connection? So many people now call an Internet connection "wifi".
Anyway, network cable is the only way to go!
"Many people", those who call their ISP connection WiFi, are technology potato's.
To be fair, a lot of people pay their ISP for a modem/router combo and connect to something like "Xfinity" at their house. So to them, there is no difference.
They may not know technology, but most of those potatoes at least know the difference between a plural and a possessive noun.
I have the wifis with the geebees.
This! I was checking my WiFi and then I instinctively checked HN and what do you know!
metoo!
Right?!
Live streaming is hard. Most companies that do live streaming at 2024 scale got there by learning from their mistakes. This is true for Hotstar, Amazon, and even Youtube. Netflix's stack is made to stream optimised, compressed, cached videos with a manageable number of concurrent viewers for the same video. Here we had ~65m concurrent viewers in their first live event. The compression they use, distribution, etc. have not scaled up well. I'll judge them based on how they handle their next live event
I don't think this is their first live event. They have hosted a pro golf promotional match and they had a live pro tennis match between Nadal and Alcaraz off the top of my head.
When you step back and look at the situation, it's not hard to see why Netflix dropped the ball here. Here's how I see it (not affiliated with Netflix, pure speculation):
- Months ago, the "higher ups" at Netflix struck a deal to stream the fight on Netflix. The exec that signed the deal was probably over the moon because it would get Netflix into a brand new space and bring in large audience numbers. Along the way the individuals were probably told that Netflix doesn't do livestreaming but they ignored it and assumed their talented Engineers could pull it off.
- Once the deal was signed then it became the Engineer's problem. They now had to figure out how to shift their infrastructure to a whole new set of assumptions around live events that you don't really have to think about when streaming static content.
- Engineering probably did their absolute best to pull this off, but they had two main disadvantages: they don't have any of the institutional knowledge about live streaming, and they don't really know how to predict demand for something like this. In the end they probably beefed up livestreaming as much as they could, but still didn't go far enough because, again, no one there really knows how something like this will pan out.
- Evening started off fine but crap hit the fan later in the show as more people tuned in for the main card. Engineering probably did their best to mitigate this but again, since they don't have the institutional knowledge of live events, they were shooting in the dark hoping their fixes would stick.
Yes Netflix as a whole screwed this one up but I'm tempted to give them more grace than usual here. First off the deal that they struck was probably one they couldn't ignore and as for Engineering, I think those guys did the freaking best they could given their situation and lack of institutional knowledge. This is just a classic case of biting off more than one can chew, even if you're an SV heavyweight.
This isn't Netflix's first foray into livestreaming. They tried a livestream last year for a reunion episode of one of their reality TV shows which encountered similar issues [0]. Netflix already has a contract to livestream a football event on Christmas, so it'll be interesting to see if their engineers are able to get anything done in a little over a month.
These failures reflect very poorly on Netflix leadership. But we all know that leadership is never held accountable for their failures. Whoever is responsible for this should at least come forward and put out an apology while owning up to their mistakes.
[0] https://time.com/6272470/love-is-blind-live-reunion-netflix/
> But we all know that leadership is never held accountable for their failures.
You've never heard of a CEO or other C-suite or VP getting fired?
It most definitely happens. On the other hand, people at every level make mistakes, and it's preferable that they learn from them rather than be fired, if at all possible.
Accountability can take many forms. I don't think they should be fired for making a mistake, I think they should release a statement recognizing their failure along with a post-mortem. Not a particularly high bar, but most leadership failures are often swept under the rug without any public accountability or evidence that they've learned anything.
We have evidence of prior failures with livestreaming from Netflix. Were the same people responsible for that failure or do we have evidence of them having learned anything between events? If anything, I'd expect the best leaders would have a track record that includes failures while showcasing their ability to overcome and learn from those mistakes. But based on what information is publicly available, this doesn't seem to be the case in this situation.
> They now had to figure out how to shift their infrastructure to a whole new set of assumptions around live events
It wasn't their first live event. A previous live event had similar issues.
Livestreaming is a solved problem. This sounds like NIH [1]. (At the very least, hire them as a back-up.)
[1] https://en.wikipedia.org/wiki/Not_invented_here
Saying live-streaming is a solved problem is like saying search is a solved problem.
> Saying live-streaming is a solved problem is like saying search is a solved problem
It is. You can hire the people who have solved it to do it for you.
> It is. You can hire the people who have solved it to do it for you.
"GPGPU compute is a solved problem if you buy Nvidia hardware" type comment
> "GPGPU compute is a solved problem if you buy Nvidia hardware" type comment
You're replacing the word hire with buy. That misconstrues the comment. If you need to do GPGPU compute and have never done it, you work with a team that has. (And if you want to build it in house, you scale to it.)
>"GPGPU compute is a solved problem if you buy Nvidia hardware"
Which is valid? If your problem can be solved by writing a check, then it's the easiest problem to have on the planet.
Netflix didn't have to put out 3 PhD dissertations on how to improve the SOTA of live streaming, they only needed to reliably broadcast a fight for a couple hours.
That is a solved problem.
Amazon and Cloudflare do that for you as a service(!). Twitch and YouTube do it literally every day. Even X started doing it recently.
No excuses for Netflix, tbh.
Landing on Mars is a solved problem. Nuclear bombs are a solved problem. Doesn't mean anyone can just write a check and get it done and definitely doesn't mean any business model can bear that cost.
Of course it means that!
You only need a big enough check.
No it doesn't.
India has landed on Mars for a fraction of the cost it took other nations, and the ESA has never been able to pull it off.
Not every cost is fungible and money isn't always the limiting factor.
It should be obvious that not all risks can be converted into a capital problem.
People say this, but then fall in love, get divorced, get depressed, or their company might lose its mojo, get sued, or lose an irreplaceable employee. But they will still say “all risk can be costed.”
>India has landed on Mars for a fraction of the cost it took other nations, and the ESA has never been able to pull it off.
This further confirms my assertion, btw.
If it is just a matter of paying up, why hasn't ESA pulled it off? I'm pointing out and offering examples that "solved problem" has no bearing on the ease or organizational capacity of any one group to do it. It is merely a statement that no unknown, new solution needs to be invented.
If I have to spell it out you're clearly debating in bad faith and we're done here.
We are arguing if it's possible or not.
Who cares if a thousand guys are incapable? (like Netflix, lmao)
What matters are the ones that can do it, and you even said they've done it at "a fraction of the cost".
Paraphrasing, your argument says more about the incompetence of the ESA than the impossibility of doing such a thing.
> Not every cost is fungible and money isn't always the limiting factor
Sure. This isn’t relevant to Netflix.
It is. The fact that 'streaming is a solved problem' has no bearing on any one company's ability to do it at scale. Solved problem means merely you don't have to invent something new, not that it is easy or within reach of everyone.
"Solved" merely means you don't need to invent something new to solve it. It doesn't mean trivial nor easy. And it definitely doesn't mean the problem is above trade-offs.
Look. I'm a small startup employee. I have a teeny tiny perspective here. But frankly speaking the idea that Netflix could just take some off the shelf widget and stuff it in their network to solve a problem... It's an absurd statement for even me. And if there's anyone it should apply to it would be a little startup company that needs to focus on their core area.
Every off-the-shelf component on the market needs institutional knowledge to implement, operate, and maintain. Even Apple's "it just works" mantra is pretty laughable in the cold light of day. Very rarely in my experience do you get to just benefit from someone else's hard work in production without having an idea of how to properly implement, operate, and maintain it.
And that's at my little tiny ant scale. To call the problem of streaming "solved" for Netflix, given the guessed context from the GP post?
I just don't think this perspective is realistic at all.
> the idea that Netflix could just take some off the shelf widget and stuff it in their network to solve a problem
Right. They have to hire one of the companies that does this. Each of YouTube, Twitch (Amazon), Facebook and TikTok have, I believe, handled 10+ million streams. The last two don't compete with Netflix.
I believe this is the spirit of the "solved problem" comment: not that the solution is an off-the-shelf widget, but that if it has ever been solved, then that solution could technically be used again, even if organizing the right people is exorbitantly expensive.
Offering it for sale != having solved it.
We now know it was more than 60m streams. I think it’s either a record or approaching one for a live stream.
There are multiple companies offering this capability today who could white-label it behind company branding in a few weeks. This was a problem of Netflix just not being set up for live streaming but thinking they could handle it.
At 120m concurrents? I’d be interested in who can white-label that.
>First off the deal that they struck was probably one they couldn't ignore
If you can't provide the service you shouldn't sell it?
My speculation here is this was just classic SV cockiness. The team that closed this deal probably knew that they didn't have the capability, but I'm sure the argument for doing it anyway was something along the lines of: "we have the best engineers in the bay area, we can probably figure this out"
There are endless amounts of stories and situations in which selling something before it really exists has helped businesses. It's totally plausible that a team working on video streaming at the scale of Netflix could figure out live streaming.
Premature optimization is definitely a thing, and it can massively hurt businesses (e.g. startups going under). Let's stop pretending any business would say 'no' to extra revenue even before the engineering team had full assurance there would be no latency drop.
In my language we have a saying that roughly translates to "Don't sell the hide until you've shot the bear".
And sure, there have probably been lots of examples where a business made promises they weren't confident about and succeeded. But there are surely also lots of examples where they didn't succeed.
So what's the moral of the story? I don't know, maybe if you take a gamble you should be prepared to lose that gamble. Sounds to me like Netflix fucked up. They sold something they couldn't provide. What are the consequences of that? I don't know, nor do I particularly care. Not my problem.
Execs never listen to, or even ask, engineers about the feasibility of projects they sign up for. I hope the exec in question will be let go.
I mean, the ones that do ask don’t proceed to signing up. I think we are seeing a form of survival bias.
You've never worked in a startup, have you? Or any business, for that matter. You have to promise something first, then build it.
No joke, is this actually true?
Do startups really do this? I thought the capability is built or nearly built or at least in testing already with reasonable or amazing results, THEN they go to market?
Do startups go to other startups, fortune 500 companies and public companies to make false promises with or without due diligence and sign deals with the knowledge that the team and engineers know the product doesn't have the feature in place at all?
In other words:
Company A: "We can provide web scale live streaming service around the world to 10 billion humans across the planet, even the bots will be watching."
Company B: "OK, sounds good, Yes, here is a $2B contract."
Company A: "Now team I know we don't have the capability, but how do we build, test and ship this in under 6 months???"
Startups absolutely sell things they haven't made yet and might not even be capable of doing.
Next thing you know it's 9pm on a Sunday night and you're desperately trying to ship a build for a client.
Netflix isn't some scrappy company though. If I had to guess they threw money at the problem.
A much better approach would have been to scale slowly over the course of a year. Maybe stream some college basketball games first, slowly picking more popular events to get some real prod experience.
Instead this is like their 3rd or 4th live stream ever. Even a pre-show a week before would have allowed for greater testing.
I'm not a CTO of a billion dollar company though. I'm just an IC who's seen a few sites go down under load.
To be fair, no one knows how it's going to go before it happens. It would have been more surprising for them to pull this off without issues... It's a matter of managing those issues. I know if I had paid $30 for a Netflix subscription to watch this specific event, I'd assume I got ripped off.
You don't necessarily have to make false promises.
You can be totally honest and upfront that the functionality doesn't exist yet and needs to be built first, but that you think you understand the problem space and can handle the engineering, provided you can secure the necessary funding, where, by the way, getting a contract and some nominal revenue now could greatly help make this a reality...
And if the upside sounds convincing enough, a potential customer might happily sign up to cover part of your costs so they can be beta testers and observe and influence ongoing development.
Of course it happens all the time that the problem space turns out to be more difficult than expected, in which case they might terminate the partnership early and then the whole thing collapses from lack of funding.
If anything, startups are more transparent about it.
In the enterprise sector this is rampant. Companies sell "platforms" and those missing features are supposed to be implemented by consultants after the sale. This means the buyer is the one footing the bill for the time spent, and suffering with the delays.
“Aspirational sugar” is as common in startup culture as in Fortune 500 sales contracts, they’re just messaged and “de-risked” differently.
Many do, as far as initial investment goes. It makes sense when you think about the capital intensive nature of most startups (including more than web startups here, e.g. lab tech commercialization). It also accurately describes a research grant.
That's for startups that can't bootstrap (most of them). For ones which can, they may still choose to do this with customers, as you describe, because it means letting their work follow the money.
Wait until you figure out how fortune 500 corporate vendors usually engage in the game of RFP box checking.
I imagine this is why a lot of products, and startups, fail.
>> First off the deal that they struck was probably one they couldn't ignore
> If you can't provide the service you shouldn't sell it?
Then how will the folks in Sales get their commission?
Besides, not providing the service hasn't stopped Tesla from selling FSD, and their stock has been going gangbusters.
/s
Not sure why Netflix is held in high regard - this proves they're just as much clowns as the other 'big players' in the circus.
They aren't clowns at all. It's a totally different engineering problem and you can't just spin up live streaming capacity on demand. The entire system end to end isn't optimized for live streams yet.
I mean, maybe? You just made all this up.
[flagged]
[flagged]
That’s uncharitable. Proposing reasons for institutional failure and discussing them can be a way for humans to improve communication and address said challenges.
It's a way to mislead people into misunderstanding reality and therefore solving the wrong problems, often causing harm, now and in the future.
That's why serious analysis requires a factual basis, such as science, law, and good engineering and management. You need analytics data to figure out where the performance and organizational bottlenecks are.
Before people tried to understand illness with a factual basis, they wrote speculative essays on leeching and finding 'better' ways to do it.
Main event hasn’t even started yet. Traffic will probably 10x for that. They’re screwed. Should have picked something lower profile to get started with live streaming.
I don’t work in tech. Is this something that engineers could respond to and reallocate resources to fix mid stream?
Not a chance. This level of infrastructure was set up days in advance - I would be unsurprised if they'd had a code freeze all week for this fight.
Had issues all stream but was perfect during the final fight.
[dead]
They've done quite a bit of lower profile live streams... various events, and the Everybody's in LA chat show series.
It’s insane the excuses being made here for Netflix’s apparently unique circumstances.
They failed. Full stop. There is no valid technical reason they couldn’t have had a smooth experience. There are numerous people with experience building these systems they could have hired and listened to. It isn’t a novel problem.
Here are the other companies that are peers that livestream just fine, ignoring traditional broadcasters:
- Google (YouTube live), millions of concurrent viewers
- Amazon (Thursday Night Football, Twitch), millions of concurrent viewers
- Apple (MLS)
NBC live streamed the Olympics in the US for tens of millions.
I don't disagree that Netflix could have / should have done better. But everybody screws these things up. Even broadcast TV screws these things up.
Live events are difficult.
I'll also add that the other things you've listed are generally multiple simultaneous events. When 100M people are watching the same thing at the same time, they all need a lot more bitrate at the same moment (say, when there's a smoke effect as Tyson is walking into the ring), so it gets mushy for everyone. IMHO, someone on the event production staff should have an eye for what effects won't compress well and try to steer away from those, but that might not be realistic.
I did get an audio dropout at that point that didn't self correct, which is definitely a should have done better.
I also had a couple of frames of block color content here and there in the penultimate bout. I've seen this kind of stuff on lots of hockey broadcasts (streams or ota), and I wish it wouldn't happen... I didn't notice anything like that in the main event though.
Experience would likely be worse if there were significant bandwidth constraints between Netflix and your player, of course. I'd love to see a report from Netflix about what they noticed / what they did to try to avoid those, but there's a lot outside Netflix's control there.
As a cofounder of a CDN company that pushed a lot of traffic: the problem with live streaming is that you need to propagate peak viewership through a loooot of different providers. The peering/connectivity deals are usually not structured for peak capacity that is many times over the normal 95th percentile. You can provision more connectivity, but you don't know how many will want to see the event. Also, live events can be trickier than stored files, because you can't offload to the edges beforehand to warm up the caches.
So Netflix had 2 factors outside of their control
- unknown viewership
- unknown peak capacities outside their own networks
Both are solvable, but if you serve "saved" content you optimize for a different use case than live streaming.
The examples given here are not on the same scale. The numbers known so far:
- 120m viewers [1]
- Entire Netflix CDN Traffic grew 4x when the live stream started [2]
[1] https://www.rollingstone.com/culture/culture-news/jake-paul-...
[2] https://x.com/DougMadory/status/1857634875257294866
I guess one question I have is did Netflix partner with other CDNs?
Despite their already huge presence, Amazon for example has multiple CDNs involved for capacity for live events. Same for Peacock.
Disney Hotstar managed ~60M concurrent livestreams for the Cricket World Cup a year ago. The problem has been solved. Livestreaming sports just has different QoS expectations than on demand.
https://www.geeksforgeeks.org/how-did-hotstar-managed-5-9-cr...
I wouldn't say it's a solved problem, how many other companies are pulling off those numbers? Isn't that the current record for concurrent streams? And wasn't it mostly to mobile devices?
Hotstar's team is ~2,000 people; the engineers will be fewer than that.
https://www.linkedin.com/company/disney-hotstar/people/
Engineering head count alone is not informative; it really depends on how much is in-house and how much is external. For Hotstar that would be parent Disney (or, before that, Fox), or staffing from IT consulting organizations who will not be on payroll.
For what it is worth, all things being equal, there would be a lot more non-engineering staff among Hotstar's 2,000 employees versus a streaming company of similar size or user scale. Hotstar operates in a challenging and fragmented market: India has 10+ major languages (and corresponding TV, music, and movie markets). Technically there is not much difference to what Netflix or Disney has to do for i18n; operationally, however, each market needs separate sales, distribution, and operations.
---
P.S. Yes, Netflix operates in more markets, including India, than anybody else; however, if you actually use Netflix for almost any non-English content, you will know how weak their library and depth in other markets are. Their usual model in most of these markets is to have a few big, high-quality (for that market) titles rather than build depth.
P.P.S. Also yes, the Indian market is seeing consolidation in the sense that many streaming releases are multilingual and use major stars from more than one language to draw talent (not new, but growing in popularity as distribution becomes cheaper with streaming); however, this is only seen in big-banner productions, as tastes are quite different in each market and it can't scale for run-of-the-mill content.
Disney Streaming has 900 employees, a large majority of which are engineers.
This is the company that supplies technology to Hotstar, Hulu, MLB Live streaming, etc.
https://en.m.wikipedia.org/wiki/Disney_Streaming
Hotstar is a completely different company.
It’s possible I will never read a worse written article so completely devoid of any actual information. I wonder why they bothered writing it
Netflix has 280m subscribers; I highly doubt half of them tuned in to watch the match. Is that 130m figure official?
Amazon had their fair share of livestream failures, and for notably fewer viewers. I don't think they deserve a spot on that list. I briefly worked in streaming media for sports, and while it's not a novel problem, there are so many moving parts and points of failure that it can easily all go badly.
There is no one "Amazon" here, there are at least 3:
* Twitch: Essentially invented live streaming. Fantastic.
* Amazon Interactive Video Service [0]: Essentially "Twitch As A Service", built by Twitch engineers. Fantastic.
* Prime Video. Same exact situation as Netflix: original expertise is all in static content. Lots of growing pains with live video and poor reports. But they've figured it out: now there are regular live streams (NHL and NFL), and other channel providers do live streaming on Prime Video as a distribution platform.
[0] https://aws.amazon.com/ivs/
Doesn't twitch almost fall over (other non-massive streams impacted) when anyone gets close to 4-5m concurrent viewers? I remember last time it happened everything started falling over, even for smaller streams. Even if Netflix struggled with the event, streaming other content worked just fine for me.
IVS does not scale past 1080p: https://ivs.rocks/
4K is overrated for streaming. Most people's connections can't handle it so it always ends up getting downscaled.
pisses me off.
physical media for the win, tho.
> They failed. Full stop.
It's not full stop. There are reasons why they failed, and for many it's useful and entertaining to dissect them. This is not "making excuses" and does not get in the way of you, apparently, prioritizing making a moral judgment.
The big difference of all the examples you’ve mentioned is dedicated full-time crews on the ground where the events are produced.
I’m pretty confident that when the post mortem is done the issues are going to be way closer to the broadcast truck than the user.
it could be that they made use of the same advice X followed :)
It will never not annoy and amuse me that illegal options (presumably run by randoms in their spare time) are so much better than the offerings of big companies and their tech ‘talent’.
Illegal options would have a lot fewer active users, so it is not a fair comparison.
Illegal options also have a lot fewer resources (revenue, service providers willing to host/facilitate illegal activities, and so on), so it's a fair comparison in my opinion.
> service providers who are willing host/facilitate illegal activities
At least for NFL pirate streams, it seems they tend to use "burner" tenants from Azure and AWS. Of course they get shut down, but how hard is it to spin up another one?
They still have to put it behind a privacy-friendly proxy to hide their IP address from litigators right?
Yup. Also a bit more latency since it's effectively restreaming unless it's someone at the actual event.
I have Netflix purchased legally with hard earned money. But because I had issues I looked for illegal streams, and they were bad, crashes, buffering.. you name it. So I went back to Netflix and watched it at 140p quality.
This twitter stream was the most reliable for me. Completely took Netflix out of the equation; just some dude at the event with his phone: https://x.com/i/broadcasts/1mrxmMRmXpQxy
> just some dude at the event with his phone
Antonio Brown is not “just some dude”. He’s a national treasure.
It's a good thing he's rich and famous, otherwise there might actually be consequences for him illegally broadcasting this.
If he's rich then he's financially worth employing an IP lawyer to pursue for copyright infringement, if it was infringement. Maybe he was licensed?
Utter incompetence from senior leadership at Netflix. They had so much time to prepare for this.
I want to index everyone sneering at this situation and never work with any of them.
Eh, punching up, while still punching, doesn’t seem that distasteful to me.
There's no up. There's just punching, and making excuses for punching.
yep, especially knowing this isn't their first rodeo... 18 months since https://time.com/6272470/love-is-blind-live-reunion-netflix/
> But the real indicator of how much Sunday’s screw-up ends up hurting Netflix will be the success or failure of its next live program—and the next one, and the one after that, and so on. There’s no longer any room for error. Because, like the newly minted spouses of Love Is Blind, a streaming service can never stop working to justify its subscribers’ love. Now, Netflix has a lot of broken trust to rebuild.
Weird that an organization like Netflix is having problems with this considering their depth of both experience and pockets. I wonder if they didn't expect the number of people who were interested in finding out what the pay-per-view experience is like without spending any extra money. Still, I suppose we can all be thankful Netflix is getting to cut their live event teeth on "alleged rapist vs convicted rapist" instead of something more important.
> alleged rapist vs convicted rapist
And you’ll never guess which Presidential candidate they both support!
From my experience, it works if you're not watching it 'live'. But the moment I put my devices to 'live' it perma-breaks: a 504 gateway timeout in web developer tools hitting my local CDN. Probably works on some CDNs, doesn't on others. Probably works if you're not 'live'.
edit: literally an nginx gateway timeout screen if you view the response from the CDN... wow
It's down permanently for me in India. We have Hotstar, which has a record of 58 million viewers during the cricket World Cup final. Way ahead.
Probably less about the level of advancement and more about their ability to stream vs play VOD. Two different kinds of infrastructure optimisation.
59 million at the world cup was concurrent live streams
> https://www.icc-cricket.com/news/biggest-cricket-world-cup-e...
An impressive achievement, and the scale Netflix failed to reach a year later.
Wasn't that the biggest concurrent stream ever?
This is probably a naive question but very relevant to what we have here.
In a protocol where an oft-repeated request goes through multiple intermediaries, every intermediary can usually cache the response for common queries (e.g. DNS).
In theory, ISPs would be able to do the same with HTTP, although I am not aware of anyone doing so (since it would rightfully raise concerns of privacy and tampering).
Now TLS (or other encryption) breaks this abstraction. Every user, even if they request the same live stream, receives a differently encrypted response.
But a live stream of a popular boxing match has nothing to do with the "confidentiality" half of an encryption protocol, only integrity.
Do we have a protocol that allows downstream intermediaries, e.g. ISPs, to cache the content of the stream based on demand, while a digital signature or other attestation is still cryptographically verified by the client?
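One way the integrity-without-confidentiality idea could work, sketched very roughly: keep only a small per-user manifest on TLS, and let the bulky segments travel in plaintext so any intermediary can cache them. The client verifies each cached segment against a digest from the trusted manifest. This is a hypothetical illustration (the names `make_manifest` and `verify_segment` are made up), not a description of any deployed protocol:

```python
import hashlib

def make_manifest(segments):
    """Origin side: map segment name -> SHA-256 hex digest.
    Only this small manifest needs to be delivered over TLS (or signed)."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in segments.items()}

def verify_segment(manifest, name, data):
    """Client side: accept a segment from an untrusted cache only if its
    digest matches the one in the trusted manifest."""
    return hashlib.sha256(data).hexdigest() == manifest.get(name)

# Toy example: two video segments served in the clear, cacheable anywhere.
segments = {"seg_001.ts": b"video bytes...", "seg_002.ts": b"more video bytes..."}
manifest = make_manifest(segments)

assert verify_segment(manifest, "seg_001.ts", segments["seg_001.ts"])  # genuine copy
assert not verify_segment(manifest, "seg_001.ts", b"tampered bytes")   # tampered copy rejected
```

Named Data Networking (mentioned in a sibling comment) generalizes this by signing the data itself rather than securing the pipe.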
There's Named Data Networking (formerly Content-Centric Networking). You request data, not a URL, and the pipe/path becomes the CDN: if any of your nearest routers have the bytes, your request will go no further.
I don't see it much mentioned the last few years, but the research groups have ongoing publications. There's an old 2006 Van Jacobson video that is a nice intro.
What you describe is called a CDN and has been widely used for 20 years.
Can Mike Judge please stop predicting everything?
I've been re-watching Silicon Valley the last few weeks and just watched the Nucleus live stream episode 2 days ago, pretty funny seeing it in real life.
"Puts data compression in a rear naked chokehold"
Reference for everyone else: https://m.youtube.com/watch?v=9IGvzb-KCpY
I guarantee this is a management issue. Somebody needed to bear down at some point and put the resources into load testing. The engineers told them it probably wouldn't be sufficient.
I assume this came down to some technical manager saying they didn't have the human and server resources for the project to work smoothly and a VP or something saying "well, just do the best you can.. surely it will be at least a little better than last time we tried something live, right?"
I think there should be a $20 million class action lawsuit, which should be settled as automatic refunds for everyone who streamed the fight. And two executives should get fired.
At least.. that's how it would be if there was any justice in the world. But we now know there isn't -- as evidenced by the fact that Jake Paul's head is still firmly attached to his body.
I am curious about their live streaming infrastructure.
I have done live streaming for around 100k concurrent users. I didn't set up the infrastructure myself because it was the CloudFront CDN.
Why is it hard for Netflix? They have already figured out the CDN part, so it should not be a problem even at 1M or 100M, because their CDN infrastructure is already handling the load.
I have only worked with HLS live streaming, where the playlist is constantly changing compared to VOD; the live video chunks work the same as VOD. CloudFront also has a request-collapsing feature that greatly helps live streaming.
So, my question is: if Netflix has already figured out the CDN, why is their live infrastructure failing?
Note: I am not saying my 100k is the same scale as their 100M. I am curious about which part is the bottleneck.
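For readers unfamiliar with request collapsing (also called request coalescing or single-flight): when thousands of viewers ask a cache for the same not-yet-cached live segment at once, only the first request should hit the origin, while the rest wait and share the result. A minimal sketch of the mechanism, with made-up names (`SingleFlightCache`, `fetch_from_origin`):

```python
import threading

class SingleFlightCache:
    """Toy pull-through cache: at most one origin fetch per key,
    no matter how many concurrent requests arrive."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._lock = threading.Lock()
        self._cache = {}      # key -> bytes already fetched
        self._inflight = {}   # key -> Event signalled when the fetch completes

    def get(self, key):
        with self._lock:
            if key in self._cache:
                return self._cache[key]           # cache hit
            if key in self._inflight:
                event, leader = self._inflight[key], False  # someone else is fetching
            else:
                event = self._inflight[key] = threading.Event()
                leader = True                     # we fetch on behalf of everyone
        if leader:
            data = self._fetch(key)               # the single origin hit
            with self._lock:
                self._cache[key] = data
                del self._inflight[key]
            event.set()
            return data
        event.wait()                              # followers block until the leader finishes
        with self._lock:
            return self._cache[key]

# Demo: 8 concurrent requests for the same segment, one origin hit.
origin_hits = []
def fetch_from_origin(key):
    origin_hits.append(key)
    return b"segment bytes for " + key.encode()

cache = SingleFlightCache(fetch_from_origin)
threads = [threading.Thread(target=cache.get, args=("seg_042.ts",)) for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()
assert origin_hits == ["seg_042.ts"]
```

Without this, a cache miss on a hot live segment "thunders" to the origin with one request per viewer, which is one plausible way a live backend falls over while VOD (pre-placed in caches off-peak) stays healthy.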
> Why is it hard for Netflix? They have already figured out the CDN part, so it should not be a problem even at 1M or 100M, because their CDN infrastructure is already handling the load ... Note: I am not saying my 100k is the same scale as their 100M. I am curious about which part is the bottleneck.
100k concurrents is a completely different game compared to 10 million or 100 million. 100k concurrents might translate to 200Gbps globally for 1080p, whereas for that same quality, you might be talking 20T for 10 million streams. 100k concurrents is also a size such that you could theoretically handle it on a small single-digit number of servers, if not for latency.
> CloudFront also has a feature request collapsing that greatly help live streaming.
I don't know how much request coalescing Netflix does in practice (or how good their implementation is). They haven't needed it historically, since for SVOD, they could rely on cache preplacement off-peak. But for live, you essentially need a pull-through cache for the sake of origin offload. If you're not careful, your origin can be quickly overwhelmed. Or your backbone if you've historically relied too heavily on your caches' effectiveness, or likewise your peering for that same reason.
200Gbps is a small enough volume that you don't really need to provision for that explicitly; 20Tbps or 200Tbps may need months if not years of lead time to land the physical hardware augments, sign additional contracts for space and power, work with partners, etc.
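The back-of-envelope math behind those figures is simple; the ~2 Mbps per-stream rate is an assumption consistent with the comment above (real adaptive-bitrate ladders vary widely):

```python
# Aggregate egress needed for N concurrent viewers at an assumed per-stream bitrate.
BITRATE_BPS = 2_000_000  # ~2 Mbps per stream (illustrative)

def aggregate_tbps(concurrents):
    """Total terabits per second for a given concurrent audience."""
    return concurrents * BITRATE_BPS / 1e12

print(f"{aggregate_tbps(100_000):.1f} Tbps for 100k viewers")     # 0.2 Tbps = 200 Gbps
print(f"{aggregate_tbps(10_000_000):.0f} Tbps for 10M viewers")   # 20 Tbps
print(f"{aggregate_tbps(100_000_000):.0f} Tbps for 100M viewers") # 200 Tbps
```

The takeaway is the two orders of magnitude between the two scenarios, not the exact bitrate.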
Live streaming and streaming prerecorded movies is a whole different ballgame.
In fact, optimizing for the latter can hurt the former.
Would be interesting to read any postmortems on this failure. Maybe someone will be kind enough to share the technical details for the curious crowd.
Amazon had issues last year too when they started broadcasting TNF, but it's fine these days.
I'm sure they will get it figured out.
I thought Netflix engineers were the best and could even do mythical leetcode hards. What happened? Why are they paid half a million dollars a year?
Isn't this more of a management problem, trying to turn a not-livestream system into a livestreaming one?
> envoy overloaded
That's the plain-text message I see when I tried to refresh the stream.
Follow-up:
My location: East SF Bay.
Now even the Netflix frontpage (post login, https://www.netflix.com/browse ) shows the same message.
The same message even in a private window when trying to visit https://www.netflix.com/browse
The first round of the fight just finished, and the issues seem to be resolved, hopefully for good. All this to say what others have noted already, this experience does not evoke a lot of confidence in Netflix's live-streaming infrastructure.
Ah, envoy. Now that is a name I have not missed.
Reminds me of Nucleus stuttering during UFC
I could hear Gavin Belson screaming during the broadcast when my stream was freezing as they were each making their entrance. Mike Judge is a prophet.
That show ages better every single day.
Hell, I’d complain about Jake Paul vs Mike Tyson as well if I was a boxing fan, even without the buffering issues.
Yes, it was utterly boring, but they made their money. I don't like either Paul brother, so I only watched in hopes shorter, much-older Tyson would make Jake look as foolish as he is.
They had better get some better judges and refs too. The co-headline title fight was a joke.
Netflix has some NFL games on Christmas Day. Wonder how those will go for them.
I remember when ESPN started streaming years back, it was awful. Now I almost never have problems with their live events, primarily their NHL streams.
A friend and I, in separate states, found that it wouldn’t stream from TVs, Roku, etc. but would stream from mobile. And for me, using a mobile hotspot to a laptop; though that implies checking IP address range instead of just user-agent, so that seems unlikely.
Anyway, I wouldn’t be surprised if they were prioritizing mobile traffic because it’s more forgiving of shitty bitrate.
I wonder if this points to network peering and edge nodes. Mobile network vs cabled network likely being routed to different places.
I just left a bar streaming it on a smart TV and back in my home it's streaming on the Roku just fine.
Guess I was looking for explanations too hard.
FWIW, works fine for me.
Please don't make these types of comments, they mean nothing and they serve no purpose.
It means, it is different if the service goes down to 100, 50, 10 percent of users. I watched the show with no issues.
Comments on forums do not provide that data. And if you want to extrapolate self-reports, it's obviously fine (to varying degrees) for the vast majority of people, but that's not the "issue."
These kind of reports are the equivalent of saying "I have power" when you're hundreds of miles away from where a hurricane landed. It's uninteresting, it's likely you have power, and it does literally nothing for people that do not have power.
It doesn't advance the topic anywhere. There are other places to report these issues directly and in aggregate with other people -- HN (and many other forums) are not that place.
You’re in a fucking thread about people commenting in a forum about an outage that may or may not have been caused by Netflix
Which has many interesting facets worthy of discussion! No need to be extremely aggressive in your tone.
You are the one telling people their comments provide no value. That’s far more aggressive than using a word not even directed at you
How could that possibly be aggressive? It's not a personal indictment. Tone however, matters in any context.
I hope you find it within yourself to treat strangers nicer.
> How could that possibly be aggressive?
You butted into a conversation to tell someone their contribution added no value without adding anything constructive. A comment of “your comment is useless” is pure aggression and is ironically even less useful than the one it’s deriding.
> Tone however, matters in any context.
You are getting upset because someone used a swear word. You’ll find that is just deep seated classism and working on that will let you have much more fulfilling interactions.
Tone policing never works. It’s a waste of calories and everyone’s time.
Why is it acceptable to share that it doesn't work but not acceptable to share that it does?
For the same reason that pointing out the sun rose in the east today would be ridiculous but if it happened to rise in the west, or you perceived it to rise in the west, that would be worth sharing.
Being able to livestream a sporting event is the default now and has been for at least over a decade since HBO’s original effort to stream a Game of Thrones season opener failed because of the MSFT guy they hired, and they fixed it by handing it over to MLBAM.
Maybe that’s what Netflix should do. Buy Disney so they can get access to the MLBAM people (now Disney Streaming because they bought it from MLB).
Been working great for me as well. Starlink in Oregon.
The stream never buffered on my side, but the quality was pretty basic for the whole duration; I doubt it was even 720p.
Us too
It probably depends more on the ISP than on Netflix. Engineers over in my ISP’s subreddit are talking about how flows from Netflix jumped by over 450Gb/s and it was no big deal because it wasn’t enough to cause any congestion on any of their backbone links.
On a tangential note, the match totally looked fixed to me - Tyson was barely throwing any punches. I understand age is not on his side, but he looked plenty spry when he was ducking, weaving and dodging. It seemed to me he could have done better in terms of attacking as well.
I would argue Tyson has a shorter reach, Jake was whiffing a lot of superman punches, and all that does is waste energy. Jake might be able to throw punches, but he clearly wasn't interested in taking them. If they stood closer and slugged it out, the fight could have gone either way.
Yeah the biggest thing to me, and the commentators mentioned this as well, his legs looked REALLY wobbly.
All your attacking power comes from your legs and hips, so if his legs weren’t stable he didn’t have much attacking power.
I think he gave it everything he had in rounds 1 and 2. Unfortunately, I just don’t think it was ever going to be enough against a moderately trained 27 year old.
Bet they wish they'd gone with middle out compression
When they come up with that idea it's the most 18-rated and accurate way an engineer would think about it.
I wrote an analysis on doing this kind of unicast streaming in cable networks a decade ago. Even edge networks with reasonable 100-gig distribution as their standard would see some of the minor buffering issues.
There is a reason that cable doesn’t stream unicast and uses multicast and QAM on a wire. We’ve just about hit the point where this kind of scale unicast streaming is feasible for a live event without introducing a lot of latency. Some edge networks (especially without local cache nodes) just simply would not have enough capacity, whether in the core or peering edge, to do the trick.
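The multicast-vs-unicast point above comes down to how load scales with audience size on a shared distribution link: unicast sends one copy per viewer, multicast one copy per link. A toy comparison (the 8 Mbps channel bitrate is an assumption):

```python
# Load on a single distribution link for one live channel,
# comparing unicast delivery vs multicast (as cable QAM effectively does).
BITRATE_MBPS = 8  # assumed per-channel bitrate

def unicast_link_mbps(viewers_behind_link):
    """Unicast: one copy of the stream per viewer crosses the link."""
    return viewers_behind_link * BITRATE_MBPS

def multicast_link_mbps(viewers_behind_link):
    """Multicast: one copy crosses the link and is replicated downstream."""
    return BITRATE_MBPS if viewers_behind_link > 0 else 0

print(unicast_link_mbps(50_000))    # 400000 Mbps = 400 Gbps behind one link
print(multicast_link_mbps(50_000))  # 8 Mbps, independent of audience size
```

This is why unicast live events at national scale need CDN cache nodes pushed deep into edge networks: each cache effectively recreates the fan-out that multicast gives cable for free.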
Saw an Arista presentation about the increase in SFP capacity; it's Moore's-law-style stuff. Arm-based kit has a shockingly efficient number of streams-per-watt too.
I can't see traditional DVB/ATSC surviving much beyond 2040, even accounting for the long tail.
You're right that large-scale parallel live streaming has only become feasible in the last few years. The BBC has published some insights into how it had to change its approach to scale to 10 million viewers in 2021, having had technical issues in the 3 million range in 2018:
https://www.bbc.co.uk/webarchive/https%3A%2F%2Fwww.bbc.co.uk...
Personally I don't think the latency is solved yet -- TV is slow enough (about 10 seconds from camera to TV), but IP streaming tends to add another 20-40 seconds on top of that.
That's no good when you're watching the penalties. Not only will your neighbours be cheering before you as they watch on normal TV, but even if you're both on the same IPTV you may well 5 seconds of difference.
The total end-to-end time is important too: with 30 seconds of delay, the news push notifications, tweets, etc. on your phone will come in before you see the result.
> Saw an Arista presentation about the increase in SFP capacity, it's Moore law style stuff.
SFP itself isn't much of the issue. SerDes is, and then secondarily the operating power envelope for the things (especially for the kinds of optics that run hot). Many tradeoffs available.
> I can't see traditional DVB/ATSC surviving much beyond 2040 even accounting for the long tail.
Tend to agree in well-developed infra, but rural and poorly-developed are well served with more traditional broadcast. Just saying “starlink!” 5 times in a dark bathroom won’t fix that part.
> Personally I don't think the latency is solved yet -- TV is slow enough (about 10 seconds from camera to TV), but IP streaming tends to add another 20-40 seconds on top of that.
I don't think it will get better. Probably worse, but with net better service quality. HLS/DASH are designed for doing the bursty networking thing. Among the good reasons for this: mobile works much better in bursts than strict linear streams, segment caching is highly effective, etc.
But I think this makes sense: it's a server-side buffering thing that has to happen. So assuming transmuxing (no transcoding, lol) and wire latency are 0, we're hitting 1-5 seconds for the segment, probably waiting for a fill of 10 seconds to produce the manifest, then buffering client-side another 10 or so. Throw in more cache boxes and it'll tick up more. It is quite high, but aside from bookies, I don't know how much people will actually care vs complain.
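To put hypothetical numbers on that budget, here's a minimal Python sketch; every figure is an illustrative assumption from the comment above, not anything Netflix has published:

```python
# Each stage's contribution to glass-to-glass delay in a segmented (HLS/DASH)
# live pipeline. All defaults are made-up but plausible values.
def live_latency_seconds(encode_package_s=2, segment_s=4,
                         segments_behind_live=3, cdn_propagation_s=1,
                         client_buffer_s=10):
    segment_wait = segment_s                          # a segment ships only once complete
    manifest_hold = segment_s * segments_behind_live  # players start a few segments behind
    return (encode_package_s + segment_wait + manifest_hold
            + cdn_propagation_s + client_buffer_s)

print(live_latency_seconds())  # 29 -- squarely in the "20-40 seconds extra" range
```

Nothing in that sum is a bug; it's the structural cost of segmenting, caching, and client-side buffering.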
My kid woke me up complaining internet is not working. Turns out he is trying to watch the fight and it's not working at all here in India.
I think they must be noticing the issues, because I've noticed they've been dropping the stream quality quite substantially... It's a clever trick, but kind of cheap to do so, because who wants to watch pixelated things?
To be brutally honest if it’s a choice between pixelated and constantly buffering, pixelated is way less bad. Constantly buffering is incredibly annoying during live sports. (but this doesn’t negate your main point which is that if people paid to watch they expect decent resolution)
Looks like I’m playing Tysons Punchout right now
Glass Jake?
Dumb question
Isn't live streaming at scale already a solved problem for cable companies? I've never seen ESPN go down during a critical event.
Yes. As I have said again and again on Hacker News in different comments, Netflix went overboard with their microservices and tried to position itself as a technology company when it's not. It has made everything more complex, and that's why any Netflix tech blog is useless: it is not the way to build things correctly.
To understand how to do things correctly look at something like pornhub who handle more scale than Netflix without crying about it.
The other day I was having this discussion with somebody who was saying distributed counter logic is hard and I was telling them that you don't even need it if Netflix didn't go completely mental on the microservices and complexity.
This is not the same streaming - netflix is doing that over HTTP. Totally different tech and scaling issues
Yes and no. It's not the "same" but this is a solved problem. Fastly regularly delivers the Super Bowl to 10x as many viewers.
Netflix dropped the ball hard
Fastly says they do 6M ccv for superbowl (i'm actually surprised they let them do the entire thing and don't mix different CDNs) and I'm not sure they do encoding and manifest serving - they might just cache/deliver chunks. Do you really think tyson vs other guy was only 600k ccv? I'd be shocked if Netflix can't handle this.
You would think, but technology always finds a way to screw things up. Cox Communications has had ongoing issues with their video for weeks because of Juniper router upgrades and even the vendor can't fix it. They found this out AFTER they put it in production. Shit happens.
I don't understand why the media is pushing this Jake Paul vs Mike Tyson stuff so hard and why people care about it. Boxing is crude entertainment for low-intelligence people.
I'm tired of all this junk entertainment which only serves to give people second-hand emotions that they can't feel for themselves in real life. It's like, some people can't get sex so they watch porn. People can't fight so they watch boxing. People can't win in real life so they play video games or watch superhero movies.
Many people these days have to live vicariously through random people/entities; watch others live the life they wished they had and then they idolize these people who get to have everything... As if these people were an intimate projection of themselves... When, in fact, they couldn't be more different. It's like rooting for your opponent and thinking you're on the same team; when, in fact, they don't even know that you exist and they couldn't be more different from you.
You're no Marvel superhero no matter how many comic books you own. The heroes you follow have nothing to do with you. Choose different heroes who are more like you. Or better; do something about your life and give yourself a reason to idolize yourself.
Does anyone have any thoughts besides "bad engineering" on what could've gone wrong? It seems like taking on a new endeavor like streaming an event that would possibly draw many hundreds of millions of viewers doesn't make sense. Is there any obvious way that this would just work, or is there obviously a huge mistake deeply rooted in the whole thing. Also, are there any educated guesses on some fine details in the codebase and patterns that could result in this?
How is this story not on the front page anymore? 375 comments. Seems like a big story to me.
I believe HN's algorithm tends to relatively downrank stories with a high comment-to-upvote ratio, because they are more often flamewars on divisive topics.
Another major algorithmic down-ranker is vote wars on comments.
If lots of people are upvoting and downvoting the same comments, that's treated as a signal the topic is contentious and people are likely to start behaving badly.
HN is very clear they prioritize good behavior as the long term goal, and they are as a result comfortable having contentious topics fall in the ranking even if everyone involved in the discussion feels the topic is important.
Mine is glitchy, but if I refresh I get a good stream for a bit, then it goes low-res, then freezes. If I wait for auto-reconnect it takes forever. Hard refresh and I'm good. Like, new streams go to a new server, which then gets overloaded, as if their cluster is crashing and healing in rapid cycles. Sawtooth patterns on their charts.
And then all these sessions lag, or orphan taking up space, so many reconnections at various points in the stream.
System getting hammered. Can't wait for this writeup.
The arrogant Netflix! They always brag about how technologically superior they are, and they can't handle a simple technological challenge! I didn't have a buffering issue, I had an error page - for hours! Yet, they kept advertising the boxing match to me! What a joke! If you can't stream it, don't advertise it to save face with people like me who don't care about boxing!
Every organization makes mistakes and every organization has outages. Netflix is no different. Instead of bashing them because they are imperfect, you might want to ask what you can learn from this incident. What would you do if your service received more traffic than expected? How would you test your service so you can be confident it will stay up?
Also, I have never seen any Netflix employees who are arrogant or who think they are superior to other people. What I have seen is Netflix's engineering organization frequently describes the technical challenges they face and discusses how they solve them.
I think you’re oversimplifying it. Live event streaming is very different from movie streaming. All those edge cache servers become kinda useless and you start hitting peering bottlenecks.
Edge caches are not useless for live streaming. They're critical. The upstream from those caches has no way of handling each individual user. The stream needs to hit the edge cache, and end users should be served from there.
A typical streaming architecture is multi-tiered caches, source->midtier->edge.
We don't know what happened but it's possible they ran out of capacity on their edge (or anywhere else).
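A toy sketch of that source->midtier->edge pull-through pattern (structure and names are purely illustrative):

```python
# Each tier caches what it pulls from the tier above, so repeat requests for
# the same live segment are absorbed close to viewers.
class Tier:
    def __init__(self, name, upstream=None):
        self.name, self.upstream, self.cache = name, upstream, {}
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            return self.cache[key]          # served locally; upstream never sees it
        self.misses += 1
        # On a miss, pull from the next tier up (or generate at the origin).
        value = self.upstream.get(key) if self.upstream else f"origin-bytes({key})"
        self.cache[key] = value
        return value

origin = Tier("origin")
midtier = Tier("midtier", upstream=origin)
edge = Tier("edge", upstream=midtier)

for _ in range(1000):                       # 1000 viewers ask for the same segment
    edge.get("segment-42.ts")

print(edge.misses, midtier.misses, origin.misses)  # 1 1 1 -- origin sees one request
```

The point is that each tier collapses repeat requests, so the origin sees a tiny fraction of viewer traffic; running out of capacity at any tier breaks that collapse.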
BBC had a similar issue in a live stream 5 years ago where events conspired and a CDN "failed open", which effectively DOSsed the entire output via all CDNs
> Even though widely used, this pattern has some significant drawbacks, the best illustration being the major incident that hit the BBC during the 2018 World Cup quarter-final. Our routing component experienced a temporary wobble which had a knock-on effect and caused the CDN to fail to pull one piece of media content from our packager on time. The CDN increased its request load as part of its retry strategy, making the problem worse, and eventually disabled its internal caches, meaning that instead of collapsing player requests, it started forwarding millions of them directly to our packager. It wasn’t designed to serve several terabits of video data every second and was completely overwhelmed. Although we used more than one CDN, they all connected to the same packager servers, which led to us also being unable to serve the other CDNs. A couple of minutes into extra time, all our streams went down, and angry football fans were cursing the BBC across the country.
https://www.bbc.co.uk/webarchive/https%3A%2F%2Fwww.bbc.co.uk...
This feels like a bug in the implementation and not really a drawback of the pattern. "Routing component experienced a temporary wobble" also sounds like bug of sorts.
I worked in this space. All these potential failure modes and how they're mitigated are something we paid a fair amount of attention to.
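One of those mitigations is damping the retry amplification described in the BBC quote above. A hypothetical back-of-envelope in Python (all numbers are made up):

```python
import random

# Naive immediate retries: every failed attempt re-offers the full load,
# multiplying pressure on an already struggling origin.
def load_multiplier(clients, retries_per_client):
    return clients * (1 + retries_per_client)

# Exponential backoff with full jitter spreads retries out over time instead
# of hammering the origin in lockstep -- a common mitigation.
def retry_delays(attempts, base=0.5, cap=30.0):
    return [random.uniform(0, min(cap, base * 2 ** i)) for i in range(attempts)]

print(load_multiplier(1_000_000, 3))  # 4000000 -- 4x load from just 3 blind retries
```

Combine the retry storm with disabled request collapsing, as in the BBC incident, and the origin gets the full multiplied load directly.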
Hopefully they fix it because they are hosting two Christmas NFL games this year and if you want to really piss people off you have buffering issues during NFL games lol.
Maybe this was a stress test for the NFL games?
I'd expect the NFL games to have a largely American audience, but today's boxing event attracted a global audience.
I wonder what % of the "60-120m" viewers were from Europe vs America vs Asia
I can feel the pressure on the network engineers from here XD
Seems like the magic number was 60 million concurrent streams
https://www.theverge.com/2024/11/16/24298338/netflix-mike-tt...
That's a suspicious number, given the previous world record is 59 million.
After a few buffering timeouts during the first match, the rest of the event had no technical difficulties (in SoCal, so close to one of Netflix's HQs).
Unfortunately, except for the women's match, the fights were pretty lame...4 of the 6 male boxers were out of shape. Paul and Tyson were struggling to stay awake and if you were to tell me that Paul was just as old as Tyson I would have believed it.
Pure speculation as I have 0 knowledge.
Assuming Netflix used its extensive edge cache network to distribute the streams to the ISPs. The software on the caching servers would have been updated to be capable of dealing with receiving and distributing live streamed content, even if maybe the hardware was not optimal for that (throughput vs latency is a classic networking tradeoff).
Now inside the ISP's network, everything would probably be optimized for the 99.99% use case of the Netflix infra: delivering large bulk data that is not time-sensitive. This means very large buffers to shift big gobs of packets in bulk.
As everything along the path is trying to fill up those buffers before shipping to the next router on the path, some endpoints aware this is a live stream start cancelling and asking for more recent frames ...
Hilarity ensues
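If that speculation were right, the damage is easy to estimate: per-hop delay is roughly buffer size over stream bitrate. A sketch with made-up figures:

```python
# If a hop insists on filling a bulk-sized buffer before forwarding, the
# delay it adds is roughly buffer size divided by stream bitrate.
def buffer_delay_s(buffer_megabytes, stream_mbps):
    buffer_bits = buffer_megabytes * 8 * 1e6
    return buffer_bits / (stream_mbps * 1e6)

# A hypothetical 64 MB bulk buffer in front of a 16 Mbps 4K live stream:
print(buffer_delay_s(64, 16))  # 32.0 -- seconds of delay if the buffer must fill
```

Which is exactly why clients would start cancelling and re-requesting fresher frames, as the comment describes.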
Why do they want to get into the live business? It doesn't seem to synergize with their infrastructure. Sending the same stream in real time to numerous people just isn't the same task as letting people stream optimized artifacts that are prepositioned at the edge of the network.
Most PPV is what, $50-$70? So subscribing to Netflix for $20 or whatever per month sounds like a bargain for anyone who is interested and not already a customer. Then assume some large percentage doesn’t cancel either because they forgot, or because they started watching a show and then decided to keep paying.
They want to break into sports because it’s such a big business and if you do sports you need to be able to stream live.
Because its the one area traditional networks have the advantage in
The marginal cost to add a viewer to broadcast sports is zero! That's what I am getting at. I know why someone would want this business, I just don't see what aspect of Netflix's existing business made them think they'd be good at it.
Live is the only thing that won’t be commodified entirely. “Anyone” can pump out stream-when-you-want TV shows. Live events are generally exclusive, unpredictable, cultural moments.
Not sure why this is being downvoted. I can see your point: it's much harder to do this live, but a lot of their CDN infra can be reused.
I watched on an AppleTV and the stream was rock solid.
I don’t know if it’s still the case, but in the past some devices worked better than others during peak times because they used different bandwidth providers. This was the battle between Comcast and Cogent and Netflix.
Your device type has no influence on your provider and its bandwidth characteristics. If you're on Comcast, Apple can't magically make it not suck.
It isn't Apple, it's Netflix.
Remember back in 2014 or so when Netflix users on Comcast were getting slow connections and buffering pauses? It didn't affect people who watched Netflix via Apple TV because Netflix served Apple TV users with a different network.
> In a little known, but public fact, anyone who is on Comcast and using Apple TV to stream Netflix wasn’t having quality problems. The reason for this is that Netflix is using Level 3 and Limelight to stream their content specifically to the Apple TV device. What this shows is that Netflix is the one that decides and controls how they get their content to each device and whether they do it via their own servers or a third party. Netflix decides which third party CDNs to use and when Netflix uses their own CDN, they decide whom to buy transit from, with what capacity, in what locations and how many connections they buy, from the transit provider. Netflix is the one in control of this, not Comcast or any ISP.
https://www.streamingmediablog.com/2014/02/heres-comcast-net...
Ah, thanks, I see what you're saying now and it makes a lot of sense. Just didn't grok it from your previous post, sorry!
If you're on Apple, Comcast can make it magically not suck. Not sure if it's relevant though.
Cogent just seems to love picking fights with everyone (see Hurricane Electric). Why are they still in business?
Native apps have a lot more scope for client side load balancing due to having a different security model than browsers.
> I watched on an AppleTV and the stream was rock solid.
For me it was buffering and low resolution, on the current AppleTV model, hardwired, with a 1Gbps connection from AT&T. Some streaming devices may have handled whatever issues Netflix was having better than others, but this was clearly a bigger problem than just the streaming device.
I thought Netflix’s biggest advantage was the quality/salary of its engineers.
I think that every time I wait for Paramount+ to restart after it's gone black in picture-in-picture, and yet I'm still on Paramount+ and not Netflix, so maybe that advantage isn't real.
Sigh, none of the competitors are much better. Disney, who has more than enough cash to throw at streaming, is a near-constant hassle for us (after 3 or more episodes it throws an inscrutable error on PlayStation). I would drop it, but it is the only remaining streaming service my wife is not willing to drop (I guess her limit is one error per episode).
I keep hearing that a lot of people are switching to the illegal competitor for better service.
Eh, the beef that I have is that I am already a paying customer. Why does it seem like I am getting subpar service in terms of delivery? I know it is a tired conversation on this forum, but corporations big and small do what they can to mess with the experience to eke out a few more cents from customers. It almost does not matter which industry one looks at, it is the same story; the only difference is typically how egregious things get[1].
[1]https://www.marketwatch.com/story/i-opt-to-fly-private-no-ma...
Do you think YouTube couldn't handle it?
I think this was true at some point, but I’ve been disappointed in the quality of the OSS Netflix tools recently. I think before k8s and a plethora of other tools matured, they were way ahead of the curve.
I specifically found the Netflix suite for Spring very lacking, and found message oriented architectures on something like NATS a lot easier to work with.
It was so bad. So so bad. Like don’t use your customers as guinea pigs for live streaming. So lame. They need a new head of content delivery. You can’t charge customers like that and market a massive event and your tech is worse than what we had from live broadcast tv.
I thought they did DSA interviews at Netflix, what happened? I had to watch the fight on someone streaming to X from their phone at the event and it was better than watching on Netflix... if you could watch at all. Extremely embarrassing!
My theory is they've so heavily optimized for static content and distributing content on edge nodes that they were probably poorly setup for live-streaming.
This, I feel bad for their engineers who were told every bonus would be a matter of how low can they get the cost-per-GB of transferred data, leading to the glorious Netflix-in-a-box (https://openconnect.netflix.com/deploymentguide.pdf) and then management casually asks for a live show with a wildly different set of guarantee requirements.
Its because they use medium-hard leetcode instead of hard. I suggest 8 rounds instead of 4
Surely more whiteboard questions are the key? Reversing a binary tree is so last year, they should make candidates reverse an octree.
someone's going to have to reverse a linked list of subscriptions.
Does anyone remember IP multicast?
I remember a lot of trade magazines in the late 1990's during the dot com boom talked about how important it would be.
https://en.wikipedia.org/wiki/IP_multicast
I never hear about it anymore. Is that because everyone wants to watch something different at their own time? Or is it actually working just fine now in the background? I see under the "Deployment" section it mentions IPTV in hotel rooms.
Dumbing it down a bit: imagine if anyone in your neighborhood could broadcast video and take up N% of the bandwidth on all of the routers in the neighborhood. Imagine this on your campus or at your office. This works for cable TV, as there are only 200 channels. You're just going to slurp up all of the bandwidth, and maybe no one is even watching the TV.
Sure, you get these black swan events that everyone wants to watch, but they're just that: really infrequent. So instead you have to provision capacity on the internet to do big events like this. The upside is that you have a billion people able to send point-to-point to another billion people, instead of 30 companies who can send to everyone.
Multicast isn't broadcast; bandwidth is only used on links with clients who have joined the group.
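The arithmetic behind that distinction, with illustrative numbers:

```python
# On any single link, unicast carries one copy of the stream per viewer
# downstream of that link; multicast carries one copy per joined group,
# regardless of how many viewers have joined.
def unicast_link_gbps(viewers_behind_link, stream_mbps):
    return viewers_behind_link * stream_mbps / 1000

def multicast_link_gbps(stream_mbps, groups_joined=1):
    return groups_joined * stream_mbps / 1000

# 50,000 subscribers behind one aggregation link, all on a 16 Mbps stream:
print(unicast_link_gbps(50_000, 16))  # 800.0 Gbps
print(multicast_link_gbps(16))        # 0.016 Gbps, however many viewers join
```

Which is why cable IPTV can serve a whole event on one channel's worth of bandwidth while over-the-top unicast has to provision for every viewer.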
One similar crash I remember very well was CNN on 9/11. I tried to connect from France but it was down the whole day.
Since then I am very used to it, because our institutional websites traditionally crash when there is a deadline (typically taxes or school registrations).
As for this one, my son is studying in Europe (I am also in Europe); he called me, desperate, at 5 am or so to check whether he was the only one with the problem (I am the 24/7 family support for anything plugged in). After liberally insulting Netflix, he remembered he had confirmed with his grandparents that he would be helping them at 10 :)
They should have partnered with every major CDN and load balanced across all of them. It’s ironic how we used to be better at broadcasting live events way back in the day versus today.
I watched the event last night and didn't get any buffering issues, but I did notice frequent drop in video quality when watching the live feed. If I backed the video up a bit, the video quality suddenly went back up to 4k.
I had some technical experience with live video streaming over 15 years ago. It was a nightmare back then. I guess live video is still difficult in 2024. But congrats to Jake Paul and boxing fans. It was a great event. And breaking the internet just adds more hype for the next one.
I wonder how localized the issues were. I watched the Taylor/Serrano fight and Paul/Tyson without issue, and the picture quality was the best I've ever seen for live sports. Was blown away by how good it was. Nowhere near what I'm getting with streaming NFL. This is what I want the future of live sports to look like. Though the commentary was so-so.
I’m in the Pacific Northwest. I wonder if we got lucky on this or just some areas got unlucky.
It’s a learning experience! I remember Conor and Floyd broke hbo and the ufc. It’s a hard problem for sure!
Some buffering issues for us, but I bet views are off the charts. Huge for Netflix, bad for espn, paramount, etc etc
You're lucky you only had some buffering issues. That was not the case for many people. I don't know the percentage, but many people on Reddit were complaining.
This is bad for Netflix imo.
I guess it kinda depends on viewer counts?
If you're going to be having intense algorithm interviews, paying top dollar for only hiring senior engineers, building high intensity and extreme distributed systems and having SRE engineers, we best see insanely good results and a high ROI out of it.
All of the conditions were perfect for Netflix, and it seems that the platform entirely flopped.
Is this what chaos engineering is all about that Netflix was marketing heavily to engineers? Was the livestream supposed to go down as Netflix removed servers randomly?
It seemed to be some capacity issue with the CDNs. When I stopped and restarted the stream it worked again. Perhaps they do not use real time multi-cdn switching.
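Real-time multi-CDN switching on the client can be as simple as a scored failover list. A hypothetical sketch (the names and scoring rule are made up; production players typically get their CDN lists from the manifest):

```python
class CdnError(Exception):
    pass

def fetch_segment(cdns, segment, fetch):
    """Try CDNs best-score-first; demote a failing CDN and fall back."""
    last_err = None
    for cdn in sorted(cdns, key=lambda c: -c["score"]):
        try:
            return cdn["name"], fetch(cdn["name"], segment)
        except CdnError as err:
            cdn["score"] *= 0.5   # remember this CDN is struggling
            last_err = err
    raise last_err

cdns = [{"name": "cdn-a", "score": 1.0}, {"name": "cdn-b", "score": 0.8}]

def flaky_fetch(name, segment):   # simulate cdn-a being overloaded
    if name == "cdn-a":
        raise CdnError("503")
    return b"segment-bytes"

print(fetch_segment(cdns, "seg-1.ts", flaky_fetch))  # ('cdn-b', b'segment-bytes')
```

Restarting the stream, as the comment describes, effectively forces this re-selection by hand.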
What a massive blow to NFLX. They have been in the streaming game for years (survived COVID-19) and this silly exhibition match is what does them in?
I didn’t watch it live (boxing has lost its allure for me) but vicariously lived through it via my social feed on Bluesky/Mastodon.
Billions of dollars at their disposal and they just can’t get it right. Probably laid off the highly paid engineers and teams that made their shit work.
> Probably laid off the highly paid engineers and teams that made their shit work.
More likely overpaying a bunch of posers without the chops, a victim of their own arrogance.
It's not a "massive blow" at all. Consumers will only vaguely remember this in a month. Netflix got a lot of new signups and got to test out their streaming infrastructure to figure out what needs work.
The fight itself was lame which worked in their favor. No one really cared about not being able to see every second of the "action". It's not like it was an NBA game that came down to the last second.
IPTV
I see multiple people in the comments talking about how "cable" companies who have migrated to IPTV have solved this problem.
I'd disagree.
I'm on IPTV and any major sporting event (World Series, Super Bowl, etc) is horrible buffering when I try to watch on my 4K IPTV (streaming) channel. I always have to downgrade to the HD channel and I still occasionally experience buffering.
So Netflix isn't alone in this matter.
Technical issues happen, but I wish they would've put up a YouTube stream or something (or at least asked YouTube to stop taking down the indie streams that were popping up). It seems like basically their duty to the boxers and the fans to do everything in their power to let the match be seen live, even if it means eating crow and using another platform.
Honestly you didn't miss much. Every (real) boxing fan thought of this as a disgrace and a shame when it was announced. Putting a 58-year-old Tyson against a crackhead filled with steroids (Jake Paul)? In either case it would have been a shame on Jake Paul for even getting in the ring with such an old boxer.
In boxing you are old by 32, or maybe 35 for heavyweight, and everything goes downhill very, very fast.
End of rant.
It's far from perfect here in Canada, I keep having to pause it or go back and then load it again.
Oddly having watched PPV events via the high seas for years, it feels normal...
https://www.livemint.com/sports/news/mike-tyson-v-jake-paul-...
"Netflix streamed the fight to its 280 million subscribers"
Perhaps the technology was over-sold.
Wow I feel scammed. I paid for a Netflix subscription specifically for this but it's not loading so I'm watching on an illegal streaming website
Just adding a data point, here in Canada on my nVidia Shield it went down to 360p a dozen times or so, but never paused at all. I guess I got lucky.
Was this their first time doing live content? I figured something would go wrong. I'm sure lots of people were watching.
Not their first time, but the first time at this scale.
OK, but the last time they tried a livestream (a reality show reunion), it also fell over. I suppose to their credit, my stream never outright died yesterday, it just went to potato quality.
I’m not sure buffering was the biggest issue with this event. How was as 58 year old Tyson fighting a man in his 20s?
This wasn’t a “real” fight in the ring. It was clearly a hype/money fight only. The late 20 year old boxer has a massive following (or hate following) with younger age groups; and Mike Tyson brings the older age groups out. Mike has earned somewhat of a legendary status.
Leading up to the fight, there were many staged interactions meant to rile up the audience and generate hype and drive subscription revenue, and subsequently make ad spots a premium ($$$).
Unfortunately, American television/entertainment is so fake. I can’t get even be bothered to engage or invest time into it anymore.
It may as well have been a WWE special. As good as scripted. It was a business venture not a competitive fight.
I'm sure the architecture and scale of NetFlix's operations is truly impressive, but stories like this make me further appreciate the elegant simplicity of scalability of analogue terrestrial TV, and to a similar extent, digital terrestrial TV and satellite.
Streaming live can be a very different thing than on-demand.
Hopefully Netflix can share more about what they learned, I love learning about this stuff.
I had joked I would probably cancel Netflix after the fight.. since I realized other platforms seemed to have more content both old and new.
Then the video started stuttering.
We all know Netflix was built for static content, but it's still hilarious that they have thousands of engineers making 500K-1M in total comp and they couldn't live stream a basic broadcast. You probably could have just run this on AWS with a CDK configuration and a quota increase from Amazon.
On X.com someone had a stream that was stable to at least 5 million simultaneous viewers, but then (as I expected) someone at Netflix got them to pull the plug on it. So I would expect this fight to have say, 50 million + watching? Maybe as many as 150-250 million worldwide, given this is Tyson's last fight.
I’m very disappointed.
Woke up at 4 am (EU here) to tune in for the main event. Bought Netflix just for this. The women's fight went fine: no buffering, 4K.
As it approached the time for Paul vs Tyson, it started to first drop to 140p, and then constantly buffer. Restarted my chromecast a few times, tried from laptop, and finally caught a stream on my mobile phone via mobile network rather than my wifi.
The Netflix TV app kept blaming my internet, which kept coming back as “fast”.
Ended up watching the utterly disappointing senior-abuse live stream on my mobile phone in 360p quality.
Gonna cancel Netflix and never pay for it again, nor watch hyped-up boxing matches.
You should ask for a refund if that's the only reason you paid for it.
TreeDN: Tree-Based CDNs for Mass Audience Live Streaming https://www.youtube.com/watch?v=wRUwsvept-8
Even people on hacker news do not understand.
Internet live streaming is harder than cable/satellite live streaming over "dumb" TV boxes. They should not have used the internet for this, honestly. A TV signal can go out to millions live.
Every time it buffers for me, Netflix runs an internet test, only for it to come back and say it's fast...
All these engineering blog posts, distributed systems and these complex micro-services clearly didn't help with this issue.
Netflix is clearly not designed nor prepared for scalable multi-region live-streaming, no matter the amount of 'senior' engineers they throw at the problem.
> Netflix is clearly not designed nor prepared for scalable multi-region live-streaming
Well, yes. Who would think Netflix was designed for that? They do VOD. They're only trying to move into this now.
This. The overly complex nonsense they have built up is crazy, and the fact that we on a tech forum agree to all of it is crazier.
It's almost like this platform has been taken over by JavaScripters completely.
> taken over by JavaScripters
That is an incredible way to phrase the sentiment, thank you.
All your corporate culture, comp Structure, Interview process etc etc is all so much meta if you can’t deliver. They showed they can’t deliver. Huge let down.
This is Netflix's statement on the fiasco
https://x.com/netflix/status/1857906492235723244
I thought it's only the best of the best of the best working at Netflix ... or maybe we can just put this myth to sleep that Netflix even knows what it's doing. The suggestions are shit, the UX is shit, apparently even the back end sucks.
Not enough chaos monkey engineering.
There was some blog post on HN the other day where someone said they don't do chaos monkey anymore... Even then, how do you chaos test a novel event ahead of time?
I'm watching on a 'pirate' stream because my netflix stream is absolutely frozen.
I would have just made it simple: delay the live stream a few seconds and encode it into the same bucket where users already play static movies. Just have the player only allow starting at the time everyone else is at.
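A sketch of that idea, assuming the segments land in ordinary static storage and clients compute a shared playhead from wall-clock time (all parameters hypothetical):

```python
import time

# Every client derives the same playhead from wall-clock time, so nobody can
# scrub ahead of the shared delay, and the segments can sit in the same
# storage that serves static movies.
def shared_playhead(event_start_epoch, delay_s=30, segment_s=4, now=None):
    now = time.time() if now is None else now
    elapsed = now - event_start_epoch - delay_s
    return max(0, int(elapsed // segment_s))

# Two minutes into the event, with a 30 s safety delay and 4 s segments:
print(shared_playhead(0, now=120))  # 22 -- every client plays segment 22
```

The tradeoff is exactly the latency discussed elsewhere in the thread: the safety delay buys you cacheability at the cost of being behind broadcast TV.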
I ended up turning my TV off and watching from my phone because of the buffering/freezing. The audio would continue to play and the screen would be frozen with a loading percentage that never changed.
I have Spectrum (600 Mbps) for ISP and Verizon for mobile.
Did anyone else see different behaviour with different clients? My TV failed on 25% loaded, my laptop loaded but played for a minute or two before getting stuck buffering, and my iphone played the whole fight fine. All on the same wifi network.
From my limited understanding, Netflix heavily depends on its Open Connect platform to distribute media to edge locations, which is a different problem from live streaming. Probably they over-pushed the HD content.
This livestream broke the internet, no joke. youtube was barely loading and a bunch of other sites too. 130M is a conservative number given all the pirate streams.
don't confuse your ISP breaking with every other provider or the rest of the Internet. It was more than fine here.
My internet was fine.
I'm watching the event as I'm writing this. I've been needing to exit the player and resume it constantly. Pretty surprising that Netflix hasn't weeded out these bugs.
I couldn’t watch a show a couple days ago. Long time customer, and first time I’ve considered cancelling. Broke the basic contract of I give $ and Netflix give show.
I switched to watching on the android app and it's been flawless. Sad, but workable
Serves Netflix right for killing my beloved DVD rentals.
Illegal streams are working but netflix is not. That is crazy.
I’m a little amused at folks tuning in for meme / low quality personalities doing things … and getting the equivalent production values.
I did some VPN hopping and connecting to an endpoint in Dallas has allowed me to start watching again. Not live though, that throws me back into buffering hell.
They're not used to live. I imagine that's it. All their caching infrastructure is there assuming the content isn't currently being generated.
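A toy simulation of that difference, with invented request counts (this illustrates the caching argument, not Netflix's actual architecture):

```javascript
// An edge cache is great when the catalogue is fixed, and much less useful
// when brand-new segments appear every few seconds. All numbers are invented.
function hitRate(requests, cache = new Set()) {
  let hits = 0;
  for (const key of requests) {
    if (cache.has(key)) hits++;
    else cache.add(key); // first miss fills the cache
  }
  return hits / requests.length;
}

// VOD: 10000 requests all land on the same 100 pre-encoded segments.
const vod = Array.from({ length: 10000 }, (_, i) => `movie-seg-${i % 100}`);

// Live: new segments keep appearing, here modeled as 10000 requests
// spread across 2500 distinct, never-before-seen segments.
const live = Array.from({ length: 10000 }, (_, i) => `live-seg-${Math.floor(i / 4)}`);

console.log(hitRate(vod));  // 0.99: almost everything is an edge hit
console.log(hitRate(live)); // 0.75: constant misses back to origin
```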
Guess they should have livestreamed it on X to be safe!
Or Facebook. Or YouTube. Or Vimeo. Or LiveLeak.
It’s like watching a Minecraft cosplay of the event.
It's been fine since 11:00 EST, I wonder if they started using the CDN more effectively and pushed everyone back a few minutes?
Mine just crashed and reloaded to "Envoy Overloaded"
Maybe if jedberg and Brendan Gregg were still a part of Netflix, this wouldn't have happened.
I watched the whole fight with a 2 minute delay. That was frustrating and it didn't help that Tyson lost.
Well he is almost 60 and the average retirement age for pro boxers is in the mid 30s. He is well past his prime and in physically demanding sport that is very hard on participants.
Currently trying to watch it and it's not loading at all for me. Re-subscribed specifically for the fight.
It's not lagging for me. It crashed and not coming back.
Update: Switched to the app on my phone and so far so good.
Nucleix needs to focus on fixing middle-out compression instead of kicking cameras.
Why no one mentioned the term “vaporware”? Isn’t this a classic example of one?
Amazon prime streams the Thursday night NFL game and they seem to have no problem.
Isn't the scale a bit different though? Surely this event was an order of magnitude more concurrent viewers than some NFL game.
Sounds like a scene from: Silicon Valley - Nucleus fails
https://www.youtube.com/watch?v=9IGvzb-KCpY
I hope they do a postmortem
they should also do a business postmortem, how did the exec greenlight this without livestreaming infrastructure in place?
I thought Hooli was Google, but may be it was Netflix after all.
Works in Australia. Maybe their CDN is under a lot of stress?
In NZ, it had maybe 2 low-quality moments, but it never froze and was in high definition the rest of the time.
Don't jinx it.
So much for Netflix engineering talent aura
This is why we need IPv6. If IPv6 were fully rolled out, this livestream could have been an efficient multicast stream, like what happens with IPTV.
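A rough back-of-envelope for why multicast matters here; the bitrate and audience figures are assumptions for illustration:

```javascript
// With unicast, server/CDN egress scales with the audience; with multicast,
// each network link carries one copy and routers do the replication.
const STREAM_MBPS = 8;        // assumed HD bitrate
const VIEWERS = 65_000_000;   // assumed concurrent audience

const unicastTbps = (STREAM_MBPS * VIEWERS) / 1_000_000; // total egress, Tbps
const multicastMbps = STREAM_MBPS;                       // one copy per link

console.log(unicastTbps);   // 520 Tbps pushed from servers and edge caches
console.log(multicastMbps); // 8 Mbps per link, replicated in the network
```

Caveat: even with universal IPv6, inter-domain multicast is rarely enabled on the public internet; IPTV multicast works because it stays inside a single ISP's network.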
Over promised and under delivered. That’s a bad look
Chaos testing, nothing to see here.
Sounds like a job for Pied Piper
This reminds me of that scene in Silicon Valley
https://www.youtube.com/watch?v=9IGvzb-KCpY&t=50s
I clicked on this thread to type that exact thing, holy smokes.
You're referring to Hooli's streaming of UFC fight that goes awry and Gavin Belson totally loses it, lol. Great scene and totally relevant to what's happening with Netflix rn.
Shoulda used middle out compression.
They didn’t miss anything.
Silicon Valley predicted this: https://youtu.be/ddTbNKWw7Zs
Ota broadcasts are clearer
Streaming is hard.
Working okay for me
People still pay real-world money to Netflix, after how and why they cancelled Warrior Nun, just to see grandpa being beaten up.
I guess in the year when Trump is being reelected this is hardly a surprise.
Everyone pointing out that their illegal streams, X streams, etc. work fine are kind of missing the point.
These secondary streams might be serving a couple thousand users at best.
Initial estimates are in the hundreds of millions for Netflix. Kind of a couple of orders of magnitude difference there.
Piracy is distributed, yes.
I think that is the point, in fact.
Everything's easy when it's someone else's content.
I have to assume this is some snarky way of saying "violating copyright is Bad, m'kay".
Because taken at face value it's false. Any technical challenges involved in distributing a stream cannot possibly be affected by the legal status of the bits being pushed across the network.
It means that someone who spends 100% of their money on distribution is going to have an easier time than someone who pays for content and distribution.
Why didn’t they use Netflix AI to solve the problems?
How dare you insult the AI Gods
They have absolutely shit the bed here, and of course their socials are completely ignoring it.
off topic, but i thought tyson was in eldercare.
I can't see the fight right now.
Is this potentially an aws issue?
I would assume not because twitch runs on aws. I think netflix engineers haven't optimized as much for livestreaming like twitch has
I do not think twitch has the number of concurrent users netflix might have had this morning for the fight.
Looks like shit for me. Buffered a bit as well.
Was this the plot of a silicon valley episode?
yeah i'm using iptv which is just a rip of NF and it's stuck buffering.
I blame RTO and AI
I'm an engineering manager at a Fortune 500 company. The dumbest engineer on our team left for Netflix. He got a pay raise too.
Our engineers are fucking morons. And this guy was the dumbest of the bunch. If you think Netflix hires top tier talent, you don't know Netflix.
> I'm an engineering manager at a Fortune 500 company. The dumbest engineer on our team left for Netflix. He got a pay raise too.
Apparently he was smart enough to get away from the Fortune 500 company he worked at, reporting to yourself, and "got a pay raise too."
> Our engineers are fucking morons. And this guy was the dumbest of the bunch.
See above.
> If you think Netflix hires top tier talent, you don't know Netflix.
Maybe you don't know the talent within your own organization. Which is entirely understandable given your proclamation:
Then again, maybe this person who left your organization is accurately described as such, which really says more about the Fortune 500 company that employed him and presumably continues to employ you. IOW, either the guy left to get out from under an EM who says he is a "fucking moron", or he actually is a "fucking moron" and you failed as a manager to elevate his skills/performance to a satisfactory level.
> failed as a manager to elevate...
Managers aren't teachers. They can spend some time mentoring and teaching but there's a limit to that. I've worked with someone who could not write good code and no manager could change that.
Most people I've worked with aren't like that of course (there's really only one that stands out), so maybe you've just been lucky enough to avoid them.
I do find it unlikely that all of his engineers are morons, but on the other hand I haven't worked for a typical fortune 500 company - maybe that's where all the mediocre programmers end up.
White-knighting for 'fucking morons' is not a good look though. You'll end up in a world where packets of peanuts have a label saying 'may contain nuts'.
Which would be doubly silly as peanuts aren't actually nuts.
… which is why the label makes sense. They may have been contaminated with nuts during production.
I think acting as if peanuts are actually nuts for purposes of communication is much more defensible than acting as if tomatoes are vegetables, in short you are dying on a hill that was paved over long ago.
I agree most people will conflate them, but someone who's allergic to peanuts but not tree nuts (or vice versa), i.e. the people the labels are intended for, are going to care about the difference.
And you think white knighting for managers who call their directs all “fucking morons” is a good look?
... or a world where grown adults pay millions of dollars to watch grown adults fighting like school children.
In fact, what am I even doing in this thread? - close-tab.
That's the biggest confusion to me. Why on earth was this such a big deal? But perhaps Hacker News isn't the best place for that conversation.
This is the funniest thing I’ve read today
> or he actually is a "fucking moron" and you failed as a manager to elevate his skills/performance to a satisfactory level.
sometimes managers don't have the authority to fire somebody and are forced to keep their subordinates. Yes good managers can polish gold, but polishing poop still results in poop.
I was consulting at a place, there was a very bad programmer whose code looked sort of like this
const arrayIneed = [];
const arrayIdontNeed = firstArray.map(item => {
if(item.hasProp) { arrayIneed.push(item); }
});
return arrayIneed;
the above is very much a cleaned up and elegant version of what he would actually push into the repo.
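For contrast, the idiomatic version of that pattern is a one-line filter:

```javascript
// filter expresses "keep the items with the property" directly,
// with no side effects and no throwaway array from map.
function itemsWithProp(firstArray) {
  return firstArray.filter(item => item.hasProp);
}

console.log(itemsWithProp([{ hasProp: true, id: 1 }, { hasProp: false, id: 2 }]));
// [ { hasProp: true, id: 1 } ]
```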
he left for a competitor in the same industry, this was at the second biggest company for the industry in Denmark and he left for the biggest company - presumably he got a pay raise.
I asked the manager after he was gone, one time when I was refactoring some code of his - which in the end just meant throwing it all out and rewriting from scratch - why he had been kept on so long, and the manager said there were some layoffs coming up and he would have been out with those but because of the way things worked it didn't make sense to let him go earlier.
> the manager said there were some layoffs coming up and he would have been out with those but because of the way things worked it didn't make sense to let him go earlier.
Incentives are fucked across the board right now.
Move on a low performer today and you'll have an uphill battle for a backfill at all. If you get one, many companies are "level-normalizing" (read: level-- for all backfills). Or perhaps your management thinks the job could be done overseas cheaper, or you get pushed to turn it into a set of tasks so you can farm it out to contractors.
So you keep at least some shitty devs to hold their positions, and as ballast to throw overboard when your bosses say "5% flat cut, give me your names". We all do it. If we get back to ZIRP I'll get rid of the actively bad devs when I won't risk losing their position entirely. Until then, it's all about squeezing what limited value they have out and keeping them away from anything important.
this however was back when incentives were not so messed up, but sure.
Hmm. Engineering managers should be setting the team culture and determining best criteria for extending an offer to a candidate. If theres a problem with the hiring process I'd look for the closest source that could or should be fixing it.
I don't think I'd want to work for you.
I hope to never have a manager who is mentally stack ranking me and my coworkers in terms of perceived dumbness instead of in terms of any positive trait.
Almost everyone I know manager or not is usually ranking everyone they work with on various attributes.
In fact it would be incredibly weird to ask a close friend who at their work kicks ass and who sucks and have them respond back, "I've never really thought about how good any of my coworkers were at their jobs"
I am a manager and I don't mentally stack rank my reports.
That's not out of respect or anything, but because they're all good. I hired and mentored them, and they all passed probation.
Sure there are junior devs who are just starting, but they're getting paid less, so they're pulling their weight proportionately. They're not worse.
Ranking dumbness is ranking intelligence, which is a positive trait; dumbness is just a metric for how often intelligence fails.
Example - the manager who started this sub-thread may be a pretty smart guy and able to accurately rate the intelligence of the engineers at his organization - but he had a minor momentary failing of intelligence to post on HN calling those engineers fucking morons.
You've got to rank how often the intelligence fails in someone to be able to figure out how reliable their intelligence is.
Btw how do you know your current manager is not doing that.
I don't. That's why I said hope :)
You’ll know when he ends every meeting with “dummies, get back to work”
I'm not a manager and I don't stack rank people, but I am 100% capable of knowing when one of my co-workers or predecessors is a fucking moron.
The trick is to use my massive brain to root cause several significant outages, discover that most of them originate in code written by the same employee, and notice that said employee liked to write things like
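Something like this, reconstructed from memory (createWidget is a stand-in stub that always fails; the shape of the try/catch/return is the point):

```javascript
// Hypothetical reconstruction of the error-swallowing pattern:
// every error is caught, discarded, and turned into a quiet "success".
function createWidget() {
  throw new Error("upstream service returned 500");
}

function handleRequest() {
  try {
    return createWidget();
  } catch (e) {
    return null; // no log, no rethrow: the caller never learns anything failed
  }
}

console.log(handleRequest()); // null
```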
...except even worse, because instead of createWidget the name was something far less descriptive, the nesting was deeper and involved loops, there were random assignments that made no goddamn sense, and the API calls just went to an unnecessary microservice that was only called from here and which literally just passed the data through to a third party with minor changes. Those minor changes resulted in an internal API that was actually worse than the unmodified third party API.
I am so tired of these people. I am not a 10x rockstar engineer and not without flaws, but they are just so awful and draining, and they never seem to get caught in time to stop them ruining perfectly fine companies. Every try>catch>return is like an icy cat hand from the grave reaching up to knock my coffee off my desk.
Isn't that a problem with your code review process? Why is that person's code making it to production?
So again, maybe they're a bad employee but it seems like nothing's done to even try and minimize the risks they present.
In this specific case, the fucking moron in question was the one who designed the code review process and hired the other engineers, and it took place a significant length of time before my involvement.
Which, yes, does raise interesting questions about how someone who can't be trusted to handle errors in an API call ended up in a senior enough position to do that.
There's a disincentive to actively block PRs if you don't want your coworkers to think you are a bad colleague / not on their side. So you often see suboptimal code making its way to production. This has a worse effect the more terrible engineers there are.
Except in this case it's clearly affecting at minimum the rest of OP's team.
At that point it's not one person being obnoxious and never approving their team members diffs and more of a team effort to do so.
But at minimum if you have a culture of trying to improve your codebase you'll inevitably set up tests, ci/cd with checks, etc. before any code can be deployed. Which should really take any weight of responsibility out of any one member of the team. Whether trying to put out code or reject said code.
Turning this into an incentive that everyone values is a signal that a team has a great culture.
I dunno, I've gone and done a "git blame" to find out who the fucking moron that wrote the code was, only to find out it was me three years ago.
Sure, there's such a thing as stupid code, but without knowing the whole context under which a bit of code was produced, unless it's utterly moronic (which is entirely possible, dailywtf has some shiners), it's hard to really judge it. Hindsight, as applied to code, is 20/20.
I agree with the general sentiment ("one instance of bad code might have been me") but not the specific sentiment ("I could easily decide to catch and ignore errors through every bit of code I worked on without knowing why that was bad, and commit other, similar crimes against good taste in the same way").
The difference for me is that this is pervasive. Yes, sometimes I might write code with a bug in error handling at 3am when I'm trying to bring a service back up, but I don't do it consistently across all the code that I touch.
I accept that the scope is hard to understand without having worked on a codebase which a genuine fucking moron has also touched. "Oh strken," you might say, "surely it can't be that bad." Yes, it can. I have never seen anything like this before. It's like the difference between a house that hasn't been cleaned in a week and a hoarder's house. If I tried to explain what hoarding is, well, maybe you'd reply that sometimes you don't do the dishes every day or mop the floor every week, and then I'd have to explain that the kitchen I'm talking about is filled from floor to roof with dirty dishes and discarded wrappers, including meat trays, and smells like a dead possum.
Hey, that possum's name was Roger and I'm really sad that it died. I've been feeding it for weeks! There are definitely bad programmers out there whose code is only suitable for public shaming via The Daily WTF.
I've never seen a team that has somehow managed to hire exclusively morons. Even the shittiest of call center jobs and construction crews have a few people with enough brain cells to tie their shoelaces.
Have you considered that maybe you're being overly harsh about your co-workers? Maybe take the fact that one of them was hired by a top paying employer as a sign that you should improve your own ability to judge skill?
I've seen tons of them! The formula is to create conditions that will make even slightly competent people leave. They hire their moron nephew; he is always 30 min late, then they moan when you are 5 min late because the parking lot was blocked by their car. He always leaves 2 hours early while you do overtime that they regularly forget to pay for. Your day is filled with your own work PLUS that of your moron coworkers, who only drink coffee while joking about you doing their work. You are not as fast as the last guy! Haha! If something goes wrong the morons collectively blame you, just like last time. You get a formal warning. Etc etc. The other normal person they hire is let go after 2 days because they complained, which means they didn't fit the team.
And so on.
If he still works there, the moron who left was less of one.
At least half of that is on you. NEVER work unpaid/unlogged OT.
Here it is somewhat normal to "forget" so that you have to ask for it every time. My current employer has thousands of employees. "Forgetting" is good business. If money is tight they have people ask twice. You get a response like: "Didn't you already report this? Surely someone is working on it?"
Can’t speak for every place but that’s not always an option. As a teenager, I worked at Sports Direct where the management would regularly work us after our allotted hours and bar us putting the extra time onto our timesheet. If I recall correctly, the company eventually got pulled for it but the money they’d have saved over years would have outweighed the fine.
The timesheets were on paper so good luck putting your real hours on without your manager, who files it, finding out.
I’d be amazed if they ever cleaned up their act.
Yes and: IIRC, the USA has at least $8b of wage theft per year.
Report that shit to your local Department of Labor equivalent. They would have gotten you, and everyone else in that store, their owed money.
You’re asking children to have full understanding of their rights and how to enforce them. Also, investigations into this started in 2020: over a decade after I left. Do you think nobody had reported this in all that time? Looks like the system wasn’t working as well as you think it does.
Having worked with a bunch of guys who have gone on to "top teams", I no longer believe they have top teams. My fav was the guy who said the system could scale indefinitely after it literally fell on its ass from too much traffic. He couldn't understand that just because Lambdas by themselves can scale, they are limited by the resources they use, so he just ignored that and insisted it could. The same guy also kept saying we should change the TPEG standard because he didn't like how it worked. And these companies are seriously pretending they've got the best and brightest. If that's really true, I really need to find another profession.
I've worked for many companies that said they hired the best. And to be honest when I hire I also try to hire good people. I think I could hire better if a) I had an open cheque, b) I was running coolest project in the universe. I did hire for some interesting projects but nothing close to an open cheque. Even under these conditions it's tough to find great people. You can go after people with a proven track record but even that doesn't always guarantee their next project will be as successful.
The reality though is that large companies with thousands of people generally end up with average people. Some companies may hire more PhDs, but on average those aren't better software engineers than non-PhDs. Some might hire people who are strong competitive coders, but that also isn't a strong signal for strong engineers on average.
Once you have a mix of average people on a curve, which is the norm, the question becomes whether you have an environment where the better people can be successful. In many corporate environments this doesn't happen. Better engineers may have obstacles put in front of them, or they may be forced out of the organization. This is natural because for most organizations it's more of a political question than a technical one.
Smaller organizations that are very successful (so they can meet my two criteria) and can be highly selective, or are highly desirable, can have better teams. By their nature as smaller organizations those teams can also be effective. As organizations grow, the talent will spread out towards average and the politics/processes/debt/legacy will make those teams less effective.
To be fair, when you need to hire hundreds or thousands of people, you gotta hire average people. The best is a finite resource and not all of the best want to work for FAANG or any megacorp.
I used to want to work at a FAANG-like company when I was just starting out thinking they were going to be full of amazing devs. But over the years, I've seen some of the worst devs go to these companies so that just destroyed that illusion. And the more you hear about the sort of work they do, it just sounds boring compared to startups.
> Our engineers are fucking morons.
I interviewed at Netflix a few years ago; with several of their engineers. One thing I cannot say is that they are morons.
their interview process is top notch too and while I was ultimately rejected, I used their format as the base template for how I started hiring at my company.
I don't have a dog in this fight, but you typically use your A players for hiring/interviews.
It can be both true that Netflix has God tier talent and a bunch of idiots. In fact, that's probably true of most places. I guess the ratio matters more.
Or god tier talent and a bunch of other god tier talent that decided to coast and cash their fat checks.
What seemed good about it? What makes it different from any other hiring process that seems detached from the job?
None of the Fortune 500 companies hire top talent. They have a few good people but 98% of the engineers are average at best. Over paid.
This is every dev house I've worked at. For most people (mostly not the ones on HN), coding is a 9-5 job. No ambition. Just lines of code. Go home. I don't know there is anything particularly wrong with that.
You just have to accept most staff at any corporation are simply just average. There has to be an average for there to be better and worse.
Our engineers are fucking morons
If your "dumbest engineer" got a job and a hefty raise going to Netflix, it means he was very capable engineer who was playing the part of moron at this Fortune 500 company because he was reporting to a manager who was calling him and the entire team morons and he didn't feel the need to go above and beyond for that manager.
Also, highly likely that it was the manager that was the moron and not everyone around him.
> If your "dumbest engineer" got a job and a hefty raise going to Netflix, it means he was very capable engineer who was playing the part of moron
It's also possible that there's very little correlation between capability, reputation and salary.
Don't we all know someone who is overpaid? There are more than a few well known cases of particular employers who select for overpaid employees...
> well known cases of particular employers who select for overpaid employees
Not well-known enough, apparently. Where should I be applying?
There are different forms of overpayment but to give some examples:
- The recent story of AWS using serverless for video processing comes to mind [1].
- Google is renowned for rest and vest.
- Many government jobs pay more than their private counterparts.
- Military contractors
- Most of the healthcare industry
- Lobbyists
[1] https://news.ycombinator.com/item?id=35811741
In the healthcare industry in Hungary: one worker does the same job for 700 USD a month and another for 1100 USD; the only difference is formal education and years worked in the industry. You can perform much better (by actually caring about the patients in those 12 hours you work) but you will get paid the same amount regardless. Of course, if you have 3 kids (whether they are adults or not) then you pay no taxes (or much less than someone who does not have kids or only has 2).
> Don't we all know someone who is overpaid?
Yes, usually managers.
Why is your team morons? Kind of disparaging maybe? Fish rots from the head situation?
Curb your enthusiasm had a good segment of episodes that were a parody on Netflix and how they shifted hiring from merit to other criteria.
Curb did? What season/episodes?
I can +1 with a similar anecdote.
They obviously have some really good engineers, but many low-tier ones as well. No idea how long they stay there, though.
I'm watching the fight now and have experienced the buffering issues. Bit embarrassing for a company that fundamentally only does a single thing, which is this. Also, yeah, 900k TC and whatnot but irl you get this. Mediocre.
livestream is quite different from streaming pre-processed video, so I'm not surprised by the scaling issues
i was mildly interested and managed to find a pirate livestream, it didn't have buffering issues lol
Well the pirate site was not live-streaming to 100 million users…
I know.
But given how much they spend on engineering, how much time they had and how important this event is ... mediocre performance.
All true, but this part of your GP comment:
> a company that fundamentally only does a single thing, which is this
… isn’t true. From the couch, watching Suits and watching a live sports match may seem similar but they’re very different technical problems to solve.
Or in other words: in one case the "stream" is stored on a hard drive not far away from you, only competing for bandwidth on the last section to you. In the other case the "stream" is coming over the Internet to you and everyone else at the same time.
I have to wonder if it's a regional thing. I'm watching from the southern Pacific in HD, and it's been excellent.
I’d imagine it is fairly dependent on which cache you’re connected to.
Reputation usually lags reality by 5+ years. See: Google.
Absolutely right. Netflix was once all about the sports team mentality. Now they’re Man Utd.
Given Man Utd under Ferguson used to be THE football team, you could say they were always Man Utd and still are ;-)
Haha indeed, but I’m a gooner. You’d never see me admit that.
It mostly makes sense to me. From their bombastic blogs to github projects full of overwrought Enterprise java design patterns. The only thing great about Netflix is it pays a lot more.
Sometimes if everyone else is the problem you are the problem.
Its all ego when these companies think they hire the best.
What are the chances that your entire engineering team is entirely composed of low performers or people with bad attitude or whatever you designate as "fucking morons"?
It's more likely that you are bad at managing, growing and motivating your team.
Even if it was true, to refer to your team in this way makes you look like you are not ready for management.
Your duty is to get the most out of the team, and that mindset won't help you.
Don’t agree. Sometimes you can observe the world around you, and it’s not pretty. Are they not allowed to observe the truth as they see it? What if they are right?
I don't understand why nobody here believes you.
There's no reason to doubt what you say, probably people identify with the mistreated one. Why?
Because the idea that all the engineers that work at his large company are morons is absurd. Anyone in that situation that believes that and even more, states it, is just making their own character flaws apparent.
It’s hyperbole, like a teacher complaining to others, “my kids were all crazed animals today.”
I’ve worked with engineers where I had to wonder how they found their computer every morning. I can easily see how a few of those would make you bitter and angry.
Let me think about it...
All the engineers in MY company are morons.
They're just bureaucrats.
You ever thought they were doing the bare minimum and studying at night to leave?
> I'm an engineering manager
How are you involved in the hiring process?
> Our engineers are fucking morons. And this guy was the dumbest of the bunch.
Very indicative of a toxic culture you seem to have been pulled in to and likely have contributed to by this point given your language and broad generalizations.
Describing a wide group of people you're also responsible for as "fucking morons" says more about you than them.
(We detached this subthread from https://news.ycombinator.com/item?id=42154036.)
this is why managers get a bad rap. what proportion think like this? hopefully not a large one, but i do worry. ultimately if the team sucks it's because of the management. they're the ones with the greatest power to change things for the better.
I'm going to avoid leaving a zero-effort response like "actually you're the problem" like half of the replies and contribute:
Why do you call your engineers morons? Is it a lack of intelligence, a lack of wisdom, a lack of experience, inability to meet deadlines, reading comprehension, or something else?
I wonder if Netflix is just hiring for different criteria (e.g. you want people who will make thoughtful decisions while they want people who have memorized all the leetcode problems).
Sounds like you’re a good match for their team then
Your job must be truly awful.
An engineering manager who thinks his engineers are morons and dumb?
I have questions..
Top troll bro
Sounds like he got a better deal. If this is how you describe your team, I suspect they are all submitting their resumes hoping to get away from you.
Dupe: https://news.ycombinator.com/item?id=42153906
How is this not a solved problem by now?
I think this is a result of most software "engineering" having become a self-licking ice cream cone. Besides mere scaling, the techniques and infrastructure should be mostly squared away.
Yes, it's all complicated, but I don't think we should excuse ourselves when we objectively fail at what we do. I'm not saying that Netflix developers are bad people, but that it doesn't matter how hard of a job it is; it was their job and what they did was inadequate to say the least.
Jonathan Blow is right.