I was excited to check out the lecture videos, thinking they were public, but quickly saw that they were closed.
One of the things I miss most about the pandemic was how all of these institutions opened up for the world. Lately they have been not only closing down newer course offerings but also making old videos private. Even MIT OCW falls apart once you get into some advanced graduate courses.
I understand that universities should prioritize their alumni, but there’s literally no cost in making the underlying material (especially lectures!) available on the internet. It delivers immense value to the world.
I've seen arguments that opening up fresh material makes it easy for less honest institutions to plagiarize your work. I've even heard professors say they don't want to share their slides or record their lectures, because it's their copyright.
I personally don't like this, because it makes a place more exclusive through legal moats, not genuine prestige. If you're a professor, this also makes your work less known, not more. IMO the only beneficiaries are those who paid a lot to be there, lecturers who don't want to adapt, and university admins.
I wish we would speed-run this to where these superstar profs open their classes to 20,000 people at a lower price point (but where this yields them more profit).
That's basically MOOCs, but those kinda fizzled out. It's tough to actually stay focused for a full-length university-level course outside of a university environment IMO, especially if you're working and have a family, etc.
(I mean, I have no idea how Coursera/edX/etc are doing behind the scenes, but it doesn't seem like people talk about them the way they used to ~10 years ago.)
2024 lecture videos are on YouTube: https://youtube.com/playlist?list=PLoROMvodv4rN4wG6Nk6sNpTEb...
Those don't cover DPO/GRPO, which arguably made some parts of RL obsolete.
It’s been said that RL is the worst way to train a model, except for all the others. Many prominent scientists seem to doubt that this is how we’ll be training cutting edge models in a decade. I agree, and I encourage you to try to think of alternative paradigms as you go through this course.
If that seems unlikely, remember that image generation didn't take off until diffusion models, and GPTs didn't take off until RLHF. If you've been around long enough it'll seem obvious that this isn't the final step. The challenge for you is to find the one that's better.
RL is still widely used in the advertising industry. Don't let anyone tell you otherwise. When you have millions to billions of visits and you are trying to optimize an outcome, RL is very good at that. Add in context with contextual multi-armed bandits and you have something very good at driving people towards purchasing.
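To make that concrete, here's a rough, illustrative sketch of an epsilon-greedy contextual bandit for ad selection, with one linear reward model per ad updated online (all names, shapes, and numbers here are made up, not from any production system):

    import numpy as np

    n_arms, d, eps = 3, 5, 0.1           # 3 ads, 5 context features, 10% exploration
    weights = np.zeros((n_arms, d))      # one linear reward model per ad

    def choose(context):
        if np.random.rand() < eps:                   # explore: random ad
            return np.random.randint(n_arms)
        return int(np.argmax(weights @ context))     # exploit: highest predicted reward

    def update(arm, context, reward, lr=0.05):
        pred = weights[arm] @ context
        weights[arm] += lr * (reward - pred) * context   # SGD step on squared error

    # Simulated traffic: each visit gives a context; clicks follow an unknown
    # per-ad logistic model that the bandit has to discover.
    true_w = np.random.randn(n_arms, d)
    for _ in range(10_000):
        ctx = np.random.randn(d)
        arm = choose(ctx)
        click = float(np.random.rand() < 1 / (1 + np.exp(-true_w[arm] @ ctx)))
        update(arm, ctx, click)

Real systems use fancier policies (LinUCB, Thompson sampling) and much more careful reward attribution, but the loop of observe context, pick an arm, log the outcome, update is the same.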
You're assuming that people are only interested in image and text generation.
RL excels at learning control problems. Under standard assumptions, it is mathematically guaranteed to converge to an optimal policy for the states and actions you give it, given enough runtime. For some problems (playing computer games), that runtime is surprisingly short.
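As a toy illustration of that convergence claim, here is a minimal, self-contained sketch of tabular Q-learning on a five-state chain (hyperparameters are arbitrary, just enough for it to converge):

    import numpy as np

    n_states, n_actions = 5, 2               # chain of states 0..4; actions: 0 = left, 1 = right
    gamma, alpha, eps = 0.9, 0.1, 0.1
    Q = np.zeros((n_states, n_actions))

    def step(s, a):
        s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
        return s2, (1.0 if s2 == n_states - 1 else 0.0)   # reward only at the right end

    s = 0
    for _ in range(20_000):
        a = np.random.randint(n_actions) if np.random.rand() < eps else int(np.argmax(Q[s]))
        s2, r = step(s, a)
        # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = 0 if s2 == n_states - 1 else s2               # restart the episode at the goal

    print(Q.argmax(axis=1)[:-1])   # learned policy for states 0-3: always go right -> [1 1 1 1]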
There is a reason self-driving cars use RL, and don't use GPTs.
> self-driving cars use RL
Some parts of it, but I would argue with a lot of guardrails in place, and it's not as common as you think. I don't think the majority of the planner/control stack out there in SDCs is RL-based. I also don't think any production SDCs are RL-based.
I have been using it to train an AI on my game, hotlapdaily.
Apparently the AI sets the best time, even better than the pros. It is really useful when it comes to controlled-environment optimizations.
You are exactly right.
Control theory and reinforcement learning are different ways of looking at the same problem. They have traditionally and culturally focused on different aspects.
I like to think of RLHF as a technique that I, as a student, used to apply to score good marks on my exams. As soon as I started working, I realized that out-of-distribution generalization can't be achieved only by practicing in an environment with verifiable rewards.
RL is barely even a training method; it's more of a dataset generation method.
I feel like both this comment and the parent comment highlight how RL has been going through a cycle of misunderstanding lately, thanks to another one of its popularity booms from being used to train LLMs.
care to correct the misunderstanding?
For one, DPO, PPO, and GRPO all use losses that are not what's used with SFT (see the sketch below).
They also force exploration as a part of the algorithm.
They can be used for synthetic data generation once the reward model is good enough.
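Here's a hedged PyTorch sketch of that first point, contrasting the plain SFT negative log-likelihood with a PPO-style clipped surrogate loss on the same token log-probabilities; all the numbers are placeholders, not from any real run:

    import torch

    logp_new = torch.tensor([-1.2, -0.8, -2.0])   # log pi_theta(a_t|s_t) under the current policy
    logp_old = torch.tensor([-1.0, -1.0, -1.5])   # log-probs under the policy that sampled the data
    advantage = torch.tensor([0.5, -0.3, 1.0])    # advantage estimates (from a reward/value model)

    # SFT: plain negative log-likelihood of the target tokens.
    sft_loss = -logp_new.mean()

    # PPO: importance ratio, clipped so one update can't move too far from the sampling policy.
    ratio = torch.exp(logp_new - logp_old)
    clipped = torch.clamp(ratio, 1 - 0.2, 1 + 0.2)
    ppo_loss = -torch.min(ratio * advantage, clipped * advantage).mean()

    print(sft_loss.item(), ppo_loss.item())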
It's reductive, but also roughly correct.
While collecting data according to the policy is part of RL, 'reductive' is an understatement. It's like saying algebra is all about scalar products. Well yes, but that's maybe 1% of it.
What about combinatorial optimization? When you have a simulation of the world, what other paradigms would fit?
More likely we will develop general super intelligent AI before we (together with our super intelligent friends) solve the problem of combinatorial optimization.
There's nothing to solve. The curse of dimensionality kills you no matter what. P=NP or maybe quantum computing is the only hope of making serious progress on large-scale combinatorial optimization.
GPT wouldn't even have been possible, let alone taken off, without self-supervised learning.
RLHF is what gave us the ChatGPT moment. Self-supervised learning was the base for this.
SSL creates all the connections, and RL learns to walk the paths.
Are the videos available somewhere?
The spring course is on YouTube: https://m.youtube.com/playlist?list=PLoROMvodv4rN4wG6Nk6sNpT...
As a "tradional" ML guy who missed out on learning about RL in school, I'm confused about how to use RL in "traditional" problems.
Take, for example, a typical binary classifier with a BCE loss. Suppose I wanted to shoehorn RL onto this: how would I do that?
Or, for example, the House Value problem (given a set of features about a house for sale, predict its expected sale value). How would I slap RL onto that?
I guess my confusion comes from how the losses are hooked up. Traditional losses (BCE, RMSE, etc.) I know about; but how do you bring an RL loss into these problems?
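For what it's worth, one way you could "hook up" an RL loss to the binary classifier example is to treat the predicted label as an action and correctness as a 0/1 reward, then apply REINFORCE. This is purely illustrative (as the replies below argue, you wouldn't actually do this), and all the data and shapes here are dummies:

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(10, 1))       # toy logistic classifier
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)

    x = torch.randn(32, 10)                       # dummy features
    y = torch.randint(0, 2, (32,)).float()        # dummy binary labels

    p = torch.sigmoid(model(x)).squeeze(-1)       # "policy": P(action = 1 | x)
    actions = torch.bernoulli(p).detach()         # sample an action (a predicted label)
    reward = (actions == y).float()               # 0/1 reward: did we predict correctly?

    # REINFORCE loss: -E[ reward * log pi(action | x) ]
    logp = actions * torch.log(p + 1e-8) + (1 - actions) * torch.log(1 - p + 1e-8)
    loss = -(reward * logp).mean()

    opt.zero_grad(); loss.backward(); opt.step()

In expectation this pushes the model toward predicting the correct label, much like BCE, but with far more gradient variance, because the learning signal comes only from the sampled action and its scalar reward.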
I just wouldn't.
RL is nice in that it handles messy cases where you don't have per-example labels.
How do you build a learned chess-playing bot? Essentially the state of the art is to find a clever way of turning the problem of playing chess into a sequence of supervised learning problems.
So IIUC RL is applicable only when the outcome is not immediately available.
Let's say I do have a problem in that setting; say the chess problem, where I have a chess board with the positions of the chess pieces, and features like turn number, my color, time left on the clock, etc. are available.
Would I train a DNN with these features? Are there some libraries where I can try out some toy problems?
I guess coming from a classical ML background I am quite clueless about RL but want to learn more. I tried reading the Sutton and Barto book, but got lost in the terminology. I'm a more hands-on person.
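If you want something hands-on, one common starting point (my suggestion, not something from the course) is Gymnasium's toy environments plus stable-baselines3, roughly along these lines:

    # Assumes: pip install gymnasium stable-baselines3
    import gymnasium as gym
    from stable_baselines3 import DQN

    # CartPole: 4-float observation, 2 discrete actions, +1 reward per step survived.
    model = DQN("MlpPolicy", "CartPole-v1", verbose=0)
    model.learn(total_timesteps=20_000)           # train a small DQN agent

    # Roll out the trained policy for one episode.
    env = gym.make("CartPole-v1")
    obs, info = env.reset()
    done, total = False, 0.0
    while not done:
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, info = env.step(int(action))
        total += reward
        done = terminated or truncated
    print("episode return:", total)

CartPole and FrozenLake are small enough to iterate on quickly, and stable-baselines3 hides most of the algorithmic plumbing while you build intuition.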
RL is extremely brittle; it's often difficult to make it converge. Even Stanford folks admit that. Are there any solutions for this?
FlowRL is one; it learns the full distribution of rewards rather than just optimizing toward a single maximum.
Thanks, that looks very promising!
Given Ilya's podcast, this is an interesting title.
So, basically AI Winter? :-)
That's how I read it XD "oh no, RL is dead too"
I didn't get the reference. Please elaborate.
He said RL sucks because it narrowly optimizes to solve a certain set of problems under certain sets of conditions.
He compared it to students who win math competitions but can't do anything practical.
Could you kindly suggest some books about RL?
I've already studied a lot of deep learning.
Please confirm if these resources are good, or suggest yours:
Sutton and Barto - Reinforcement Learning: An Introduction
Kevin Patrick Murphy - Reinforcement Learning: An Overview, https://arxiv.org/abs/2412.05265
Sebastian Raschka (upcoming book)
...
I believe Kochenderfer et al.'s book "Algorithms for Decision Making" is also about reinforcement learning and related approaches. Free PDFs are available at https://algorithmsbook.com