rty32 3 months ago

Andrej Karpathy's video is probably much better than this:

https://youtu.be/l8pRSuU81PU

  • elpocko 3 months ago

    It's gotta be an exceptionally awesome video to beat written text though. I'll take mediocre text over good video any day.

    • throwaway71271 3 months ago

      This particular video is an exceptionally awesome video.

      And so are his other neural net videos.

      He keeps bringing you back to first principles, again and again, does not let a single doubt remain, and the code is very accessible and practical.

      The intersection between those who can teach and those who truly understand the deepest essence of a subject is much smaller than we would hope. People like Feynman, Tanenbaum, Sussman, Susskind, and now Karpathy are exceedingly rare. Each of them is a gift for generations to come. So, when you find one whose style resonates with the way you think, I suggest watching their videos multiple times. :)

    • frognumber 3 months ago

      Convert video to text with this one simple secret trick:

      * yt-dlp

      * whisper

      (whisper is surprisingly good for a lot of educational videos)

      Even better is to connect the text to the images, but that's less of a simple trick (although I do have the code).
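      A minimal sketch of the two-step pipeline above (the output filenames and whisper model size here are my own choices, not anything the commenter specified):

      ```shell
      # Step 1: grab just the audio track from the video
      # -x extracts audio, --audio-format converts it with ffmpeg
      yt-dlp -x --audio-format mp3 -o "lecture.%(ext)s" "https://youtu.be/l8pRSuU81PU"

      # Step 2: transcribe it with OpenAI's whisper CLI
      # larger models (medium, large) are slower but more accurate
      whisper lecture.mp3 --model small --output_format txt
      ```

      This leaves a plain-text transcript (lecture.txt) next to the audio file.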

      • omerhac 3 months ago

        That's cool, but not exactly the same as reading text that was written to be consumed as text. I think a big part of the reason technical text is easier to digest is the way it's written, and not so much the medium.

    • nottorp 3 months ago

      How can a video be better than text based info when we're talking about a programming task, which is all text?

      • hombre_fatal 3 months ago

        Because the value isn't in just having raw reference material like complex code you just paste into your program. The value is in communicating ideas and understanding.

        You don't need to skim through the 4-hour video for long to see that it's full of whys and hows and explanations and demonstrations that massively dwarf the blog post. It's basically a mini course.

        Programming is an interactive process, so transcribing the video to text also removes a lot of information. The video isn't just an audio form of what could've been a long text tutorial. You're watching someone do something.

        • nottorp 3 months ago

          > The value is in communicating ideas and understanding.

          And those ideas absolutely can't be communicated in concise writing, I have to waste my life on listening to talking heads?

          • jamescmartinez 3 months ago

            This is an option for those who want to learn from a more interactive medium instead of from a textbook.

            Different people learn in different ways. I wouldn’t call any of it a waste.

          • hombre_fatal 3 months ago

            Yes. Skim the video. You're watching someone explain things interactively in a way that cannot be done with text.

            Whether you have the preference for it or not is not up for debate here.

            • nottorp 3 months ago

              > Skim the video.

              So not all 4 hours are worth watching? :)

      • graovic 3 months ago

        You can listen to someone's thought process; that is extremely valuable.

cjtrowbridge 3 months ago

Also check out Andrej's new llm.c library, which includes a script to do this from scratch with FineWeb.

omerhac 3 months ago

Cool blog, thanks!

I did a similar project a couple of years ago for a university course, except I also added style transfer, and it turned out pretty cool. I scraped a bunch of news articles together with their news sections and trained a self-attention language model from scratch; the results were pretty hilarious. The data was in Hebrew, which is a challenge to tokenize because of its morphology. I posted it on arXiv if anyone's interested in the style transfer and tokenization process: https://arxiv.org/abs/2212.03019

moffkalast 3 months ago

That's cool as a learning experience, but if you're going to build a language transformer, why learn ClosedAI's outdated nonsense instead of a more established open architecture like Llama? That way, whatever you end up training is plug-and-play compatible with every LLM tool in the universe once converted to a GGUF.

Otherwise it's like learning to build a website and stopping short of the final bit where you put it on a webserver and run it live.

KTibow 3 months ago

Reminds me of TinyStories. I wonder if this architecture is better or worse than the ones that paper tested.