rty32 3 months ago

Andrej Karpathy's video is probably much better than this:

https://youtu.be/l8pRSuU81PU

  • elpocko 3 months ago

    It's gotta be an exceptionally awesome video to beat written text though. I'll take mediocre text over good video any day.

    • throwaway71271 3 months ago

      This particular video is an exceptionally awesome video.

      And so are his other neural net videos.

      He keeps bringing you back to first principles, again and again, does not let a single doubt remain, and the code is very accessible and practical.

      The intersection between those who can teach and those who truly understand the deepest essence of a subject is much smaller than we would hope. People like Feynman, Tanenbaum, Sussman, Susskind, and now Karpathy are exceedingly rare. Each of them is a gift for generations to come. So, when you find one whose style resonates with the way you think, I suggest watching their videos multiple times. :)

    • frognumber 3 months ago

      Convert video to text with this one simple secret trick:

      * yt-dlp

      * whisper

      (whisper is surprisingly good for a lot of educational videos)

      Even better is to connect the text to the images, but that's less of a simple trick (although I do have the code).
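      A minimal sketch of the two-step pipeline above (the output filenames and whisper model size here are my own choices, not anything the commenter specified):

      ```shell
      # Step 1: grab just the audio track from the video
      # -x extracts audio, --audio-format converts it with ffmpeg
      yt-dlp -x --audio-format mp3 -o "lecture.%(ext)s" "https://youtu.be/l8pRSuU81PU"

      # Step 2: transcribe it with OpenAI's whisper CLI
      # larger models (medium, large) are slower but more accurate
      whisper lecture.mp3 --model small --output_format txt
      ```

      This leaves a plain-text transcript (lecture.txt) next to the audio file.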

      • omerhac 3 months ago

        That's cool, but not exactly the same as reading text that was written to be consumed as text. I think a big part of the reason technical text is easier to digest is the way it's written, and not so much the medium.

    • nottorp 3 months ago

      How can a video be better than text based info when we're talking about a programming task, which is all text?

      • hombre_fatal 3 months ago

        Because the value isn't in just having raw reference material like complex code you just paste into your program. The value is in communicating ideas and understanding.

        You don't need to skim through the 4-hour video for long to see that it's full of whys and hows and explanations and demonstrations that massively dwarf the blog post. It's basically a mini course.

        Programming is an interactive process, so transcribing the video to text also removes a lot of information. The video isn't just an audio form of what could've been a long text tutorial. You're watching someone do something.

        • nottorp 3 months ago

          > The value is in communicating ideas and understanding.

          And those ideas absolutely can't be communicated in concise writing, I have to waste my life on listening to talking heads?

          • jamescmartinez 3 months ago

            This is an option for those who want to learn from a more interactive medium instead of from a textbook.

            Different people learn in different ways. I wouldn’t call any of it a waste.

          • hombre_fatal 3 months ago

            Yes. Skim the video. You're watching someone explain things interactively in a way that cannot be done with text.

            Whether you have the preference for it or not is not up for debate here.

            • nottorp 3 months ago

              > Skim the video.

              So not all 4 hours are worth watching? :)

      • graovic 3 months ago

        You can listen to someone's thought process; that is extremely valuable.

cjtrowbridge 3 months ago

Also check out Andrej's new llm.c library, which includes a script to do this from scratch with FineWeb.

omerhac 3 months ago

Cool blog, thanks!

I did a similar project a couple of years ago for a university course, except I also added style transfer, and it turned out pretty cool. I scraped a bunch of news articles together with their news sections and trained a self-attention language model from scratch; the results were pretty hilarious. The data was in Hebrew, which is a challenge to tokenize because of its morphology. I posted it on arXiv if anyone's interested in the style transfer and tokenization process: https://arxiv.org/abs/2212.03019

moffkalast 3 months ago

That's cool as a learning experience, but if you're going to build a language transformer, why learn ClosedAI's outdated nonsense instead of a more established open architecture like Llama? That way, whatever you end up training is plug-and-play compatible with every LLM tool in the universe once converted to a GGUF.

Otherwise it's like learning to build a website and stopping short of the final bit where you put it on a webserver and run it live.

KTibow 3 months ago

Reminds me of TinyStories. I wonder if this architecture is better or worse than the ones that paper tested.