vessenes 3 months ago

So the paper itself is pretty significant, I think, from looking at it. The general methodology seems to be: train a small model as a discriminative scoring model on very high-quality data (JEST is mostly concerned with multi-modal tasks, it seems, so think image/text caption pairs), have that model score ‘maximally learnable’ batches from a larger, lower-quality dataset, then train the big model using those scores.
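
A minimal sketch of that loop, with toy stand-ins (the function names and per-example scores here are mine, not from the paper; JEST actually scores sub-batches jointly rather than individual examples):

```python
import random

def learnability_score(example, learner_loss, reference_loss):
    """High when the big 'learner' model still finds an example hard
    but the small reference model (trained on curated data) finds it easy."""
    return learner_loss(example) - reference_loss(example)

def select_batch(pool, k, learner_loss, reference_loss):
    """Keep the k 'most learnable' examples from a larger candidate pool."""
    return sorted(
        pool,
        key=lambda ex: learnability_score(ex, learner_loss, reference_loss),
        reverse=True,
    )[:k]

# Stand-in data: pretend each example carries precomputed losses.
random.seed(0)
pool = [{"id": i, "learner": random.random(), "ref": random.random()}
        for i in range(100)]
batch = select_batch(pool, 8, lambda e: e["learner"], lambda e: e["ref"])
```

The big training run then consumes `batch` instead of a uniformly sampled one; the claim is that the curation cost is repaid by faster convergence.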

This turns out to be a significant FLOPs and quality win, even accounting for the initial scoring-model training and the scoring itself; they claim roughly a 10x improvement in the quality/FLOP tradeoff, and they show numbers that significantly beat SOTA on some tasks at their model size.

The bad part, to me, is that this takes significant engineering: it requires known high-quality datasets, training of the scoring model, and selection and scoring of the data for the big training run. This is not a bold new leap that's going to be easy for hobbyists to implement; it's a practitioner's excellent engineering showing the way forward for certain training needs.

As always, I appreciate the publishing from DeepMind; this looks like great work. It would be nice to see a company like together.ai or others turn it into a production pipeline; it might be a while, though. It looks relatively gnarly in the details on the data and scoring side.

  • kmmlng 3 months ago

    Isn't this similar to what Microsoft did with their Phi models?

    • vessenes 3 months ago

      I don’t think so. The Phi training plan was to pull answers from textbooks and have GPT-4 write questions for those answers, ensuring high-quality completions. They then trained on this data fairly indiscriminately. JEST is also about training-data quality, but it’s much more general: it’s an approach that can target broad-scale web data, using a small, cheap model to ‘sort’ and prioritize.
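
      To make the contrast concrete, a hypothetical sketch of the Phi-style pipeline as described above (`ask_strong_model` is a stand-in for a GPT-4 call, not a real API; the point is that every generated pair is kept, with no scoring or selection step):

```python
def ask_strong_model(passage):
    # Placeholder for a GPT-4 call that writes a question whose
    # answer is the given high-quality passage.
    return f"Q: What does the following passage explain? :: {passage[:40]}"

def build_synthetic_qa(passages):
    # Train on every generated pair indiscriminately; no scoring
    # model and no per-batch selection, in contrast to JEST.
    return [{"question": ask_strong_model(p), "answer": p} for p in passages]

dataset = build_synthetic_qa(
    ["Photosynthesis converts light into chemical energy."]
)
```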

morbicer 3 months ago

Nice. Google scientists come up with a groundbreaking idea, then Google's product management bungles the chance to bring it to market and productize it, and someone like OpenAI or Anthropic swoops in to reap the rewards. And the cycle repeats.

DeepMind people invent transformers, and then they watch people laugh at Bard, or whatever it's called nowadays, because product and engineering lost the plot. Kodak is paging you from the grave, Google; read the message.

  • kirubakaran 3 months ago

    Sounds like a management issue, not a PM/engineering issue

    • morbicer 3 months ago

      Yes, PM to me stands for product _management_, so it is a management issue. Same for engineering: it doesn't mean just individual contributors; there's someone managing the engineering as well.

      • kirubakaran 3 months ago

        I meant Senior Management. "Product Manager" manages the product, not people.

        • butyo 3 months ago

          The context of the comment chain is Google failing to get a product out the door before the competition, based on its research results. Their point sounded like: the research side keeps finding things, and product management keeps failing to capitalize.

          • kirubakaran 3 months ago

            Ah I see what you mean, thanks!

    • dyauspitr 3 months ago

      Not being able to take ideas and turn them into products clients want is solely a product management issue.

      • fhub 3 months ago

        Not always. For example, if bringing a new product to market is perceived as likely to eat into existing revenue, then all sorts of management shenanigans will happen at most big orgs.

  • dyauspitr 3 months ago

    Gemini is solid. I’ll give them a year or two before they start building an unscalable moat.

    • josephg 3 months ago

      Claude 3.5 seems better, to me. And ChatGPT is still excellent. Why on earth do you think Google will win this race?

      • dieortin 3 months ago

        Claude 3.5 is much newer, not really a fair comparison. Google has the benefit of a huge client base, including enterprises. They can integrate Gemini into their other offerings, which OpenAI / Anthropic cannot.

      • ricopags 3 months ago

        Largest context window (because of most compute) wins.

        They've got more serious engineering heavyweights, putting a lot of collective work into fewer tasks and approaches; Microsoft is taking more of a kitchen-sink approach.

        • josephg 3 months ago

          If that was going to make the difference, why is ChatGPT far more popular today than Gemini? Why isn't Google already ahead of its competitors? I'll believe it when I see it.

          Google are in the race for sure, but they aren't winning. Claude holds that crown for me at the moment. The artifacts feature is wonderful.

          • dyauspitr 3 months ago

            If Claude is so good, why is everyone using ChatGPT?

            • josephg 3 months ago

              Because it's only gotten good in the last few months. ChatGPT has been the king for years, and it has some inertia at this point.

        • patrickhogan1 3 months ago

          This is the same argument Musk used (incorrectly) to indicate why OpenAI wouldn’t win and why it should merge with Tesla.

  • dbuser99 3 months ago

    What are you on about? They publish their research, advancing the field. And Gemini has caught up with OpenAI and everybody else.

    • morbicer 3 months ago

      I am glad they are advancing the field, but I think it's unfortunate that it doesn't make them top dog. Gemini is not top tier to me, though I admit that the confusing naming and spotty worldwide rollout might be why I am not familiar with their best model. But that's a signal on its own.

      The launch was faked, and I don't think the real thing is here yet: https://techcrunch.com/2023/12/07/googles-best-gemini-demo-w...

    • AndyNemmity 3 months ago

      Based on this comment, I decided to try out Gemini.

      Total disaster. On tasks similar to ones I give OpenAI and Claude, it just borks. And it complained about my desire to use a gender-guesser Python library, told me that's inappropriate for non-binary people, and wouldn't do it.

      That's fun.

      Edit 1: It also refuses to print the entire script. I've tried many workarounds; it seems to only want to output a very small number of total lines.

      I threw it into ChatGPT, which immediately fixed all the issues Gemini had, and it worked on the first try.

      Edit 2: The only thing better about Gemini, as far as I can tell, is that the copy-code button is at the bottom. ChatGPT's is at the top, and that's dumb.

      Edit 3: I'm being downvoted heavily now. To be clear, I didn't intentionally seek out the gender issue; it's just what I was working on.

      I'm currently trying to generate infographics about wrestlers, and I needed to split the men from the women for championship title rankings.

      I have no problem with it in general, it just came up, so I communicated it.

      Multiple times, Gemini removed the code that used the gender-guesser library because it felt I shouldn't use it. For classifying wrestlers and their title chances, using it makes a lot of sense...

      But Gemini just refused to let me use it, which seems ridiculous. I want to make the choices here.

      • fswd 3 months ago

        I've had the exact same experiences.

      • pheatherlite 3 months ago

        The problem with Google, summed up: ethics and pseudo-science folks wanting to inject their opinions into technology. That's akin to a kitchen knife refusing to cut gift-wrapping paper because that's an inappropriate use of a knife. The silliness.

    • septic-liqueur 3 months ago

      The problem with Gemini is the guardrails they've built into it, which make it useless for me. That's a problem that has to do with Google, not with any AI smarts.

kelseyfrog 3 months ago

Great, improvements in efficiency will just lead to greater resource consumption, due to the Jevons paradox[1].

1. https://en.wikipedia.org/wiki/Jevons_paradox

  • epistasis 3 months ago

    The Jevons paradox is not inevitable; it happens in only a few situations, certainly not all.

    And your statement of it is incorrect: it can result in greater demand, but that does not automatically mean greater resource usage.

    A minority of efficiency improvements can sometimes lead to greater resource consumption, but overall, efficiency does result in less resource usage.

    • kelseyfrog 3 months ago

      How do we know if this particular instance will result in Jevons paradox?

  • Mehvix 3 months ago

    >the falling cost of use induces increases in demand enough that resource use is increased, rather than reduced

    This is just saying throughput is increased, yes? The time to train, and thus to iterate (i.e. dialing in hyperparams), will decrease.

    • kelseyfrog 3 months ago

      It calls into question the article's subhead, "which could mean lower energy demands."

      I.e., just as more efficient steam engines led to an increase in both steam-engine throughput and coal consumption, an increase in AI efficiency can lead to an increase in both training throughput and energy consumption.

      The paradox shows up when prevalence scales faster than efficiency, and efficiency drives prevalence.
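
      A toy calculation of that last point (the numbers and elasticity model are assumed, not from the thread): total resource use rises exactly when demand grows faster than efficiency.

```python
def total_energy(efficiency, demand_elasticity):
    # Demand grows as a power of efficiency; energy per unit of
    # work falls linearly with efficiency.
    demand = efficiency ** demand_elasticity
    return demand / efficiency

before = total_energy(efficiency=1.0, demand_elasticity=1.5)
after = total_energy(efficiency=10.0, demand_elasticity=1.5)  # elasticity > 1: Jevons bites
tame = total_energy(efficiency=10.0, demand_elasticity=0.5)   # elasticity < 1: savings stick
```

With elasticity 1.5, a 10x efficiency gain leaves total energy use about 3.16x higher; with elasticity 0.5, it falls to about a third.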

swax 3 months ago

AI advancement is coming at us from both directions: orders of magnitude more compute, with orders of magnitude more efficiency. Hyper-exponential.

  • Dylan16807 3 months ago

    The efficiency has not improved all that much, and when you multiply two exponential growths it's still exponential.

    Though even when you add the efficiency improvements I think we're still lagging behind Moore's Law overall.
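
    The parent's point in one line (a toy check, not a claim about actual growth rates): the product of two exponentials is another exponential with a bigger base, not something faster.

```python
# 2^n (compute) times 3^n (efficiency) is just 6^n: still a single
# exponential, so "hyper-exponential" overstates it.
for n in range(20):
    assert (2 ** n) * (3 ** n) == 6 ** n
```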

  • downboots 3 months ago

    I only hope it brings about more integration of our vast amounts of data instead of more generative inaccuracy