vessenes 3 months ago

So the paper itself is pretty significant, I think, from looking at it. The general methodology seems to be: train a small model as a discriminative scoring model on very high-quality data (JEST is mostly concerned with multi-modal tasks, it seems, so think image/text caption pairs), have that model score ‘maximally learnable’ batches from a larger, lower-quality dataset, then train the big model using those scores.
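
A minimal sketch of that loop, with toy stand-ins (the function names and per-example scores here are mine, not from the paper; JEST actually scores sub-batches jointly rather than individual examples):

```python
import random

def learnability_score(example, learner_loss, reference_loss):
    """High when the big 'learner' model still finds an example hard
    but the small reference model (trained on curated data) finds it easy."""
    return learner_loss(example) - reference_loss(example)

def select_batch(pool, k, learner_loss, reference_loss):
    """Keep the k 'most learnable' examples from a larger candidate pool."""
    return sorted(
        pool,
        key=lambda ex: learnability_score(ex, learner_loss, reference_loss),
        reverse=True,
    )[:k]

# Stand-in data: pretend each example carries precomputed losses.
random.seed(0)
pool = [{"id": i, "learner": random.random(), "ref": random.random()}
        for i in range(100)]
batch = select_batch(pool, 8, lambda e: e["learner"], lambda e: e["ref"])
```

The big training run then consumes `batch` instead of a uniformly sampled one; the claim is that the curation cost is repaid by faster convergence.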

This turns out to be a significant FLOPs and quality win, even accounting for the initial scoring-model training and the scoring itself; they claim roughly a 10x improvement in the quality/FLOP tradeoff, and they show numbers that significantly beat SOTA on some tasks at their model size.

The bad part, to me, is that this takes significant engineering: it requires known high-quality datasets, training of the scoring model, and selection and scoring of the data for the big training run. This is not a bold new leap that's going to be easy for hobbyists to implement; it's a practitioner's excellent engineering showing the way forward for certain training needs.

As always, I appreciate the publishing from DeepMind; this looks like great work. It would be nice to see a company like together.ai or others turn it into a production pipeline; it might be a while, though. It looks relatively gnarly in the details on the data and scoring side.

  • kmmlng 3 months ago

    Isn't this similar to what Microsoft did with their Phi models?

    • vessenes 3 months ago

      I don’t think so. The Phi training plan was to pull answers from textbooks and have GPT-4 write questions for those answers, ensuring high-quality completions. They then trained on this data fairly indiscriminately. JEST is also about training-data quality, but it’s much more general: it’s an approach that can target broad-scale web data, using a small, cheap model to ‘sort’ and prioritize.
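
      To make the contrast concrete, a hypothetical sketch of the Phi-style pipeline as described above (`ask_strong_model` is a stand-in for a GPT-4 call, not a real API; the point is that every generated pair is kept, with no scoring or selection step):

```python
def ask_strong_model(passage):
    # Placeholder for a GPT-4 call that writes a question whose
    # answer is the given high-quality passage.
    return f"Q: What does the following passage explain? :: {passage[:40]}"

def build_synthetic_qa(passages):
    # Train on every generated pair indiscriminately; no scoring
    # model and no per-batch selection, in contrast to JEST.
    return [{"question": ask_strong_model(p), "answer": p} for p in passages]

dataset = build_synthetic_qa(
    ["Photosynthesis converts light into chemical energy."]
)
```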

morbicer 3 months ago

Nice. Google scientists come up with a groundbreaking idea, then Google's product management bungles the chance to bring it to market and productize it, and someone like OpenAI or Anthropic swoops in to reap the rewards. And the cycle repeats.

DeepMind people invent transformers, and then they watch people laugh at Bard, or whatever it's called nowadays, because product and engineering lost the plot. Kodak is paging you from the grave, Google; read the message.

  • kirubakaran 3 months ago

    Sounds like a management issue, not a PM/engineering issue

    • morbicer 3 months ago

      Yes, PM to me stands for product _management_, so it is a management issue. Same for engineering: it doesn't mean just individual contributors; there's someone managing the engineering as well.

      • kirubakaran 3 months ago

        I meant Senior Management. "Product Manager" manages the product, not people.

        • butyo 3 months ago

          The context of the comment chain is Google failing to get a product out the door before the competition, based on its research results. Their point sounded like: the research side keeps finding things, and product management keeps failing to capitalize.

          • kirubakaran 3 months ago

            Ah I see what you mean, thanks!

    • dyauspitr 3 months ago

      Not being able to take ideas and turn them into products clients want is solely a product management issue.

      • fhub 3 months ago

        Not always. For example, if bringing a new product to market is perceived as likely to eat into existing revenue, then all sorts of management shenanigans will happen at most big orgs.

  • dyauspitr 3 months ago

    Gemini is solid. I’ll give them a year or two before they start building an unscalable moat.

    • josephg 3 months ago

      Claude 3.5 seems better, to me. And ChatGPT is still excellent. Why on earth do you think Google will win this race?

      • dieortin 3 months ago

        Claude 3.5 is much newer, not really a fair comparison. Google has the benefit of a huge client base, including enterprises. They can integrate Gemini into their other offerings, which OpenAI / Anthropic cannot.

      • ricopags 3 months ago

        Largest context window (because of most compute) wins.

        They've got more serious engineering heavyweights, putting a lot of collective work into fewer tasks and approaches; Microsoft is taking more of a kitchen-sink approach.

        • josephg 3 months ago

          If that was going to make the difference, why is ChatGPT far more popular today than Gemini? Why isn't Google already ahead of its competitors? I'll believe it when I see it.

          Google are in the race for sure, but they aren't winning. Claude holds that crown for me at the moment. The artifacts feature is wonderful.

          • dyauspitr 3 months ago

            If Claude is so good, why is everyone using ChatGPT?

            • josephg 3 months ago

              Because it's only gotten good in the last few months. ChatGPT has been the king for years, and it has some inertia at this point.

        • patrickhogan1 3 months ago

          This is the same argument Musk used (incorrectly) to indicate why OpenAI wouldn’t win and why it should merge with Tesla.

  • dbuser99 3 months ago

    What are you on about? They publish their research, advancing the field. And Gemini has caught up with OpenAI and everybody else.

    • morbicer 3 months ago

      I am glad they are advancing the field, but I think it's unfortunate that it doesn't make them top dog. Gemini is not top tier to me, though I admit that the confusing naming and spotty worldwide rollout might be why I am not familiar with their best model. But that's a signal on its own.

      The launch was faked, and I don't think the real thing is here yet: https://techcrunch.com/2023/12/07/googles-best-gemini-demo-w...

    • AndyNemmity 3 months ago

      Based on this comment, I decided to try out Gemini.

      Total disaster. On tasks similar to ones I give OpenAI and Claude, it just borks. And it complained about my desire to use a gender-guesser Python library, told me that's inappropriate for non-binary people, and wouldn't do it.

      That's fun.

      Edit 1: It also refuses to print the entire script. I've tried many workarounds; it seems to only want to output a very small number of total lines.

      I threw it into ChatGPT, which immediately fixed all the issues Gemini had, and it worked on the first try.

      Edit 2: The only thing better about Gemini, as far as I can tell, is that the copy-code button is at the bottom. ChatGPT's is at the top, and that's dumb.

      Edit 3: I'm being downvoted heavily now. To be clear, I didn't intentionally seek out the gender issue; it's just what I was working on.

      I'm currently trying to generate infographics about wrestlers, and I needed to split the men from the women for championship title rankings.

      I have no problem with it in general, it just came up, so I communicated it.

      Multiple times, Gemini removed the code that used the gender-guesser library because it felt I shouldn't use it. For classifying wrestlers and their title chances, using it makes a lot of sense...

      But Gemini just refused to let me use it, which seems ridiculous. I want to make the choices here.

      • fswd 3 months ago

        I've had the exact same experiences.

      • pheatherlite 3 months ago

        The problem with Google, summed up: ethics and pseudo-science folks wanting to inject their opinions into technology. That's akin to a kitchen knife refusing to cut gift-wrapping paper because that's an inappropriate use of a knife. The silliness.

    • septic-liqueur 3 months ago

      The problem with Gemini is the guardrails they've built into it, which make it useless for me. That's a problem that has to do with Google, not with any AI smarts.

kelseyfrog 3 months ago

Great, improvements in efficiency will just lead to greater resource consumption, due to the Jevons paradox[1].

1. https://en.wikipedia.org/wiki/Jevons_paradox

  • epistasis 3 months ago

    The Jevons paradox is not inevitable; it happens in only a few situations, certainly not all.

    And your statement of it is incorrect: it can result in greater demand, but that does not automatically mean greater resource usage.

    A minority of efficiency improvements can sometimes lead to greater resource consumption, but overall, efficiency does result in less resource usage.

    • kelseyfrog 3 months ago

      How do we know if this particular instance will result in Jevons paradox?

  • Mehvix 3 months ago

    >the falling cost of use induces increases in demand enough that resource use is increased, rather than reduced

    This is just saying throughput is increased, yes? The time to train, and thus to iterate (i.e. dialing in hyperparams), will decrease.

    • kelseyfrog 3 months ago

      It calls into question the article's subhead, "which could mean lower energy demands."

      I.e., just as more efficient steam engines led to an increase in both steam-engine throughput and coal consumption, an increase in AI efficiency can lead to an increase in both training throughput and energy consumption.

      The paradox shows up when prevalence scales faster than efficiency, and efficiency drives prevalence.
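
      A toy calculation of that last point (the numbers and elasticity model are assumed, not from the thread): total resource use rises exactly when demand grows faster than efficiency.

```python
def total_energy(efficiency, demand_elasticity):
    # Demand grows as a power of efficiency; energy per unit of
    # work falls linearly with efficiency.
    demand = efficiency ** demand_elasticity
    return demand / efficiency

before = total_energy(efficiency=1.0, demand_elasticity=1.5)
after = total_energy(efficiency=10.0, demand_elasticity=1.5)  # elasticity > 1: Jevons bites
tame = total_energy(efficiency=10.0, demand_elasticity=0.5)   # elasticity < 1: savings stick
```

With elasticity 1.5, a 10x efficiency gain leaves total energy use about 3.16x higher; with elasticity 0.5, it falls to about a third.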

swax 3 months ago

AI advancement is coming at us from both directions: orders of magnitude more compute, with orders of magnitude more efficiency. Hyper-exponential.

  • Dylan16807 3 months ago

    The efficiency has not improved all that much, and when you multiply two exponential growths it's still exponential.

    Though even when you add the efficiency improvements I think we're still lagging behind Moore's Law overall.
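
    The parent's point in one line (a toy check, not a claim about actual growth rates): the product of two exponentials is another exponential with a bigger base, not something faster.

```python
# 2^n (compute) times 3^n (efficiency) is just 6^n: still a single
# exponential, so "hyper-exponential" overstates it.
for n in range(20):
    assert (2 ** n) * (3 ** n) == 6 ** n
```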

  • downboots 3 months ago

    I only hope it brings about more integration of our vast amounts of data instead of more generative inaccuracy