What is with the negativity in these comments? This is a huge, huge surface area that touches a large percentage of white collar work. Even just basic automation/scaffolding of spreadsheets would be a big productivity boost for many employees.
My wife works in insurance operations - everyone she manages from the top down lives in Excel. For line employees a large percentage of their job is something like "Look at this internal system, export the data to excel, combine it with some other internal system, do some basic interpretation, verify it, make a recommendation". Computer Use + Excel Use isn't there yet...but these jobs are going to be the first on the chopping block as these integrations mature. No offense to these people but Sonnet 4.5 is already at the level where it would be able to replicate or beat the level of analysis they typically provide.
Having wrangled many spreadsheets personally, and worked with CFOs who use them to run small-ish businesses, and all the way up to one of top 3 brokerage houses world-wide using them to model complex fixed income instruments... this is a disaster waiting to happen.
Spreadsheet UI is already a nightmare. The formula editing and relationship visualization are not there at all. Mistakes are rampant in spreadsheets, even my own carefully curated ones.
Claude is not going to improve this. It is going to make it far, far worse with subtle and not so subtle hallucinations happening left and right.
The key is really this - all LLMs that I know of rely on entropy and randomness to emulate human creativity. This works pretty well for pretty pictures and creating fan fiction or emulating someone's voice.
It is not a basis for getting correct spreadsheets that show what you want to show. I don't want my spreadsheet correctness to start from a random seed. I want it to spring from first principles.
My first job out of uni was building an infrastructure-as-code version control system for spreadsheets, after a Windows update made an eight-year-old spreadsheet go haywire and lose $10m in an afternoon.
To me, the case for LLMs is strongest not because LLMs are so unusually accurate and awesome, but because if human performance were put on trial in aggregate, it would be found wanting.
Humans already do a mediocre job of spreadsheets, so I don't think it is a given that Claude will make more mistakes than humans do.
HN has a strong base of anti-AI bias, which I assume is partially motivated by insecurity over being replaced, losing jobs, or having missed the boat on AI.
Based on the comments here, it's surprising anything in society works at all. I didn't realize the bar was "everything perfect every time, perfectly flexible and adaptable". What a joy some of these folks must be to work with, answering every new technology with endless reasons why it's worthless and will never work.
HN has an obsession with quality too, which has merit, but is often economically irrelevant.
When US-East-1 failed, lots of people talked about how the lesson was cloud agnosticism and multi cloud architecture. The practical economic lesson for most is that if US-East-1 fails, nobody will get mad at you. Cloud failure is viewed as an act of god.
I don't trust LLMs to do the kind of precise deterministic work you need in a spreadsheet.
It's one thing to fudge the language in a report summary, which can be subjective; numbers, however, are not subjective. It's widely known that LLMs are terrible at even basic maths.
Even Google's own AI summary admits it which I was surprised at, marketing won't be happy.
Yes, it is true that LLMs are often bad at math because they don't "understand" it as a logical system but rather process it as text, relying on pattern recognition from their training data.
Seems like you're very confused about what this work typically entails. The job of these employees is not mental arithmetic. It's closer to:
- Log in to the internal system that handles customer policies
- Find all policies that were bound in the last 30 days
- Log in to the internal system that manages customer payments
- Verify that for all policies bound, there exists a corresponding payment that roughly matches the premium.
- Flag any divergences above X% for accounting/finance to follow up on.
Practically this involves munging a few CSVs, maybe typing in a few things, setting up some XLOOKUPs, IF formulas, conditional formatting, etc.
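That kind of reconciliation is a few lines of deterministic code once it's set up; a minimal sketch in plain Python, with hypothetical field names ("policy_id", "premium", "amount") and a hypothetical 5% threshold (in practice the rows would come from the exported CSVs via csv.DictReader):

```python
def flag_divergences(policies, payments, threshold=0.05):
    """Flag bound policies whose payment diverges from the premium."""
    # Index payments by policy so each policy lookup is O(1).
    paid = {p["policy_id"]: p["amount"] for p in payments}
    flagged = []
    for pol in policies:
        amount = paid.get(pol["policy_id"])
        if amount is None:
            # No corresponding payment at all: always worth a flag.
            flagged.append((pol["policy_id"], "no payment found"))
            continue
        divergence = abs(amount - pol["premium"]) / pol["premium"]
        if divergence > threshold:
            flagged.append((pol["policy_id"], f"{divergence:.0%} divergence"))
    return flagged

policies = [{"policy_id": 1, "premium": 100.0},
            {"policy_id": 2, "premium": 200.0},
            {"policy_id": 3, "premium": 300.0}]
payments = [{"policy_id": 1, "amount": 101.0},
            {"policy_id": 2, "amount": 150.0}]
print(flag_divergences(policies, payments))
# [(2, '25% divergence'), (3, 'no payment found')]
```

The point is that the judgment ("roughly matches", "above X%") gets pinned down into a reviewable rule rather than re-decided per row.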
Will AI replace the entire job? No...but that's not the goal. Does it have to be perfect? Also no...the existing employees performing this work are also not perfect, and in fact sometimes their accuracy is quite poor.
Actually, yes. This kind of management reporting is either (1) going to end up in the books and records of the company - big trouble if things have to be restated in the future or (2) support important decisions by leadership — who will be very much less than happy if analysis turns out to have been wrong.
A lot of what ties up the time of business analysts is ticking and tying everything to ensure that mistakes are not made and that analytics and interpretations are consistent from one period to the next. The math and queries are simple - the details and correctness are hard.
Speak for yourself and your own use cases. There is a huge diversity of workflows to which automation can be applied in any medium to large business, and they all have differing needs. Many Excel workflows I'm personally familiar with already incorporate a "human review" step. Telling a business leader that they can now jump straight to that step, even if it requires 2x human review, with AI doing all of the most tedious and low-stakes prework, is a clear win.
Sometimes there can be an advantage in leading or lagging some aspects of internal accounting data for a time period. Basically sitting on credits or debits to some accounts for a period of weeks. The tacit knowledge to know when to sit on a transaction and when to action it is generally not written down in formal terms.
I'm not sure how these shenanigans will translate into an AI-driven system.
That’s the kind of thing that can get a company into a lot of trouble with its auditors and shareholders. Not that I am offering accounting advice, of course. And yeah, one cannot “blame” an AI system or try to AI-wash any dodgy practices.
Checking someone else's spreadsheet is a fucking nightmare. If your company has extremely good standards it's less miserable, because at least the formatting etc. will be consistent...
The one thing LLMs should consistently do is ensure that formatting is correct, which will help greatly in the checking process. But no, I generally don't trust them to do sensible things with basic formulation. Not a week ago, GPT-5 got confused about whether a plus or a minus was necessary in a basic question of "I'm 323 days old, when is my birthday?"
I think you have a misunderstanding of the types of things that LLMs are good at. Yes you're 100% right that they can't do math. Yet they're quite proficient at basic coding. Most Excel work is similar to basic coding so I think this is an area where they might actually be pretty well suited.
My concern would be more with how to check the work (ie, make sure that the formulas are correct and no columns are missed) because Excel hides all that. Unlike code, there's no easy way to generate the diff of a spreadsheet or rely on Git history. But that's different from the concerns that you have.
The model ought to be calling out to some sort of tool to do the math—effectively writing code, which it can do. I'm surprised the major LLM frontends aren't always doing this by now.
So do it in basic code, where referencing cell G53 instead of G$53 doesn't crash a mass transit network because somebody's algorithm forgot to order enough fuel this month.
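The birthday question upthread is a good illustration: delegated to deterministic date arithmetic, the plus/minus ambiguity disappears. A sketch, with an assumed "today" for illustration:

```python
from datetime import date, timedelta

def birth_date(today: date, age_in_days: int) -> date:
    """Date of birth: subtract the age, don't add it (the sign the LLM fumbled)."""
    return today - timedelta(days=age_in_days)

def next_birthday(today: date, age_in_days: int) -> date:
    """Next anniversary of the birth date (ignores the Feb-29 edge case)."""
    born = birth_date(today, age_in_days)
    candidate = born.replace(year=today.year)
    return candidate if candidate >= today else candidate.replace(year=today.year + 1)

today = date(2024, 5, 1)
print(birth_date(today, 323))     # 2023-06-13
print(next_birthday(today, 323))  # 2024-06-13
```

Ten lines of stdlib code give an answer that is right every time, which no amount of prompting guarantees.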
I've built spreadsheet diff tools on Google Sheets multiple times. As the need grows, I think we will see diffs, commits, and review tools reach customers.
hey Collin! I am working on an AI agent on Google Sheets, and I am curious if any of your designs are out in the public. We are trying to re-think what diffs should look like and want to make something nicer than what we currently have, so I'm curious.
Maybe LLMs will enable a new type of work in spreadsheets. Just like in coding we have PR reviews, with an LLM it should be possible to do a spreadsheet review. Ask the LLM to try to understand the intent and point out places where the spreadsheet deviates from the intent. Also ask the LLM to narrate the spreadsheet so it can be understood.
That first condition "try to understand the intent" is where it could go wrong. Maybe it thinks the spreadsheet aligns with the intent, but it misunderstood the intent.
LLMs are a lossy validation, and while they work sometimes, when they fail they usually do so 'silently'.
Maybe we need some kind of method or framework to develop intent. Most of the things that go wrong in knowledge work come down to a lack of common understanding of intent.
> The one thing LLMs should consistently do is ensure that formatting is correct.
In JavaScript (and I assume most other programming languages) this is the job of static analysis tools (like eslint, prettier, typescript, etc.). I’m not aware of any LLM-based tools which perform static analysis with results as good as the traditional tools. Is static analysis not a thing in the spreadsheet world? Are the tools which do static analysis on spreadsheets subpar, or do they have some disadvantage not seen in other programming languages? And if so, are LLMs any better?
Just use a normal static analysis tool and shove the result into an LLM. I believe Anthropic properly figured out that agents are the key, in addition to models, contrary to OpenAI, which is run by a psycho who only believes in training the bigger model.
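Spreadsheet "lint rules" exist and can be very simple. As a toy example, this one flags a cell whose formula breaks the pattern of its row neighbours, the spreadsheet analogue of a linter catching an inconsistent expression (the cell-to-formula dict stands in for what a reader like openpyxl would extract):

```python
import re

def inconsistent_in_row(formulas: dict[str, str], row: int, cols: list[str]) -> list[str]:
    """Return cells whose formula shape differs from the row's majority."""
    def shape(f: str) -> str:
        # Normalise away column letters so =B2+B3 and =C2+C3 compare equal.
        return re.sub(r"[A-Z]+(\d+)", r"COL\1", f)
    shapes = {c: shape(formulas.get(f"{c}{row}", "")) for c in cols}
    majority = max(set(shapes.values()), key=list(shapes.values()).count)
    return [f"{c}{row}" for c, s in shapes.items() if s != majority]

formulas = {"B5": "=B2+B3", "C5": "=C2+C3", "D5": "=D2+42"}  # D5 breaks the pattern
print(inconsistent_in_row(formulas, 5, ["B", "C", "D"]))  # ['D5']
```

Feeding hits like that D5 into an LLM for explanation is a very different proposition from asking the LLM to find them in the first place.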
Sysadmin of a small company. I get asked pretty often to help with a pivot table, vlookup, or just general excel functions (and smartsheet, these users LOVE smartsheet)
Last time, I gave Claude an invoice and asked it to change one item on it. It did so nicely and gave me the new invoice. Good thing I noticed it had also changed the bank account number...
The more complicated the spreadsheet and the more dependencies it has, the greater the room for error. These are probabilistic machines. You can use them, I use them all the time for different things, but you need to treat them like employees you can't even trust to copy a bank account number correctly.
We’ve tried to gently use them to automate some of our report generation and PDF->invoice workflows, and it’s a nightmare of silent changes and absent logic. Basic instructions like “debits need to match credits” and “balance sheets need to balance” are ignored, even when stated explicitly.
Yeah, asking an LLM to edit one specific thing in a large or complex document/codebase is like those repeated "give me the exact same image" gifs. It's fundamentally a statistical model, so the only thing we can be _certain_ of is that _it's not_ exact. It might get the desired change 100% correct, but it's only going to get the entire document right 99.5% of the time.
The use cases for spreadsheets are much more diverse than that. In my experience, spreadsheets are just as often used for calculation. Many of them do require high accuracy, rely on determinism, and necessitate the understanding of maths ranging from basic arithmetic to statistics and engineering formulas. Financial models, for example, must be built up from ground truth and need to always use the right formulas with the right inputs to generate meaningful outputs.
I have personally worked with spreadsheet based financial models that use 100k+ rows x dozens of columns and involve 1000s of formulas that transform those data into the desired outputs. There was very little tolerance for mistakes.
That said, humans, working in these use cases, make mistakes >0% of the time. The question I often have with the incorporation of AI into human workflows is, will we eventually come to accept a certain level of error from them in the way we do for humans?
Do you trust humans to be precise and deterministic, or even to be especially good at math?
This is talking about applying LLMs to formula creation and references, which they are actually pretty good at. Definitely not about replacing the spreadsheet's calculation engine.
Most real-world spreadsheets I've worked with were fragile and sloppy, not precise and deterministic. Programmers are always shocked when they realize how many important things are built on extremely messy spreadsheets, and that people simply accept it. They'd rather just spend human hours correcting discrepancies than try to build something maintainable.
Eh, yes. In theory. In practice, and this is what I have experienced personally, bosses seem to think that you now have interns, so you should be able to do 5x the output. Guess what that means: no verification or rubber stamp.
I like to use Claude Code to write deterministic computer programs for me, which then perform the actual work. It saves a lot of time.
I had a big backlog of "nice to have scripts" I wanted to write for years, but couldn't find the time and energy for. A couple of months after I started using Claude Code, most of them exist.
That’s great and the only legitimate use case here. I suspect Microsoft will not try to limit customers to just writing scripts and will instead allow and perhaps even encourage them to let the AI go ham on a bunch of raw data with no intermediary code that could be reviewed.
I don’t see the issue so much as the deterministic precision of an LLM, but the lack of observability of spreadsheets. Just looking at two different spreadsheets, it’s impossible to see what changes were made. It’s not like programming where you can run a `git diff` to see what changes an LLM agent made to a source code file. Or even a word processing document where the text changes are clear.
Spreadsheets work because the user sees the results of complex interconnected values and calculations. For the user, that complexity is hidden away and left in the background. The user just sees the results.
This would be a nightmare for most users to validate what changes an LLM made to a spreadsheet. There could be fundamental changes to a formula that could easily be hidden.
For me, that's the concern with spreadsheets and LLMs, which is just as much a concern with spreadsheets themselves. Try collaborating with someone on a spreadsheet for modeling and you'll know how frustrating it can be to try to figure out what changes were made.
"I don't trust LLMs to do the kind of precise deterministic work" => I think the LLM is not doing the precise arithmetic. It is the agent, with lots of knowledge (skills) and tools. Precise deterministic work is done by tools (deterministic code). Skills bring domain knowledge and how to sequence a task. The agent executes it. The LLM predicts the next token.
Sure, but this isn't requiring that the LLM do any math. The LLM is writing formulas and code to do the math. They are very good at that. And like any automated system you need to review the work.
Exactly, and if it can be done in a way that helps users better understand their own spreadsheets (which are often extremely complex codebases in a single file!) then this could be a huge use case for Claude.
>I don't trust LLMs to do the kind of precise deterministic work
not just in a spreadsheet, any kind of deterministic work at all.
find me a reliable way around this. i don't think there is one. mcp/functions are a band aid and not consistent enough when precision is important.
after almost three years of using LLMs, i have not found a single case where i didn't have to review its output, which takes as long or longer than doing it by hand.
ML/AI is not my domain, so my knowledge is not deep nor technical. this is just my experience. do we need a new architecture to solve these problems?
ML/AI is not my domain but you don’t have to get all that technical to understand that LLMs run on probability. We need a new architecture to solve these problems.
If only we had created some device that could perform deterministic calculations and then wrote software that made it easy for humans to use such calculations.
ok but humans are idiots, if only we could make some sort of Alternate Idiot, a non-human but every bit as generally stupid as humans are! This A.I would be able to do every stupid thing humans did with the device that performed deterministic calculations only many times faster!
Yes and when the AI did that all the stupid humans could accept its output without question. This would save the humans a lot of work and thought and personal responsibility for any mistakes! See also Israel’s Lavender for an exciting example of this in action.
It is bad in a very specific sense, but I have not seen any other comments express the bad parts, instead of focusing merely on the accuracy part (which is an issue, but not the issue):
- this opens up a ridiculous flood of data, otherwise semi-private, to the one company providing this service
- this works well on small data sets, but will choke on ones it needs to divvy up into chunks, inviting interesting (and yet unknown) errors
There is a real benefit to being able to 'talk to data', but anyone who has seen corporate culture up close and personal knows exactly where it will end.
edit: and I'm saying all this as a person who actually likes LLMs.
The vast majority of people in business and science are using spreadsheets for complex algorithmic things they weren't really designed for, and we find a metric fuckton of errors in the sheets when anyone actually bothers to audit them, mistakes which are not at all obvious without troubleshooting by... manually checking each and every cell and cell relation, peering through parentheses, following references. It's a nightmare to troubleshoot.
LLMs specialize in making up plausible things with a minimum of human effort, but their downside is that they're very good at making up plausible things which are covertly erroneous. It's a nightmare to troubleshoot.
There is already an abject inability to provision the labor to verify Excel reasoning when it's composed by humans.
I'm dead certain that Claude will be able to produce plausibly correct spreadsheets. How important is accuracy to you? How life-critical is the end result? What are your odds, with the current auditing workflow?
Okay! Now! Half of the users just got laid off because management thinks Claude is Good Enough. How about now?
Anthropic now has all your company's data, and all you saved was the cost of one human minus however much they charge for this. The good news is it can't have your data again! So starting from the 163rd-165th person you fire, you start to see a good return and all you've sacrificed is exactitude, precision, judgement, customer service and a little bit of public perception!
They're already doing that with AI, rejecting claims at higher rates than before.
Privatized insurance will always find a way to pay out less if it can get away with it. It is just the nature of having the trifecta of profit motive, socialized risk, and light regulation.
If you think that insurance companies have "light regulation", I shudder to think of what "heavy regulation" would look like. (Source: I'm the CTO at an insurance company.)
"Light" was not meant to imply the quantity of paperwork you have to do, but rather whether you are allowed to do the things you want to do as a company.
More compliance or reporting requirements usually tend to favor the larger existing players who can afford them, and that is also used to make life difficult and reject more claims for the end user.
It is the kind of thing that keeps you and me busy; major investors don't care about it at all. The cost of compliance, or the lack of it, is no more than a rounding error on the balance sheet, and the fines or penalties are puny and laughable.
The enormous profits year on year for decades now, and the amount of consolidation allowed in the industry, show that the industry is able to do mostly what it wants. That is what I meant by light regulation.
I agree, and I can see where it comes from (at least at the state level).
The cycle is: bad trend happens that has deep root causes (let's say PE buying rural hospitals because of reduced Medicaid/Medicare reimbursements); legislators (rightfully) say "this shouldn't happen", but don't have the ability to address the deep root causes so they simply regulate healthcare M&As – now you have a bandaid on a problem that's going to pop up elsewhere.
I mean even in simple stuff, like denying payment for healthcare that should have been covered. CMS will come by and call out a handful of cases, out of millions, every few years.
So obviously the company that prioritizes accuracy of coverage decisions by spending money on extra labor to audit itself is wasting money. Which means insureds have to waste more time getting payment for the healthcare they need.
I should have been clearer: profit maximization above all else, as long as it is mostly legal. Neither profit nor profit maximization at all costs is the nature of everything.
There are many other entity types, from unions[1], cooperatives, public sector companies, quasi-government entities, PBCs, and non-profits, that all offer insurance and can occasionally do it well.
We even have some in the US, and we don't think it is communism: the FDIC, or things like Social Security and unemployment insurance.
At some level, aren't government and taxation themselves nothing but insurance? We agree to pay taxes to mitigate a variety of risks, from foreign invasion down to smaller things like getting robbed on the street.
[1] Historically, worker collectives or unions self-organized to socialize the risks of major work-ending injuries or death.
Armies from ancient to modern times have operated because of this insurance. The two ingredients that made them not mercenaries: a form of long-term insurance benefit (education, pension, land, etc.) for them or their family members in the event of death, and sovereign immunity for their actions.
That would be illegal though, the goal is do this legally after all.
We also have to remember all claims aren't equal, i.e. some claims end up being way costlier than others. You can achieve similar % margin outcomes by putting up a ton of friction: preconditions, multiple appeals processes, prior authorization for prior authorization, reviews by administrative doctors who have no expertise in the field being reviewed and don't have to disclose their identity, and so on.
While the U.S. system is the most extreme, or most evolved, it is not unique. It is what you get when you privatize insurance; any country with private insurance has some lighter version of this and is on the same journey.
Not that public health systems or insurance a la the NHS in the UK or Germany's system work well; they are underfunded and mismanaged, with waits of months to see a specialist, and so on.
We have to choose our poison. Unless you are rich, of course; then the U.S. system is by far the best, and people travel to the U.S. to get the kind of care that is not possible anywhere else.
This is a great application of this quote. Insurance providers have 0 incentive to make their AI "good" at processing claims, in fact it's easy to see how "bad" AI can lead to a justification to deny more claims.
Probably because many people here are software developers, and wrapping spreadsheets in deterministic logic and a consistent UI covers... most software use cases.
Yeah, this could be a pretty big deal. Not everyone is an excel expert, but nearly everyone finds themselves having to work with data in excel at some time or other.
What does scaffolding of spreadsheets mean? I see the term scaffolding frequently in the context of AI-related articles, but I'm not familiar with this method, and I'm hesitant to ask an LLM.
Scaffolding typically just refers to a larger state machine style control flow governing an agent's behavior and the suite of external tools it has access to.
I have to admit that my first thought was “April Fools”. But you are right. It makes a lot of sense (if they can get it to work well). Not only is Excel the world’s biggest “programming language”. It’s probably also one of the most unintuitive ways to program.
in the short run. In the long run, productivity gains benefit* all of us (in a functional market economy).
*material benefit. In terms of spirit and purpose, the older I get the more I think maybe the Amish are on to something. Work gives our lives purpose, and the closer the work is to our core needs, the better it feels. Labor saving so that most of us are just entertaining each other on social networks may lead to a worse society (but hey, our material needs are met!)
The issue isn’t in creating a new monstrosity in excel.
The issue is the poor SoB who has to spelunk through the damn thing to figure out what it does.
Excel is the sweet spot of just enough to be useful, capable enough to be extensible, yet gated enough to ensure everyone doesn’t auto run foreign macros (or whatever horror is more appropriate).
In the simplest terms - it’s not excel, it’s the business logic. If an excel file works, it’s because theres someone who “gets” it in the firm.
I used to live in Excel too. I've trudged through plenty of awful worksheets. The output I've seen from AI is actually more neatly organized than most of what I used to receive in outlook. Most of that wasn't hyper-sophisticated cap table analyses. It was analysis from a Jr Analyst or line employee trying to combine a few different data sources to get some signal on how XYZ function of the business was performing. AI automation is perfectly suitable for this.
Neat formatting didn't save any model from having the wrong formula pasted in.
Being neat was never a substitute for being well rested, or sufficiently caffeinated.
Have you seen how AI functions in the hands of someone who isn't a domain expert? I've used it for things I had no idea about, like Astro+ web dev. User ignorance was magnified spectacularly.
This is going to have Jr Analysts dumping well formatted junk in email boxes within a month.
It's actually really cool. I will say that "spreadsheets" remain a bandaid over dysfunctional UIs, processes, etc and engineering spends a lot of time enabling these bandaids vs someone just saying "I need to see number X" and not "a BI analytics data in a realtime spreadsheet!", etc.
Some people - normal people - understand the difference between the holistic experience of a mathematically informed opinion and an actual model.
It's just that normal people always wanted the holistic experience of an answer. Hardly anyone wants a right answer. They have an answer in their heads, and they want a defensible journey to that answer. That is the purpose of Excel in 95% of places it is used.
Lately people have been calling this "sycophancy." This was always the problem. Sycophancy is the product.
It seems like to me the answer is moreso "People on HN are so far removed from the real use cases for this kind of automation they simply have no idea what they're talking about".
Seems like everyone is speculating about features instead of just reading TFA, which does in fact list them:
- Get answers about any cell in seconds:
Navigate complex models instantly. Ask Claude about specific formulas, entire worksheets, or calculation flows across tabs. Every explanation includes cell-level citations so you can verify the logic.
- Test scenarios without breaking formulas:
Update assumptions across your entire model while preserving all dependencies. Test different scenarios quickly—Claude highlights every change with explanations for full transparency.
- Debug and fix errors:
Trace #REF!, #VALUE!, and circular reference errors to their source in seconds. Claude explains what went wrong and how to fix it without disrupting the rest of your model.
- Build models or fill existing templates:
Create draft financial models from scratch based on your requirements. Or populate existing templates with fresh data while maintaining all formulas and structure.
Also, people complaining about AI inaccuracy are just technical people who like precision. The vast majority of the world is people who don't give a damn about accuracy or even correctness. They just want to appear not completely useless to the people who could potentially affect their salary.
They can try, but doubt anyone serious will adopt it.
I tried integrating ChatGPT into my finance job to see how far I could get. Mega yikes... millions of dollars of hallucinated mistakes.
Worse, you don't have the same tight feedback loop you've got in programming that tells you when something is wrong: compile errors, unit tests, etc. You basically need to walk through everything it did to figure out what's real and what's hallucination. It basically fails silently. If they roll this out at scale in the financial system... interesting times ahead.
Still, presumably there is something around spreadsheets it'll be able to do: the spreadsheet equivalent of boilerplate code, whatever that may be.
I'm bad with spreadsheets, so maybe this is trivial, but having an LLM tell me how to connect my sheet to whatever data I'm using at the moment, and it coming up with a link or SQL query or both, has allowed me to quickly pull in data where I'd normally eyeball it and move on, or worst case do it partially manually if really important.
It's like one off scripts in a sense? I'm not doing complex formulas I just need to know how I can pull data into a sheet and then I'll bucketize or graph it myself.
Again probably because I'm not the most adept user but it has definitely been a positive use case for me.
From the signup form mentioning Private Equity / Venture Capital, Hedge Fund, Investment Banking... this seems squarely aimed at financial modeling. Which is really, really cool.
I've worked alongside sell-side investment bankers in a prior startup, and so much of the work is in taking a messy set of statements from a company, understanding the underlying assumptions, and building, and rebuilding, and rebuilding, 3-statement models that not only adhere to standard conventions (perhaps best introed by https://www.wallstreetprep.com/knowledge/build-integrated-3-... ) but also are highly customized for different assumptions that can range from seasonality to sensitivity to creative deal structures.
It is quite common for people to pull many, many all-nighters to try to tweak these models in response to a senior banker or a client having an idea! And one might argue there are way too many similar-looking numbers to keep a human banker from "hallucinating," much less an LLM.
But fundamentally, a 3-statement model and all its build-sheets are a dependency graph with loosely connected human-readable labels, and that means you can write tools that let an LLM crawl that dependency graph in a reliable and semantically meaningful way. And that lets you build really cool things, really fast.
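As a sketch of that dependency-graph idea: cell references can be pulled out of formulas mechanically, and the resulting edges are what a tool would let an LLM walk. The cells and labels below are hypothetical:

```python
import re

def dependency_graph(formulas: dict[str, str]) -> dict[str, set[str]]:
    """Map each cell to the set of cells its formula references."""
    ref = re.compile(r"\b[A-Z]{1,3}\d+\b")  # matches A1-style references
    return {cell: set(ref.findall(f)) for cell, f in formulas.items()}

# Hypothetical slice of an income-statement build sheet:
model = {
    "B3": "=B1-B2",  # gross profit = revenue - COGS
    "B5": "=B3-B4",  # operating income = gross profit - opex
}
graph = dependency_graph(model)
print(sorted(graph["B5"]))  # ['B3', 'B4']
```

A real tool needs to handle ranges, cross-sheet references, and named ranges, but the principle is the same: the graph is extracted deterministically, and the LLM only interprets it.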
I'm of the opinion that giving small companies the ability to present their finances to investors, the same way Fortune 500 companies hire armies of bankers to do, is vital to a healthy economy, and to giving Main Street the best possible chance to succeed and grow. This is a massive step in the right direction.
Presenting false data to investors is fraud, doesn't matter how it was generated. In fact, humans are quite good at "generating plausible looking data", doesn't mean human generated spreadsheets are fraud.
On the other hand, presenting truthful data to investors is distinctly not fraud, and this again does not depend on the generation method.
If humans "generate plausible looking data" despite any processes to ensure data quality they've likely engaged in willful fraud.
An LLM doing so needn't even be willful on the author's part. We're going to see issues with forecasts and slide decks full of inaccuracies that are hard to review.
Anthropic is in a weird place for me right now. They're growing fast, creating little projects that I'd love to try, but their customer service was so bad for me as a Max subscriber that I set an ethical boundary for myself to avoid their services until it appears that they care about their customers whatsoever.
I keep searching for a sign, but everyone I talk to has horror stories. It sucks as a technologist that just wants to play with the thing; oh well.
> I keep searching for a sign, but everyone I talk to has horror stories. It sucks as a technologist that just wants to play with the thing; oh well.
The reason that Claude Code doesn't have an IDE is because ~"we think the IDE will be obsolete in a year, so it seemed like a waste of time to create one."
Noam Shazeer said on a Dwarkesh podcast that he stopped cleaning his garage, because a robot will be able to do it very soon.
If you are operating under the beliefs these folks have, then things like IDEs, cleaning up, and customer service are stupid annoyances that will become obsolete very soon.
To be clear, I have huge respect for everyone mentioned above, especially Noam.
I would do it for free, just to answer the question of what does a genius of his caliber have in his garage? Probably the same stuff most people do, but it would still be interesting.
I don’t think the point was about having a clean space, it was in response to a question along the lines of: when do you think we will achieve AGI?
Bad customer service comes from low prioritization.
I think Anthropic prioritizes new growth over feedback from a small number of customers; that's why they publish new products and features so frequently. There are so many potential opportunities for them to focus on.
Best way to think of it is this: Right now you are not the customer. Investors are.
The money people pay in monthly fees to Anthropic for even the top Max sub likely doesn't come close to covering the energy & infrastructure costs of running the system.
You can prove this to yourself by just trying to cost out what it takes to build the hardware capable of running a model of this size at this speed and running it locally. It's tens of thousands of dollars just to build the hardware, not even considering the energy bills.
So I imagine the goal right now is to pull in a mass audience and prove the model, to get people hooked, to get management and talent at software firms pushing these tools.
And I guess there's some in management and the investment community that thinks this will come with huge labour cost reductions but I think they may be dreaming.
... And then.. I guess... jack the price up? Or wait for Moore's Law?
So it's not a surprise to me they're not jumping to try and service individual subscribers who are paying probably a fraction of what it costs them to the run the service.
I dunno, I got sick of paying the price for Max and I now use the Claude Code tool but redirect it to DeepSeek's API and use their (inferior but still tolerable) model via API. It's probably 1/4 the cost for about 3/4 the product. It's actually amazing how much of the intelligence is built into the tool itself instead of just the model. It's often incredibly hard to tell the difference between DeepSeek output and what I got from Sonnet 4 or Sonnet 4.5.
I've been playing around with local LLMs in Ollama, just for fun. I have an RTX 4080 Super, a Ryzen 5950X with 32 threads, and 64 GB of system memory. A very good computer, but decidedly consumer-level hardware.
I have primarily been using the 120b gpt-oss model. It's definitely worse than Claude and GPT-5, but not by, like, an order of magnitude or anything. It's also clearly better than ChatGPT was when it first came out. Text generates a bit slowly, but it's perfectly usable.
So it doesn't seem so unreasonable to me that costs could come down in a few years?
Every AI company right now (except Google, Meta, and Microsoft) has its valuation based on the expectation of a future monopoly on AGI. None of their business models, today or on the foreseeable horizon, are even profitable, let alone world-dominating. The continued funding rounds all appear to be based on the expectation of becoming the sole player.
The continuing advancement of open source / open weights models keeps me from being a believer.
So cool, I hope they pull it off. So many people use Excel. Although, I always thought the power of AI in Excel would come from the ability to use AI _as_ a formula. For example, =PROMPT("Classify user feedback as positive, neutral or negative", A1). This would enable normal people (non-programmers) to fire off thousands of prompts at once and automate workflows like programmers do (disclaimer: I am the author of Cellm that does exactly this). Combined with Excel's built-in functions for deterministic work, Claude could really kill the whole copy-pasting data in and out of chat windows for bulk-processing data.
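A rough sketch of what such a =PROMPT()-style range function could look like under the hood. This is a toy illustration, not Cellm's actual implementation: `prompt_range` and `toy_llm` are hypothetical names, and the keyword matcher merely stands in for a real model call.

```python
# Hypothetical sketch: apply one instruction to every cell in a range,
# the way a spreadsheet formula would spill results down a column.
def prompt_range(instruction, cells, llm):
    """Run the same prompt against each cell value and collect the answers."""
    return [llm(f"{instruction}\n\nInput: {cell}") for cell in cells]

def toy_llm(prompt):
    """Stand-in for a real model call: naive keyword sentiment."""
    text = prompt.lower()
    if "love" in text or "great" in text:
        return "positive"
    if "broken" in text or "hate" in text:
        return "negative"
    return "neutral"

feedback = ["I love this feature", "The export is broken", "It works"]
print(prompt_range("Classify user feedback as positive, neutral or negative",
                   feedback, toy_llm))
```

In a real add-in, `llm` would be a call to a hosted or local model, and the results would spill back into the adjacent cells alongside Excel's deterministic functions.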
I'm a co-founder of Calcapp, an app builder for formula-driven apps using Excel-like formulas. I spent a couple of days using Claude Code to build 20 new templates for us, and I was blown away. It was able to one-shot most apps, generating competent, intricate apps from having looked at a sample JSON file I put together. I briefly told it about extensions we had made to Excel functions (including lambdas for FILTER, named sort type enums for XMATCH, etc), and it picked those up immediately.
At one point, it generated a verbose formula and mentioned, off-handedly, that it would have been prettier had Calcapp supported LET. "It does!", I replied, "and as an extension, you can use := instead of , to separate names and values!") and it promptly rewrote it using our extended syntax, producing a sleek formula.
These templates were for various verticals, like real estate, financial planning and retail, and I would have been hard-pressed to produce them without Claude's domain knowledge. And I did it in a weekend! Well, "we" did it in a weekend.
So this development doesn't really surprise me. I'm sure that Claude will be right at home in Excel, and I have already thought about how great it would be if Claude Code found a permanent home in our app designer. I'm concerned about the cost, though, so I'm holding off for now. But it does seem unfair that I get to use Claude to write apps with Calcapp, while our customers don't get that privilege.
At first glance this seems like a very bad idea. But re-reading this:
> Get answers about any cell in seconds: Navigate complex models instantly. Ask Claude about specific formulas, entire worksheets, or calculation flows across tabs. Every explanation includes cell-level citations so you can verify the logic.
this might just be an excellent tool for refactoring Excel sheets into something more robust and maintainable. And making a bunch of suits redundant.
This is going to be massive if it works as well as I suspect it might.
I think many software engineers overlook how many companies have huge (billion dollar) processes run through Excel.
It's much less about 'greenfield' new excel sheets and much more about fixing/improving existing ones. If it works as well as Claude Code works for code, then it will get pretty crazy adoption I suspect (unless Microsoft beats them to it).
> I think many software engineers overlook how many companies have huge (billion dollar) processes run through Excel.
So they can fire the two dudes that take care of it, lose 15 years of in-house knowledge to save 200k a year, and cry in a few months when their magic tool shits the bed?
If the company is half baked, those "two dudes" will become indispensable beyond belief. They are the ones that understand how Excel works far deeper, and paired with Claude for Excel they become far far more valuable.
At my org it's more that these AI tools finally allow the employees to get through things at all. The deadlines are getting met for the first time, maybe ever. We can at last get to the projects that will make the company money instead of chasing ghosts from 2021. The burn down charts are warm now.
The thing really missing from multi-megabyte excel sheets of business critical carnage was a non-deterministic rewrite tool. It'll interact excitingly with the industry standard of no automated testing whatsoever.
I 100% believe generative AI can change a spreadsheet. Turn the xlsx into text, mutate that, turn it back into an xlsx, throw it away if it didn't parse at all. The result will look pretty similar to the original too, since spreadsheets are great at showing immediate local context and nothing else.
Also, we've done a pretty good job of training people that chatgpt works great, so there's good reason for them to expect claude for excel to work great too.
I'd really like the results of this to be considered negligence with non-survivable fines for the reckless stupidity, but more likely, it'll be seen as an act of god. Like all the other broken shit in the IT world.
I wonder if this will be more/less useful than what we have with AI in software development.
There's a lot less to understand than a whole codebase.
I don't do spreadsheets very often, but I can empathize with tracking down "Trace #REF!, #VALUE!, and circular reference errors to their source in seconds." I once hit something like that, and I found it a lot harder to trace than a typical compiler error.
I'm not excited about having LLMs generate spreadsheets or formulas. But, I think LLMs could be particularly useful in helping me find inconsistent formulas or errors that are challenging to identify. Especially in larger, complex spreadsheets touched by multiple people over the course of months.
For once in my life, I actually had a delightful interaction with an LLM last week. I was changing some text in an Excel sheet in a very programmatic way that could have easily been done with the regex functions in Excel. But I'm not really great with regex, and it was only 15 or so cells, so I was content to just do it manually. After three or four cells, Copilot figured out what I was doing and suggested the rest of the changes for me.
This is what I want AI to do, not generate wrong answers and hallucinate girlfriends.
One approach is to produce read-only data in BI tools: users are free to export anything they want and make their own spreadsheets, but those are for their own use only. Reference data is produced every day by a central, controlled process and cannot in any circumstance be modified by the end user.
I have implemented this a couple of times and not only does it work well, it tends to be fairly well accepted. People need spreadsheets to work on them, but generally they kind of hate sending those around via email. Having a reference source of data is welcomed.
IMO, a real solution here has to be hybrid, not full LLM, because these sheets can be massive and have very complicated structures. You want to be able to use the LLM to identify / map column headers, while using non-LLM tool calling to run Excel operations like SUMIFs or VLOOKUPs. One of the most important traits in these systems is consistency with slight variation in file layout, as so much Excel work involves consolidating / reconciling between reports made on a quarterly basis or produced by a variety of sources, with different reporting structures.
Disclosure: My company builds ingestion pipelines for large multi-tab Excel files, PDFs, and CSVs.
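The hybrid split described above can be sketched roughly like this. Everything here is hypothetical for illustration: `map_price_column` stands in for the LLM's header-mapping step, while the arithmetic itself stays in deterministic code.

```python
# Hybrid sketch: a model (stubbed here as a keyword matcher) decides *where*
# to look; pandas, not the model, decides *what* the numbers add up to.
import pandas as pd

def map_price_column(headers: list[str]) -> str:
    """Stand-in for an LLM call that maps messy headers to a canonical field."""
    for h in headers:
        if any(k in h.lower() for k in ("price", "amount", "total")):
            return h
    raise ValueError("no price-like column found")

df = pd.DataFrame({"Item": ["a", "b"], "Unit Price ($)": [1010.0, 1010.0]})
col = map_price_column(list(df.columns))  # semantic mapping (fuzzy, LLM-ish)
total = df[col].sum()                     # arithmetic (exact, deterministic)
print(col, total)
```

The point is that even if the mapping step is fuzzy, the sum can never be hallucinated, because no number ever passes through the model.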
"Hey Claude, what's the total from this file?"
> grep for headers
> "Ah, I see column 3 is the price value"
> SUM(C3:C) -> $2020
> "Great! I found your total!"
If you can find me an example of tech that can solve this at scale on large, diverse Excel formats, then I'll concede, but I haven't found something actually trustworthy for important data sets
As I was reading through the post, and the comments here, and pondering my own many hours with these tools, I was suddenly reminded of one of my favorite Studio C sketches: An Unfortunate Fortune.
I would love to learn more about their challenges as I have been working on an Excel AI add-in for quite some time and have followed Ask Rosie from almost their start.
That they have now gone through the whole cycle worries me that I'm too slow as a solo founder building on the side in these fast-paced times.
That seems to be true for any startup that offers a wrapper to existing AIs rather than an AI on their own. The lucky ones might be bought but many if not most of them will perish trying to compete with companies that actually create AI models and companies large enough to integrate their own wrappers.
Interesting their X post mentions "pre-built Agent Skills" but it's not on the webpage. I wonder if they will give you the ability to edit/add/delete Skills, that would be phenomenal.
Modern Excel add-ins work in desktop Windows, macOS, and web. They're just a bit of XML that Excel looks at to call whatever web endpoint is defined in the XML.
I guess Claude may be useful for finding errors in large Excel workbooks.
May also help beginners to learn the more complex Excel functions (which are still pretty easy).
But if you are proficient at building Excel models I don't see any benefit.
Excel already has a superb, very efficient UI for entering formulas, ranges, tables, data sources, etc. I'm sceptical that a different UI, especially a text-based one, can improve on this.
I understand the sentiment about a skilled user not needing this, but I think having a little buddy that I can use to offload some menial tasks would be helpful for me to iterate through my models more efficiently; even if the AI is not perfect. As a highly skilled excel user, I admit the software has terrible ergonomics. It would be a productivity boon for me if an AI can help me stay focused on model design vs model implementation.
For some reason, I find that these tools are TERRIBLE at helping someone learn. I suspect because turning one on, results in turning the problem solving part of ones brain off.
It's obviously not the same experience for everyone. (If you are one of those energized while working in a chat window, you might be in a minority, given what we see from the ongoing massacre of brains in education.)
Paraphrasing something I read here: "people don't use ChatGPT to learn more, they use it to study less."
Copilot is getting better - I'm getting fewer of those than I used to - but it's still significantly more stupid than other agents, even when in theory it's using the same model.
Gemini already has its hooks in Google Sheets, and to be honest, I've found it very helpful in constructing semi-complicated Excel formulas.
Being able to select a few rows and then use plain language to describe what I want done is a time saver, even though I could probably muddle through the formulas if I needed to.
Last time I tried using Gemini in Google Sheets it hallucinated a bunch of fake data, then gave me a summary that included all that fake data. I'd given it a bunch of transaction data, and asked it to group the records into different categories for budgeting. When asking it to give the largest values in each category, all the values that came back were fake. I'm not sure I'd really trust it to touch a spreadsheet after that.
-stop using the free plan
-don't use gemini flash for these tasks
-learn how to do things over time and know that all ai models have improved significantly every few months
It is an entire agent loop. You can ask it to build a multi sheet analysis of your favorite stock and it will. We are seeing a lot of early adopters use it for financial modeling, research automation, and internal reporting tasks that used to take hours.
I forgot to add, you can try TabTabTab, without installing anything as well.
To see something much more powerful on Google Sheets than Gemini for free, you can add "try@tabtabtab.ai" to your sheet, and make a comment tagging "try@tabtabtab.ai" and see it in action.
I have had the opposite experience. I've never had Gemini give me something useful in sheets, and I'm not asking for complicated things. Like "group this data by day" or "give me p50 and p90"
I have just launched a product (easyanalytica.com) to create dashboards from spreadsheets, and Excel is on my to-do list of formats to be supported. However, I'm having second thoughts. Although, from the description, it seems like it would be more helpful on the modeling side rather than the presentation side. I guess I'll have to wait until it's publicly available
If it has a concept of data sources and can digest them, sure.
Anecdotally, most issues with Excel at my job are caused by data sources being renamed, moved or reformatted, by broken logins, or by insufficient access rights.
George Hotz said there's 5 tiers of AI systems, Tier 1 - Data centers, Tier 2 - fabs, Tier 3 - chip makers, Tier 4 - frontier labs, Tier 5 - Model wrappers. He said Tier 4 is going to eat all the value of Tier 5, and that Tier 5 is worthless. It's looking like that's going to be the case
That is a common refrain by people who have no domain expertise in anything outside of tech.
Spend a few years in an insurance company, a manufacturing plant, or a hospital, and then the assertion that the frontier labs will figure it out appears patently absurd. (After all, it takes humans years to understand just a part of these institutions, and they have good-functioning memory.)
This belief that tier 5 is useless is itself a tell of a vulnerability: the LLMs are advancing fastest in domain-expertise-free generalized technical knowledge; if you have no domain expertise outside of tech, you are most vulnerable to their march of capability, and it is those with domain expertise who will rely increasingly less on those who have nothing to offer but generalized technical knowledge.
yeah but if Anthropic/OpenAI dedicate resources to gaining domain expertise then any tier 5 is dead in the water. For example, they recently hired a bunch of finance professionals to make specialized models for financial modeling. Any startup in that space will be wiped out
I don't think the claim is exactly that tier 5 is useless, more that tier 5 synergizes so well with tier 4 that all the popular tier 5 products will eventually be made by the tier 4 companies.
That OpenAI is now apparently striving to become the next big app-layer company could hint at George Hotz being right, but only if the bets work out. I'm glad that there is competition on the frontier-labs tier.
Tier 5 requires domain expertise until we reach AGI or something very different from the latest LLMs.
I don’t think the frontier labs have the bandwidth or domain knowledge (or dare I say skills) to do tier 5 tasks well. Even their chat UIs leave a lot to be desired and that should be their core competency.
George Hotz says a lot of things. I think he's directionally correct but you could apply this argument to tech as a whole. Even outside of AI, there are plenty of niches where domain-specific solutions matter quite a bit but are too small for the big players to focus on.
Interesting. I found a reference to this in a tweet [1], and it looks to be a podcast. While I'm not extremely knowledgeable, I'd put it like this: Tier 1 - fabs, Tier 2 - chip makers, Tier 3 - data centers, Tier 4 - frontier labs, Tier 5 - Model wrappers
However I would think more of elite data centers rather than commodity data centers. That's because I see Tier 4 being deeply involved in their data centers and thinking of buying the chips to feed their data centers. I wouldn't be so inclined to throw in my opinion immediately if I found an article showing this ordering of the tiers, but being a tweet of a podcast it might have just been a rough draft.
Been working with Claude Code lately and been pretty impressed.
If this works as well, it could be a nice add-on. It's probably a smart market to enter, as Excel is essentially everywhere.
Just like Claude Code allows 1 dev to potentially do the work of 2 or 3, I could see this allowing 1 accountant or operations person to do the work of 2 or 3. Financial savings but human cost
It’s interesting to me that this page talks a lot about “debugging models” etc. I would’ve expected (from the title) this to be going after the average excel user, similar to how chatgpt went after every day people.
I would’ve expected “make a vlookup or pivot table that tells me x” or “make this data look good for a slide deck” to be easier problems to solve.
The issue is that the average Excel user doesn’t quite have the skills to validate and double-check the Excel formulas that Claude would produce, and to correct them if needed. It would be similar to a non-programmer vibe-coding an app. And that’s really not what you want to happen for professionally used Excel sheets.
IMO that is exactly what people want. At my work everyone uses LLMs constantly and the trade off of not perfect information is known. People double check it, etc, but the information search is so much faster even if it finds the right confluence but misquotes it, it still sends me the link.
For easy spreadsheet stuff (which is what 80% of average white-collar workers are doing when using Excel) I'd imagine the same approach. Try to do what I want, and even if you're half wrong, the good 50% is still worth it and a better starting point.
Vibe coding an app is like vibe coding a “model in excel”. Sure you could try, but most people just need to vibe code a pivot table
I think actually Anthropic themselves are having trouble with imagining how this could be used. Coders think like coders - they are imagining the primary use case being managing large Excel sheets that are like big programs. In reality most Excel worksheets are more like tiny, one-off programs. More like scripts than applications. AI is very very good at scripts.
Yeah now tell the Auditors that the financial spreadsheet we have here has AI touching it left and right. "I did not cook the books I promise it is the AI that made our financials seem better than they actually are trust me bro!", said Joe from Accounting.
I'm excited to see what national disasters will be caused by auto-generated Excel sheets that nobody on the planet understands. A few selections from past HN threads to prime your imagination:
Especially combined with the dynamic array formulas that have recently been added (LET, LAMBDA etc). You can have much more going on within each cell now. Think whole temporary data structures. The "evaluate formula" dialog doesn't quite cut it anymore for debugging.
From my experience in the corporate world, I'd trust an Excel sheet generated / checked by an LLM more than I would one that has organically grown over years in a big corporation, where nobody ever checks or even can check anything because it's one big growing pile of technical debt people just accept as working.
How well does change tracking work in Excel... how hard would it be to review LLM changes?
AFAIK there is no 'git for Excel' to diff and undo, especially not built-in (aka 'for free', both cost-wise and in that add-ons/macros are not allowed security-wise).
My limited experience has been that it is difficult to keep LLMs from changing random things besides what they're asked to change, which could cause big problems if untrackable in Excel.
I thought there was track changes on all office products. Most Office documents are zip files of XML files and assets, so I'd imagine it would be possible to rollback changes.
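A small sketch of that point, using only the standard library. The zip built here is a minimal stand-in for a workbook, not a valid .xlsx, but it shows why part-by-part inspection (and in principle diffing) is possible:

```python
# An .xlsx file is a zip archive of XML parts, so its contents can be
# listed and read with stdlib tools alone.
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("xl/workbook.xml", "<workbook/>")
    z.writestr("xl/worksheets/sheet1.xml", "<worksheet><row r='1'/></worksheet>")

with zipfile.ZipFile(buf) as z:
    names = z.namelist()
    sheet = z.read("xl/worksheets/sheet1.xml").decode()
print(names)
print(sheet)
```

A crude "diff" would then just be a text diff of the corresponding sheet XML from two versions of the file, though cell-reference churn makes naive diffs noisy in practice.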
When I think how easy I can misclick to stuff up a spreadsheet I can't begin to imagine all the subtle ways LLMs will screw them up.
Unlike code, where it's all on display, with spreadsheets all these formulas are hidden in each cell; you won't see the problem unless you click on the cell, so you'll have a hard time finding the cause.
As an inveterate Excel lover, I can just sense the blinding pain wafting off the legions of accountants, associates, seniors, and tech people who keep the machine spirits placated.
lies, damn lies, statistics, and then Excel deciding cell data types.
If this works very well and reliably, it might not kill programming as such, but it might put a lot of small businesses who do custom software for other small businesses out of work.
If AI turns out to be the powerhouse it is claimed to be, AI's impact will be corporations replacing corporate dependencies upon 'Excel projects' created by self-taught assistants to department managers.
Cool, but now companies' POs will be like "you must add the Excel export for all the user data!" and when asked why, it will basically be "so I can do this roundabout query of data for some number in a spreadsheet using AI (instead of just putting the number or chart directly in the product with a simple db call)".
Yet more evidence of the bubble burst being imminent. If any of these companies really had some almost-AGI system internally, they wouldn’t be spending any effort making f’ing Excel plugins. Or at the very least, they’d be writing their own Excel because AI is so amazing at coding, right?
You make a great point. Where are all the complex applications? They haven't been able to create their own office suite or word processor or really anything aside from a Halloween matching game in JS. You would think we would have some complex application they can point to, but nothing.
The current valuations do not require AGI. They require products like this that will replace scores of people doing computer based grunt work. MSFT is worth $4 trillion off the back of enterprise productivity software, the AI labs just need some of that money.
Yes. I once interviewed a developer whose previous job was maintaining the .NET application that used an Excel sheet as the brain for decisions about where to drill for oil on the sea floor. No one understood what was in the Excel sheet. It was built by a geologist who was long gone. The engineering team understood the inputs and outputs. That's all they needed to know.
Years ago when I worked for an engineering consulting company we had to work with a similarly complex, opaque Excel spreadsheet from General Electric modeling the operation of a nuclear power plant in exacting detail.
Same deal there -- the original author was a genius and was the only person who knew how it was set up or how it worked.
I think you’re misunderstanding me. This might be something somewhat useful, I don’t know, and I’m not judging it based on that.
What I'm saying is that if you really believed we were 2, maybe 3 years tops from AGI or the singularity or whatever, you would spend zero effort serving a domain that is already covered by third parties using your models! An Excel wrapper for an LLM isn't exactly cutting-edge AI research.
They’re desperate to find something that someone will pay a meaningful amount of money for that even remotely justifies their valuation and continued investment.
Okay. But then you could say the same for a human, isn't your brain just a cloud of matter and electricity that just reacts to senses deterministically?
> isn't your brain just a cloud of matter and electricity that just reacts to senses deterministically?
LLMs are not deterministic.
I'd argue over the short term humans are more deterministic. I ask a human the same question multiple times and I get the same answer. I ask an LLM and each answer could be very different depending on its "temperature".
I mean - try clicking the CoPilot button and see what it can actually do. Last I checked, it told me it couldn't change any of the actual data itself, but it could give you suggestions. Low bar for excellence here.
Spreadsheets are already a disaster.
> Mistakes are rampant in spreadsheets
To me, the case for LLMs is strongest not because LLMs are so unusually accurate and awesome, but because if human performance were put on trial in aggregate, it would be found wanting.
Humans already do a mediocre job of spreadsheets, so I don't think it is a given that Claude will make more mistakes than humans do.
HN has a base of strong anti-AI bias, which I assume is partially motivated by insecurity over being replaced, losing their jobs, or having missed the boat on AI.
Based on the comments here, it's surprising anything in society works at all. I didn't realize the bar was "everything perfect every time, perfectly flexible and adaptable". What a joy some of these folks must be to work with, answering every new technology with endless reasons why it's worthless and will never work.
HN has an obsession with quality too, which has merit, but is often economically irrelevant.
When US-East-1 failed, lots of people talked about how the lesson was cloud agnosticism and multi cloud architecture. The practical economic lesson for most is that if US-East-1 fails, nobody will get mad at you. Cloud failure is viewed as an act of god.
I don't trust LLMs to do the kind of precise deterministic work you need in a spreadsheet.
It's one thing to fudge the language in a report summary, which can be subjective; numbers, however, are not subjective. It's widely known LLMs are terrible at even basic maths.
Even Google's own AI summary admits it, which I was surprised at; marketing won't be happy.
Yes, it is true that LLMs are often bad at math because they don't "understand" it as a logical system but rather process it as text, relying on pattern recognition from their training data.
Seems like you're very confused about what this work typically entails. The job of these employees is not mental arithmetic. It's closer to:
- Log in to the internal system that handles customer policies
- Find all policies that were bound in the last 30 days
- Log in to the internal system that manages customer payments
- Verify that for all policies bound, there exists a corresponding payment that roughly matches the premium.
- Flag any divergences above X% for accounting/finance to follow up on.
Practically this involves munging a few CSVs, maybe typing in a few things, setting up some XLOOKUPs, IF formulas, conditional formatting, etc.
Will AI replace the entire job? No...but that's not the goal. Does it have to be perfect? Also no...the existing employees performing this work are also not perfect, and in fact sometimes their accuracy is quite poor.
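That kind of reconciliation step can be sketched in a few lines of pandas. The column names and the 5% threshold here are hypothetical illustrations, not anything from the comment above:

```python
# Reconciliation sketch: join policies to payments, then flag any case where
# the amount paid diverges from the premium by more than a threshold.
import pandas as pd

policies = pd.DataFrame({"policy_id": [1, 2, 3],
                         "premium": [1000.0, 500.0, 750.0]})
payments = pd.DataFrame({"policy_id": [1, 2, 3],
                         "paid": [1000.0, 430.0, 755.0]})

merged = policies.merge(payments, on="policy_id", how="left")
merged["divergence"] = (merged["paid"] - merged["premium"]).abs() / merged["premium"]

THRESHOLD = 0.05  # flag anything off by more than 5%
flags = merged[merged["divergence"] > THRESHOLD]
print(flags[["policy_id", "divergence"]])
```

The same shape of check, merge then compare then flag, covers a surprising share of this kind of reporting work; the hard part in practice is the header mapping and the exceptions, not the arithmetic.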
> “Does it have to be perfect?”
Actually, yes. This kind of management reporting is either (1) going to end up in the books and records of the company - big trouble if things have to be restated in the future or (2) support important decisions by leadership — who will be very much less than happy if analysis turns out to have been wrong.
A lot of what ties up the time of business analysts is ticking and tying everything to ensure that mistakes are not made and that analytics and interpretations are consistent from one period to the next. The math and queries are simple - the details and correctness are hard.
Speak for yourself and your own use cases. There is a huge diversity of workflows to which automation can be applied in any medium to large business, and they all have differing needs. Many Excel workflows I'm personally familiar with already incorporate a "human review" step. Telling a business leader that they can now jump straight to that step, even if it requires 2x human review, with AI doing all of the most tedious and low-stakes prework, is a clear win.
There is another aspect to this kind of activity.
Sometimes there can be an advantage in leading or lagging some aspects of internal accounting data for a time period. Basically sitting on credits or debits to some accounts for a period of weeks. The tacit knowledge to know when to sit on a transaction and when to action it is generally not written down in formal terms.
I'm not sure how these shenanigans will translate into an ai driven system.
That’s the kind of thing that can get a company into a lot of trouble with its auditors and shareholders. Not that I am offering accounting advice, of course. And yeah, one cannot “blame” an AI system or try to AI-wash any dodgy practices.
Checking someone else's spreadsheet is a fucking nightmare. If your company has extremely good standards it's less miserable, because at least the formatting etc. will be consistent...
The one thing LLMs should consistently do is ensure that formatting is correct, which will help greatly in the checking process. But no, I generally don't trust them to do sensible things with basic formulation. Not a week ago, GPT-5 got confused about whether a plus or a minus was necessary in a basic question of "I'm 323 days old, when is my birthday?"
I think you have a misunderstanding of the types of things that LLMs are good at. Yes you're 100% right that they can't do math. Yet they're quite proficient at basic coding. Most Excel work is similar to basic coding so I think this is an area where they might actually be pretty well suited.
My concern would be more with how to check the work (ie, make sure that the formulas are correct and no columns are missed) because Excel hides all that. Unlike code, there's no easy way to generate the diff of a spreadsheet or rely on Git history. But that's different from the concerns that you have.
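One workaround, not specific to any existing tool: dump each sheet's formulas to a stable text form and diff that, the same way you would diff source code. A sketch with hypothetical cell data:

```python
import difflib

def dump_cells(cells):
    """Render a {coordinate: formula} mapping as sorted, diffable lines."""
    return [f"{coord}: {formula}" for coord, formula in sorted(cells.items())]

# Formulas extracted from two versions of a sheet (toy data; a real
# extractor would pull these from the workbook file itself).
before = {"A1": "=SUM(B1:B10)", "A2": "=A1*0.05", "B1": "100"}
after  = {"A1": "=SUM(B1:B9)",  "A2": "=A1*0.05", "B1": "100"}

diff = difflib.unified_diff(dump_cells(before), dump_cells(after),
                            "before.xlsx", "after.xlsx", lineterm="")
print("\n".join(diff))
```

Once the formulas live in text form, the hidden change (`B1:B10` silently becoming `B1:B9`) shows up as an ordinary one-line diff, and the dump itself can be committed to git for history.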
> Yes you're 100% right that they can't do math.
The model ought to be calling out to some sort of tool to do the math—effectively writing code, which it can do. I'm surprised the major LLM frontends aren't always doing this by now.
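A minimal sketch of such a tool: the model emits an arithmetic expression as text, and a deterministic evaluator (here, a whitelisted walk over Python's `ast`) computes the answer instead of the model predicting it. Everything here is illustrative, not any vendor's actual tool API:

```python
import ast
import operator

# Whitelisted operators: anything outside this set raises,
# rather than silently producing a plausible-looking number.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv,
       ast.Pow: operator.pow, ast.USub: operator.neg}

def calc(expr):
    """Safely evaluate an arithmetic expression the model hands over."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.operand))
        raise ValueError(f"unsupported expression: {ast.dump(node)}")
    return ev(ast.parse(expr, mode="eval"))

# The model writes the expression; the tool does the math.
print(calc("365 - 323"))  # → 42
```

The division of labor is the whole trick: the LLM translates the question into an expression (which is a language task), and the evaluator is exact by construction.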
So do it in basic code, where referencing G53 instead of G$53 doesn't crash a mass transit network because somebody's spreadsheet forgot to order enough fuel this month.
I've built spreadsheet diff tools on Google Sheets multiple times. As the need grows, I think we will see diffs, commits, and review tools reach customers.
hey Collin! I am working on an AI agent on Google Sheets, and I am curious whether any of your designs are out in public. We are trying to rethink what diffs should look like and want to make something nicer than what we currently have.
proficient != near-flawless.
> Most Excel work is similar to basic coding so I think this is an area where they might actually be pretty well suited.
This is a hot take. One I'm not sure many would agree with.
Maybe LLMs will enable a new type of work in spreadsheets. Just like in coding we have PR reviews, with an LLM it should be possible to do a spreadsheet review. Ask the LLM to try to understand the intent and point out places where the spreadsheet deviates from the intent. Also ask the LLM to narrate the spreadsheet so it can be understood.
That first condition "try to understand the intent" is where it could go wrong. Maybe it thinks the spreadsheet aligns with the intent, but it misunderstood the intent.
LLMs are a lossy validation, and while they work sometimes, when they fail they usually do so 'silently'.
Maybe we need some kind of method, framework to develop intent. Most of things that go wrong in knowledge working are down to lack of common understanding of intent.
> The one thing LLMs should consistently do is ensure that formatting is correct.
In JavaScript (and I assume most other programming languages) this is the job of static analysis tools (like eslint, prettier, typescript, etc.). I’m not aware of any LLM-based tools that perform static analysis with results as good as the traditional tools. Is static analysis not a thing in the spreadsheet world? Are the tools which do static analysis on spreadsheets subpar, or do they have some disadvantage not seen in other programming languages? And if so, are LLMs any better?
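For what it's worth, spreadsheet static analysis does exist in a basic form: Excel itself flags "inconsistent formulas" in a region. The core rule is simple enough to sketch; here's a hypothetical lint that normalizes formulas to relative (R1C1-style) shape and flags the cell that breaks a column's pattern:

```python
import re

def lint_column(cells):
    """Flag cells whose formula, in relative terms, differs from the
    column's majority -- roughly what Excel's own 'inconsistent
    formula' warning does.

    cells: {row_number: formula_string} for a single column.
    """
    ref = re.compile(r"(\$?[A-Z]{1,3})\$?(\d+)")

    def shape(formula, row):
        # Rewrite each row reference as an offset from the formula's
        # own row, so '=B2*C2' at row 2 and '=B3*C3' at row 3 match.
        return ref.sub(lambda m: f"{m.group(1)}[{int(m.group(2)) - row}]",
                       formula)

    shapes = {row: shape(f, row) for row, f in cells.items()}
    counts = {}
    for s in shapes.values():
        counts[s] = counts.get(s, 0) + 1
    majority = max(counts, key=counts.get)
    return sorted(row for row, s in shapes.items() if s != majority)

# Row 4 was fat-fingered: it reads C3 instead of C4.
column_d = {2: "=B2*C2", 3: "=B3*C3", 4: "=B4*C3", 5: "=B5*C5"}
print(lint_column(column_d))  # → [4]
```

This is deliberately simplified (no absolute-reference handling, no cross-column analysis), but it shows the rules are deterministic; an LLM could run a linter like this and explain the findings, rather than replace it.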
Just use a normal static analysis tool and shove the result to an LLM. I believe Anthropic properly figured that agents are the key, in addition to models, contrary to OpenAI that is run by a psycho that only believes in training the bigger model.
[dead]
Sysadmin of a small company. I get asked pretty often to help with a pivot table, vlookup, or just general excel functions (and smartsheet, these users LOVE smartsheet)
Indeed, in a small enough org, the sysadmin/technologist becomes support of last resort for all the things.
> these users LOVE smartsheet
I hate smartsheet…
Excel or R. (Or more often, regex followed by pen and paper followed by more regex.)
Last time, I gave claude an invoice and asked it to change one item on it, it did so nicely and gave me the new invoice. Good thing I noticed it had also changed the bank account number..
The more complicated the spreadsheet and the more dependencies it has, the greater the room for error. These are probabilistic machines. You can use them, I use them all the time for different things, but you need to treat them like employees you can't even trust to copy a bank account number correctly.
We’ve tried to gently use them to automate some of our report generation and PDF->Invoice workflows and it’s a nightmare of silent changes and absence of logic.. basic things like specifically telling it “debits need to match credits” and “balance sheets need to balance” that are ignored.
Yeah, asking an LLM to edit one specific thing in a large or complex document/codebase is like those repeated "give me the exact same image" gifs. It's fundamentally a statistical model, so the only thing we can be _certain_ of is that _it's not_ the same. It might get the desired change 100% correct, but it's only gonna get the entire document 99.5% right.
The use cases for spreadsheets are much more diverse than that. In my experience, spreadsheets are just as often used for calculation. Many of them require high accuracy, rely on determinism, and necessitate an understanding of maths ranging from basic arithmetic to statistics and engineering formulas. Financial models, for example, must be built up from ground truth and always use the right formulas with the right inputs to generate meaningful outputs.
I have personally worked with spreadsheet based financial models that use 100k+ rows x dozens of columns and involve 1000s of formulas that transform those data into the desired outputs. There was very little tolerance for mistakes.
That said, humans working in these use cases make mistakes >0% of the time. The question I often have about the incorporation of AI into human workflows is: will we eventually come to accept a certain level of error from them the way we do for humans?
Do you trust humans to be precise and deterministic, or even to be especially good at math?
This is talking about applying LLMs to formula creation and references, which they are actually pretty good at. Definitely not about replacing the spreadsheet's calculation engine.
Most real-world spreadsheets I've worked with were fragile and sloppy, not precise and deterministic. Programmers always get shocked when they realize how many important things are built on extremely messy spreadsheets, and that people simply accept it. They rather just spend human hours correcting discrepancies than trying to build something maintainable.
LLMs are just a tool, though. Humans still have to verify their output, like with every other tool out there.
Eh, yes. In theory. In practice, and this is what I have experienced personally, bosses seem to think that you now have interns so you should be able to do 5x the output.. guess what that means. No verification or rubber stamp.
I don't trust humans to do the kind of precise deterministic work you need in a spreadsheet!
Right, we shouldn’t use humans or LLMs. We should use regular deterministic computer programs.
For cases where that is not available, we should use a human and never an LLM.
I like to use Claude Code to write deterministic computer programs for me, which then perform the actual work. It saves a lot of time.
I had a big backlog of "nice to have scripts" I wanted to write for years, but couldn't find the time and energy for. A couple of months after I started using Claude Code, most of them exist.
That’s great and the only legitimate use case here. I suspect Microsoft will not try to limit customers to just writing scripts and will instead allow and perhaps even encourage them to let the AI go ham on a bunch of raw data with no intermediary code that could be reviewed.
Just a suspicion.
"regular deterministic computer programs" - otherwise known as the SUM function in Microsoft Excel
[dead]
> I don't trust LLMs to do the kind of precise deterministic work you need in a spreadsheet.
I was thinking along the same lines, but I could not articulate as well as you did.
Spreadsheet work is deterministic; LLM output is probabilistic. The two should be distinguished.
Still, it's a productivity boost, which is always good.
I don’t see the issue so much as the deterministic precision of an LLM, but the lack of observability of spreadsheets. Just looking at two different spreadsheets, it’s impossible to see what changes were made. It’s not like programming where you can run a `git diff` to see what changes an LLM agent made to a source code file. Or even a word processing document where the text changes are clear.
Spreadsheets work because the user sees the results of complex interconnected values and calculations. For the user, that complexity is hidden away and left in the background. The user just sees the results.
This would be a nightmare for most users to validate what changes an LLM made to a spreadsheet. There could be fundamental changes to a formula that could easily be hidden.
For me, that's the concern with spreadsheets and LLMs, which is just as much a concern with spreadsheets themselves. Try collaborating with someone on a spreadsheet for modeling and you’ll know how frustrating it can be to try to figure out what changes were made.
"I don't trust LLMs to do the kind of precise deterministic work" => I think the LLM is not doing the precise arithmetic. It is an agent with lots of knowledge (skills) and tools. Precise deterministic work is done by tools (deterministic code). Skills bring domain knowledge and how to sequence a task. The agent executes it. The LLM predicts the next token.
Sure, but this isn't requiring that the LLM do any math. The LLM is writing formulas and code to do the math. They are very good at that. And like any automated system you need to review the work.
Exactly, and if it can be done in a way that helps users better understand their own spreadsheets (which are often extremely complex codebases in a single file!) then this could be a huge use case for Claude.
> I don't trust LLMs to do the kind of precise deterministic work you need in a spreadsheet.
Rightly so! But LLMs can still make you faster. Just don't expect too much from it.
They're not great at arithmetic but at abstract mathematics and numerical coding they're pretty good actually.
If LLMs can replace mathematica for me when I'm doing affine yield curve calculations they can do a DCF for some banker idiots
You might trust them when the precision is extremely high and others agree with the result. High precision is possible because they can achieve it through multiple cross-validations.
ChatGPT is actively being used as a calculator.
>I don't trust LLMs to do the kind of precise deterministic work
not just in a spreadsheet, any kind of deterministic work at all.
find me a reliable way around this. i don't think there is one. mcp/functions are a band aid and not consistent enough when precision is important.
after almost three years of using LLMs, i have not found a single case where i didn't have to review its output, which takes as long or longer than doing it by hand.
ML/AI is not my domain, so my knowledge is not deep nor technical. this is just my experience. do we need a new architecture to solve these problems?
ML/AI is not my domain but you don’t have to get all that technical to understand that LLMs run on probability. We need a new architecture to solve these problems.
I couldn’t agree more. I get all my perfectly deterministic work output from human beings!
If only we had created some device that could perform deterministic calculations and then wrote software that made it easy for humans to use such calculations.
ok but humans are idiots, if only we could make some sort of Alternate Idiot, a non-human but every bit as generally stupid as humans are! This A.I would be able to do every stupid thing humans did with the device that performed deterministic calculations only many times faster!
Yes and when the AI did that all the stupid humans could accept its output without question. This would save the humans a lot of work and thought and personal responsibility for any mistakes! See also Israel’s Lavender for an exciting example of this in action.
[dead]
It is bad in a very specific sense, but I did not see any other comments express the bad parts instead of focusing merely on the accuracy part ( which is an issue, but not the issue ):
- this opens up a ridiculous flood of data that would otherwise be semi-private to the one company providing this service
- this works well on small data sets, but will choke on ones it needs to divvy up into chunks, inviting interesting (and yet unknown) errors
There is a real benefit to being able to 'talk to data', but anyone who has seen corporate culture up close and personal knows exactly where it will end.
edit: and I'm saying all this as a person who actually likes LLMs.
The vast majority of people in business and science are using spreadsheets for complex algorithmic things they weren't really designed for, and we find a metric fuckton of errors in the sheets when you actually bother auditing them, mistakes which are not at all obvious without troubleshooting by... manually checking each and every cell and cell relation, peering through parentheses, following references. It's a nightmare to troubleshoot.
LLMs specialize in making up plausible things with a minimum of human effort, but their downside is that they're very good at making up plausible things which are covertly erroneous. It's a nightmare to troubleshoot.
There is already an abject inability to provision the labor to verify Excel reasoning when it's composed by humans.
I'm dead certain that Claude will be able to produce plausibly correct spreadsheets. How important is accuracy to you? How life-critical is the end result? What are your odds, with the current auditing workflow?
Okay! Now! Half of the users just got laid off because management thinks Claude is Good Enough. How about now?
LLMs are getting quite good at reviewing the results and implementations, though
Anthropic now has all your company's data, and all you saved was the cost of one human minus however much they charge for this. The good news is it can't have your data again! So starting from the 163rd-165th person you fire, you start to see a good return and all you've sacrificed is exactitude, precision, judgement, customer service and a little bit of public perception!
My concern is that my insurance company will reject a claim, or worse, because of something an LLM did to a spreadsheet.
Now, granted, that can also happen because Alex fat-fingered something in a cell, but that's something that's much easier to track down and reverse.
They're already doing that with AI, rejecting claims at higher rates than before.
Privatized insurance will always find a way to pay out less if it can get away with it. It is just the nature of having the trifecta of profit motive, socialized risk, and light regulation.
If you think that insurance companies have "light regulation", I shudder to think of what "heavy regulation" would look like. (Source: I'm the CTO at an insurance company.)
"Light" was not meant to imply the quantity of paperwork you have to do, but rather whether you are allowed to do the things you want to do as a company.
More compliance or reporting requirements usually favor the larger existing players who can afford them, and they are also used to make life difficult for the end user and reject more claims.
It is the kind of thing that keeps you and me busy; major investors don't care about it at all. The cost of compliance, or the lack of it, is no more than a rounding error on the balance sheet, and the fines or penalties are puny and laughable.
The enormous profits year after year for decades now, and the amount of consolidation allowed in the industry, show that the industry is able to do pretty much whatever it wants. That is what I meant by light regulation.
They have too much regulation, and too little auditing (at least in the managed healthcare business).
I agree, and I can see where it comes from (at least at the state level). The cycle is: bad trend happens that has deep root causes (let's say PE buying rural hospitals because of reduced Medicaid/Medicare reimbursements); legislators (rightfully) say "this shouldn't happen", but don't have the ability to address the deep root causes so they simply regulate healthcare M&As – now you have a bandaid on a problem that's going to pop up elsewhere.
I mean even in the simple stuff, like denying payment for healthcare that should have been covered. CMS will come by and audit a handful of cases, out of millions, every few years.
So obviously the company that prioritizes accuracy of coverage decisions by spending money on extra labor to audit itself is wasting money. Which means insureds have to waste more time getting the payment for healthcare they need.
> They already doing that with AI, rejecting claims at higher numbers than before
Source?
Haven't risk based models been a thing for the last 15-20 years ?
> It is just nature of having the trifecta of profit motive , socialized risk and light regulation.
It's the nature of everything. They agree to pay you for something. It's nothing specific to "profit motive" in the sense you mean it.
I should have been clearer: profit maximization above all else, as long as it is mostly legal. Neither profit nor profit maximization at all costs is the nature of everything.
There are many other entity types, from unions[1], cooperatives, public-sector companies, quasi-governmental entities, and PBCs to non-profits, that all offer insurance and can occasionally do it well.
We even have some in the US and don’t even think of them as communism, like the FDIC or things like Social Security and unemployment insurance.
At some level, aren't government and taxation themselves nothing but insurance? We agree to pay taxes to mitigate a variety of risks, from foreign invasion down to smaller things like getting robbed on the street.
[1] Historically, worker collectives or unions self-organized to socialize the risks of major work-ending injuries or death.
Armies from ancient to modern times have operated on this kind of insurance. The two ingredients that made them not mercenaries were a form of long-term insurance benefit (education, pension, land, etc.) for them or their family members in the event of death, and sovereign immunity for their actions.
Couldn't they accomplish the same thing by rejecting a certain percentage of claims totally at random?
That would be illegal though, the goal is do this legally after all.
We also have to remember that all claims aren't equal, i.e., some claims end up being way costlier than others. You can achieve similar % margin outcomes by putting up a ton of friction: preconditions, multiple appeals processes, prior authorization for prior authorization, reviews by administrative doctors who have no expertise in the field being reviewed and don't have to disclose their identity, and so on.
While the U.S. system is the most extreme, or most evolved, it is not unique; it is what you get when you privatize insurance. Any country with private insurance has some lighter version of this and is on the same journey.
Not that public health systems or insurance à la the NHS in the UK or Germany's work well either; they are underfunded and mismanaged, with waits of months to see a specialist, and so on.
We have to choose our poison. Unless you are rich, of course; then the U.S. system is by far the best, and people travel to the U.S. to get the kind of care that is not possible anywhere else.
Why does saying "AI did it" make it legal, if the outcome is the same?
>>They already doing that with AI, rejecting claims at higher numbers than before .
That's a feature, not a bug.
This is a great application of this quote. Insurance providers have 0 incentive to make their AI "good" at processing claims, in fact it's easy to see how "bad" AI can lead to a justification to deny more claims.
Wait until a company has to restate earnings because of a bug in a Claudified Excel spreadsheet.
[dead]
Probably because many people here are software developers, and wrapping spreadsheets in deterministic logic and a consistent UI covers... most software use cases.
Yeah, this could be a pretty big deal. Not everyone is an excel expert, but nearly everyone finds themselves having to work with data in excel at some time or other.
It's like the negativity whenever a post talks about hiring or firing. A lot of people are afraid that they are going to lose their jobs to AI.
What does scaffolding of spreadsheets mean? I see the term frequently in AI-related articles, but I'm not familiar with the method, and I'm hesitant to ask an LLM.
Scaffolding typically just refers to a larger state machine style control flow governing an agent's behavior and the suite of external tools it has access to.
I have to admit that my first thought was “April’s fool”. But you are right. It makes a lot of sense (if they can get it to work well). Not only is Excel the world’s biggest “programming language”. It’s probably also one of the most unintuitive ways to program.
If you exclude macros with IO it’s actually the most popular purely functional programming language (no quotes) on the planet by far.
Why unintuitive?
> but these jobs are going to be the first on the chopping block as these integrations mature.
Perhaps this is part of the negativity? This is a bad thing for the middle class.
in the short run. In the long run, productivity gains benefit* all of us (in a functional market economy).
*material benefit. In terms of spirit and purpose, the older I get the more I think maybe the Amish are on to something. Work gives our lives purpose, and the closer the work is to our core needs, the better it feels. Labor saving so that most of us are just entertaining each other on social networks may lead to a worse society (but hey, our material needs are met!)
agree with you, but it cannot be stopped. development of technology always makes wealth distribution more centralized
this will push the development of open source models.
people think of privacy first when it comes to their data; local deployment of open-source models is the first choice for them
What's with claiming negativity when most of the comments here are positive?
I think excel is a dead end. LLM agents will probably greatly prefer SQL, sqlite, and Python instead of bulky made-for-regular-folks excel.
Versatility and efficiency explode while human usability tanks, but who cares at that point?
Databases might be the future, but viable solutions in Excel are evidence that it works.
I used to live in excel.
The issue isn’t in creating a new monstrosity in excel.
The issue is the poor SoB who has to spelunk through the damn thing to figure out what it does.
Excel is the sweet spot of just enough to be useful, capable enough to be extensible, yet gated enough to ensure everyone doesn’t auto run foreign macros (or whatever horror is more appropriate).
In the simplest terms - it’s not excel, it’s the business logic. If an excel file works, it’s because theres someone who “gets” it in the firm.
I used to live in Excel too. I've trudged through plenty of awful worksheets. The output I've seen from AI is actually more neatly organized than most of what I used to receive in outlook. Most of that wasn't hyper-sophisticated cap table analyses. It was analysis from a Jr Analyst or line employee trying to combine a few different data sources to get some signal on how XYZ function of the business was performing. AI automation is perfectly suitable for this.
How?
Neat formatting didn't save any model from having the wrong formula pasted in.
Being neat was never a substitute for being well rested, or sufficiently caffeinated.
Have you seen how AI functions in the hands of someone who isn't a domain expert? I've used it for things I had no idea about, like Astro+ web dev. User ignorance was magnified spectacularly.
This is going to have Jr Analysts dumping well formatted junk in email boxes within a month.
It's actually really cool. I will say that "spreadsheets" remain a bandaid over dysfunctional UIs, processes, etc., and engineering spends a lot of time enabling these bandaids versus someone just saying "I need to see number X" rather than "I need BI analytics data in a realtime spreadsheet!".
> What is with the negativity in these comments?
Some people - normal people - understand the difference between the holistic experience of a mathematically informed opinion and an actual model.
It's just that normal people always wanted the holistic experience of an answer. Hardly anyone wants a right answer. They have an answer in their heads, and they want a defensible journey to that answer. That is the purpose of Excel in 95% of places it is used.
Lately people have been calling this "sycophancy." This was always the problem. Sycophancy is the product.
Claude Excel is leaning deeply into this garbage.
It seems like to me the answer is moreso "People on HN are so far removed from the real use cases for this kind of automation they simply have no idea what they're talking about".
This is so correct it hurts
> How teams use Claude for Excel
Who are these teams that can get value from Anthropic? One MCP and my context window is used up and Claude tells me to start a new chat.
Seems like everyone is speculating about features instead of just reading TFA, which does in fact list them:
- Get answers about any cell in seconds: Navigate complex models instantly. Ask Claude about specific formulas, entire worksheets, or calculation flows across tabs. Every explanation includes cell-level citations so you can verify the logic.
- Test scenarios without breaking formulas: Update assumptions across your entire model while preserving all dependencies. Test different scenarios quickly—Claude highlights every change with explanations for full transparency.
- Debug and fix errors: Trace #REF!, #VALUE!, and circular reference errors to their source in seconds. Claude explains what went wrong and how to fix it without disrupting the rest of your model.
- Build models or fill existing templates: Create draft financial models from scratch based on your requirements. Or populate existing templates with fresh data while maintaining all formulas and structure.
If this can reliably deal with the REF, VALUE, and NA problems, it'll be worth it for that alone.
Oh and deal with dates before 1900.
Excel is a gift from God if you stay in its lane. If you ever so slightly deviate, not even the Devil can help you.
But maybe, juuuuust maybe, AI can?
> "not even the Devil can help you. But maybe, juuuuust maybe, AI can?"
Bold assumption that the devil and AI aren't aligned ;)
Also, people complaining about AI inaccuracy are just technical people who like precision. The vast majority of the world is people who don't give a damn about accuracy or even correctness. They just want to appear not completely useless to people who could potentially affect their salary.
They can try, but doubt anyone serious will adopt it.
Tried integrating ChatGPT into my finance job to see how far I could get. Mega yikes... millions of dollars of hallucinated mistakes.
Worse, you don't have the same tight feedback loop you've got in programming that'll tell you when something is wrong: compile errors, unit tests, etc. You basically need to walk through everything it did to figure out what's real and what's hallucination. It basically fails silently. If they roll that out at scale in the financial system... interesting times ahead.
Still, presumably there is something around spreadsheets it'll be able to do: the spreadsheet equivalent of boilerplate code, whatever that may be.
I'm bad with spreadsheets, so maybe this is trivial, but having an LLM tell me how to connect my sheet to whatever data I'm using at the moment, coming up with a link or a SQL query or both, has allowed me to quickly pull in data where I'd normally eyeball it and move on, or worst case do it partially manually if it's really important.
It's like one off scripts in a sense? I'm not doing complex formulas I just need to know how I can pull data into a sheet and then I'll bucketize or graph it myself.
Again probably because I'm not the most adept user but it has definitely been a positive use case for me.
I suspect my use case is pretty boilerplatey :)
From the signup form mentioning Private Equity / Venture Capital, Hedge Fund, Investment Banking... this seems squarely aimed at financial modeling. Which is really, really cool.
I've worked alongside sell-side investment bankers in a prior startup, and so much of the work is in taking a messy set of statements from a company, understanding the underlying assumptions, and building, and rebuilding, and rebuilding, 3-statement models that not only adhere to standard conventions (perhaps best introed by https://www.wallstreetprep.com/knowledge/build-integrated-3-... ) but also are highly customized for different assumptions that can range from seasonality to sensitivity to creative deal structures.
It is quite common for people to pull many, many all-nighters to try to tweak these models in response to a senior banker or a client having an idea! And one might argue there are way too many similar-looking numbers to keep a human banker from "hallucinating," much less an LLM.
But fundamentally, a 3-statement model and all its build-sheets are a dependency graph with loosely connected human-readable labels, and that means you can write tools that let an LLM crawl that dependency graph in a reliable and semantically meaningful way. And that lets you build really cool things, really fast.
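The dependency-graph point is literal: for simple single-sheet formulas you can extract cell references with a regex and walk precedents, which is exactly the kind of deterministic tool an LLM can call instead of eyeballing the grid. A toy sketch (real models also need ranges, cross-sheet references, and named ranges):

```python
import re

def precedents(cells, target):
    """Walk the formula dependency graph upward from `target`.

    cells: {coordinate: formula_or_value}.  Returns every cell that
    `target` transitively depends on.
    """
    ref = re.compile(r"\b([A-Z]{1,3}[0-9]+)\b")
    seen, stack = set(), [target]
    while stack:
        cell = stack.pop()
        formula = str(cells.get(cell, ""))
        # Only formulas (leading '=') can reference other cells.
        for dep in ref.findall(formula) if formula.startswith("=") else []:
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen

# A four-line toy "model": revenue -> COGS -> gross profit -> margin.
model = {
    "B1": "100",
    "B2": "=B1*0.6",
    "B3": "=B1-B2",
    "B4": "=B3/B1",
}
print(sorted(precedents(model, "B4")))  # → ['B1', 'B2', 'B3']
```

Because the traversal is exact, the LLM's explanations and edits can be grounded in the actual graph rather than in what the grid happens to look like.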
I'm of the opinion that giving small companies the ability to present their finances to investors, the same way Fortune 500 companies hire armies of bankers to do, is vital to a healthy economy, and to giving Main Street the best possible chance to succeed and grow. This is a massive step in the right direction.
Presenting your finances to investors via a tool designed for generation of plausible looking data is fraud.
Presenting false data to investors is fraud, doesn't matter how it was generated. In fact, humans are quite good at "generating plausible looking data", doesn't mean human generated spreadsheets are fraud.
On the other hand, presenting truthful data to investors is distinctly not fraud, and this again does not depend on the generation method.
If humans "generate plausible looking data" despite any processes to ensure data quality they've likely engaged in willful fraud.
An LLM doing so needn't even be willful from the author's part. We're going to see issues with forecasts/slide decks full of inaccuracies that are hard to review.
You might have accidentally described what accounting is.
Anthropic is in a weird place for me right now. They're growing fast, creating little projects that I'd love to try, but their customer service was so bad for me as a Max subscriber that I set an ethical boundary for myself: avoid their services until it appears that they care about their customers at all.
I keep searching for a sign, but everyone I talk to has horror stories. It sucks as a technologist that just wants to play with the thing; oh well.
> I keep searching for a sign, but everyone I talk to has horror stories. It sucks as a technologist that just wants to play with the thing; oh well.
The reason that Claude Code doesn't have an IDE is because ~"we think the IDE will be obsolete in a year, so it seemed like a waste of time to create one."
Noam Shazeer said on a Dwarkesh podcast that he stopped cleaning his garage, because a robot will be able to do it very soon.
If you are operating under the beliefs these folks have, then things like IDEs, cleaning up, and customer service are stupid annoyances that will become obsolete very soon.
To be clear, I have huge respect for everyone mentioned above, especially Noam.
"Noam Shazeer said on a Dwarkesh podcast that he stopped cleaning his garage, because a robot will be able to do it very soon".
How much is the robot going to cost in a year? 100k? 200k? Not mass market pricing for sure.
Meanwhile, today he could pay someone $1000 to clean his garage.
I would do it for free, just to answer the question of what does a genius of his caliber have in his garage? Probably the same stuff most people do, but it would still be interesting.
I don’t think the point was about having a clean space, it was in response to a question along the lines of: when do you think we will achieve AGI?
Bad customer service comes from low priority. I think Anthropic prioritizes new growth over feedback from a small number of customers; that's why they publish new products and features so frequently. There are so many potential opportunities for them to focus on.
What happened? I'm a Max subscriber and I'd like to know what to look out for!
Best way to think of it is this: Right now you are not the customer. Investors are.
The money people pay in monthly fees to Anthropic, even for the top Max sub, likely doesn't come close to covering the energy and infrastructure costs of running the system.
You can prove this to yourself by just trying to cost out what it takes to build the hardware capable of running a model of this size at this speed and running it locally. It's tens of thousands of dollars just to build the hardware, not even considering the energy bills.
So I imagine the goal right now is to pull in a mass audience and prove the model, to get people hooked, to get management and talent at software firms pushing these tools.
And I guess there's some in management and the investment community that thinks this will come with huge labour cost reductions but I think they may be dreaming.
... And then.. I guess... jack the price up? Or wait for Moore's Law?
So it's not a surprise to me they're not jumping to try and service individual subscribers who are paying probably a fraction of what it costs them to the run the service.
I dunno, I got sick of paying the price for Max and I now use the Claude Code tool but redirect it to DeepSeek's API and use their (inferior but still tolerable) model via API. It's probably 1/4 the cost for about 3/4 the product. It's actually amazing how much of the intelligence is built into the tool itself instead of just the model. It's often incredibly hard to tell the difference between DeepSeek output and what I got from Sonnet 4 or Sonnet 4.5.
I've been playing around with local LLMs in Ollama, just for fun. I have an RTX 4080 Super, a Ryzen 5950X with 32 threads, and 64 GB of system memory. A very good computer, but decidedly consumer-level hardware.
I have primarily been using the 120b gpt-oss model. It's definitely worse than Claude and GPT-5, but not by, like, an order of magnitude or anything. It's also clearly better than ChatGPT was when it first came out. Text generates a bit slowly, but it's perfectly usable.
So it doesn't seem so unreasonable to me that costs could come down in a few years?
You are bang on.
Every AI company right now (except Google, Meta, and Microsoft) has its valuation based on the expectation of a future monopoly on AGI. None of their business models, today or on any foreseeable horizon, is even profitable, let alone world-dominating. The continued funding rounds all appear to be based on the expectation of becoming the sole player.
The continuing advancement of open source / open weights models keeps me from being a believer.
I’ve placed my bet and feel secure where it is.
So cool, I hope they pull it off. So many people use Excel. Although, I always thought the power of AI in Excel would come from the ability to use AI _as_ a formula. For example, =PROMPT("Classify user feedback as positive, neutral or negative", A1). This would enable normal people (non-programmers) to fire off thousands of prompts at once and automate workflows like programmers do (disclaimer: I am the author of Cellm that does exactly this). Combined with Excel's built-in functions for deterministic work, Claude could really kill the whole copy-pasting data in and out of chat windows for bulk-processing data.
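A sketch of what that bulk-prompting pattern amounts to (the `classify` function is a stand-in for an actual LLM call like the =PROMPT() idea above; here it's a trivial keyword rule so the example is self-contained):

```python
from concurrent.futures import ThreadPoolExecutor

def classify(text):
    """Stand-in for an LLM call. A real =PROMPT() would hit a model API;
    this keyword rule just makes the fan-out pattern runnable."""
    t = text.lower()
    if any(w in t for w in ("love", "great", "works")):
        return "positive"
    if any(w in t for w in ("hate", "broken", "crash")):
        return "negative"
    return "neutral"

feedback_column = [
    "I love the new export button",
    "App crashes on startup",
    "It is fine I guess",
]

# Fire off all "prompts" at once, like filling a formula down a column
with ThreadPoolExecutor(max_workers=8) as pool:
    labels = list(pool.map(classify, feedback_column))

print(labels)  # ['positive', 'negative', 'neutral']
```

That's the whole trick: the spreadsheet grid gives non-programmers a natural way to map a prompt over thousands of rows in parallel.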
I'm a co-founder of Calcapp, an app builder for formula-driven apps using Excel-like formulas. I spent a couple of days using Claude Code to build 20 new templates for us, and I was blown away. It was able to one-shot most apps, generating competent, intricate apps from having looked at a sample JSON file I put together. I briefly told it about extensions we had made to Excel functions (including lambdas for FILTER, named sort type enums for XMATCH, etc), and it picked those up immediately.
At one point, it generated a verbose formula and mentioned, off-handedly, that it would have been prettier had Calcapp supported LET. "It does!", I replied, "and as an extension, you can use := instead of , to separate names and values!") and it promptly rewrote it using our extended syntax, producing a sleek formula.
These templates were for various verticals, like real estate, financial planning and retail, and I would have been hard-pressed to produce them without Claude's domain knowledge. And I did it in a weekend! Well, "we" did it in a weekend.
So this development doesn't really surprise me. I'm sure that Claude will be right at home in Excel, and I have already thought about how great it would be if Claude Code found a permanent home in our app designer. I'm concerned about the cost, though, so I'm holding off for now. But it does seem unfair that I get to use Claude to write apps with Calcapp, while our customers don't get that privilege.
(I wrote more about integrating Claude Code here: https://news.ycombinator.com/item?id=45662229)
At first glance this seems to be a very bad idea. But re-reading this:
> Get answers about any cell in seconds: Navigate complex models instantly. Ask Claude about specific formulas, entire worksheets, or calculation flows across tabs. Every explanation includes cell-level citations so you can verify the logic.
this might just be an excellent tool for refactoring Excel sheets into something more robust and maintainable. And making a bunch of suits redundant.
This is going to be massive if it works as well as I suspect it might.
I think many software engineers overlook how many companies have huge (billion dollar) processes run through Excel.
It's much less about 'greenfield' new excel sheets and much more about fixing/improving existing ones. If it works as well as Claude Code works for code, then it will get pretty crazy adoption I suspect (unless Microsoft beats them to it).
> I think many software engineers overlook how many companies have huge (billion dollar) processes run through Excel.
So they can fire the two dudes that take care of it, lose 15 years of in house knowledge to save 200k a year and cry in a few months when their magic tool shits the bed ?
Massive win indeed
If the company is half baked, those "two dudes" will become indispensable beyond belief. They are the ones that understand how Excel works far deeper, and paired with Claude for Excel they become far far more valuable.
At my org it's more that these AI tools finally allow the employees to get through things at all. The deadlines are getting met for the first time, maybe ever. We can at last get to the projects that will make the company money instead of chasing ghosts from 2021. The burn down charts are warm now.
> This is going to be massive if it works as well as I suspect it might.
Until Microsoft does its anti-competitive thing and find a way to break this in the file format, because this is exactly what copilot in excel does.
That said, Copilot in Excel is pretty much hot garbage still so anything will be better than that.
What do you mean, what is copilot in excel doing exactly?
The thing really missing from multi-megabyte excel sheets of business critical carnage was a non-deterministic rewrite tool. It'll interact excitingly with the industry standard of no automated testing whatsoever.
I 100% believe generative AI can change a spreadsheet. Turn the xlsx into text, mutate that, turn it back into an xlsx, throw it away if it didn't parse at all. The result will look pretty similar to the original too, since spreadsheets are great at showing immediately local context and nothing else.
Also, we've done a pretty good job of training people that chatgpt works great, so there's good reason for them to expect claude for excel to work great too.
I'd really like the results of this to be considered negligence with non-survivable fines for the reckless stupidity, but more likely, it'll be seen as an act of god. Like all the other broken shit in the IT world.
I wonder if this will be more/less useful than what we have with AI in software development.
There's a lot less to understand than a whole codebase.
I don't do spreadsheets very often, but I can empathize with tracking down "Trace #REF!, #VALUE!, and circular reference errors to their source in seconds." I once hit something like that, and I found it a lot harder to trace than a typical compiler error.
I'm not excited about having LLMs generate spreadsheets or formulas. But, I think LLMs could be particularly useful in helping me find inconsistent formulas or errors that are challenging to identify. Especially in larger, complex spreadsheets touched by multiple people over the course of months.
For once in my life, I actually had a delightful interaction with an LLM last week. I was changing some text in an Excel sheet in a very programmatic way that could have easily been done with the regex functions in Excel. But I'm not really great with regex, and it was only 15 or so cells, so I was content to just do it manually. After three or four cells, Copilot figured out what I was doing and suggested the rest of the changes for me.
This is what I want AI to do, not generate wrong answers and hallucinate girlfriends.
One approach is to produce read-only data in BI tools: users are free to export anything they want and make their own spreadsheets, but those are for their own use only. Reference data is produced every day by a central, controlled process and cannot in any circumstance be modified by the end user.
I have implemented this a couple of times and not only does it work well, it tends to be fairly well accepted. People need spreadsheets to work on them, but generally they kind of hate sending those around via email. Having a reference source of data is welcomed.
IMO, a real solution here has to be hybrid, not full LLM, because these sheets can be massive and have very complicated structures. You want to be able to use the LLM to identify / map column headers, while using non-LLM tool calling to run Excel operations like SUMIFs or VLOOKUPs. One of the most important traits in these systems is consistency with slight variation in file layout, as so much Excel work involves consolidating / reconciling between reports made on a quarterly basis or produced by a variety of sources, with different reporting structures.
Disclosure: My company builds ingestion pipelines for large multi-tab Excel files, PDFs, and CSVs.
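A stripped-down sketch of that hybrid split (the header names and the mapping are made up; in a real pipeline the LLM would produce the mapping, and everything numeric stays deterministic):

```python
import csv, io

RAW = """Item Name,Txn Date,Amt (USD)
abc,01/01/2023,30
cde,02/01/2023,40
"""

def map_headers(headers):
    """The only step where the LLM would be involved: map messy vendor
    headers to a canonical schema. Stubbed with a lookup table here so
    the sketch runs without a model."""
    canon = {"item name": "item", "txn date": "date", "amt (usd)": "price"}
    return {h: canon[h.strip().lower()] for h in headers}

def total_price(raw_csv):
    rows = list(csv.reader(io.StringIO(raw_csv)))
    mapping = map_headers(rows[0])
    col = [mapping[h] for h in rows[0]].index("price")
    # Deterministic aggregation: no LLM ever touches the numbers
    return sum(float(r[col]) for r in rows[1:] if r)

print(total_price(RAW))  # 70.0
```

The point of the split: when next quarter's report renames "Amt (USD)" to "Amount", only the mapping step changes; the arithmetic stays exact and auditable.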
That's exactly what they're doing.
https://www.anthropic.com/news/advancing-claude-for-financia...
"This won't work because (something obvious that engineers at Anthropic clearly thought of already)"
Not really. Take for example:
item, date, price
abc, 01/01/2023, $30
cde, 02/01/2023, $40
... 100k rows ...
subtotal. $1000
def, 03/01,2023, $20
"Hey Claude, what's the total from this file? > grep for headers > "Ah, I see column 3 is the price value" > SUM(C3:C) -> $2020 > "Great! I found your total!"
If you can find me an example of tech that can solve this at scale on large, diverse Excel formats, then I'll concede, but I haven't found something actually trustworthy for important data sets
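For what it's worth, the subtotal trap itself is detectable deterministically. A toy version of the guard (small numbers standing in for the 100k-row example; the heuristic is my own illustration, not a claim about any shipping product):

```python
def total_excluding_subtotals(values):
    """Heuristic for the trap above: a row whose value equals the running
    sum of the rows before it is treated as an embedded subtotal and
    skipped, so summing the whole column doesn't double-count."""
    total, running = 0.0, 0.0
    for v in values:
        if running > 0 and abs(v - running) < 1e-9:
            continue  # looks like a subtotal row
        total += v
        running += v
    return total

prices = [30, 40, 70, 20]  # 70 is a subtotal row, like the $1000 above
print(total_excluding_subtotals(prices))  # 90.0
```

A naive SUM over the column would report 160 here. The hard part at scale is exactly what the parent says: subtotals don't always equal a clean running sum, which is why diverse real-world formats stay hard.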
So more or less like what AI has been doing for the last couple of years when it comes to writing code?
How is this different from the existing Claude skill, that uses a prompt and pandas to edit an Excel file?
https://github.com/anthropics/skills/blob/main/document-skil...
This isn't built for Excel users who use Github and Claude Skills, it's built for Excel users who would run away from Git commands.
The Claude skill I linked to is built into the Claude desktop client. You just attach an Excel file to your chat and ask away.
I linked to the skill prompt just to more clearly explain the approach that's currently available to all Claude users.
It requires zero familiarity with git or command line.
Fuck it, Rows is at least 3x better.
As I was reading through the post, and the comments here, and pondering my own many hours with these tools, I was suddenly reminded of one of my favorite studio C sketches: An Unfortunate Fortune
https://www.youtube.com/watch?v=SF-psoWdSpo
Curious, if others see the connection. :D
Tough day to be an AI Excel add-in startup
Ask Rosie is actually shutting down right now: https://www.askrosie.ai/
I would love to learn more about their challenges as I have been working on an Excel AI add-in for quite some time and have followed Ask Rosie from almost their start.
That they have now gone through the whole cycle worries me that I'm too slow as a solo founder building on the side in these fast-paced times.
its a great time for your ai excel add-in to start getting acquired by a claude competitor though
Not OpenAI, though, because they already gave $14M to an AI Excel add-in startup (Endex)
That seems to be true for any startup that offers a wrapper to existing AIs rather than an AI on their own. The lucky ones might be bought but many if not most of them will perish trying to compete with companies that actually create AI models and companies large enough to integrate their own wrappers.
Actually just wrote about this: https://aimode.substack.com/p/openai-is-below-above-and-arou...
Not sure it's as binary as that, but as startups we will probably just be left collecting the scraps.
Interesting their X post mentions "pre-built Agent Skills" but it's not on the webpage. I wonder if they will give you the ability to edit/add/delete Skills, that would be phenomenal.
Edit: found it on their other blog post https://www.anthropic.com/news/advancing-claude-for-financia...
You can add and customize skills in claude.ai and other surfaces
Dumb question, but is this Claude for Excel the.. app? The webapp? Does it work on Google sheets? etc
There are quite a few spreadsheet apps out there, just curious what their implementation is or how it's implemented to work with multiple apps.
I always find Excel (and the Office ecosystem) confusing heh.
Modern Excel add-ins work in desktop Windows, macOS, and the web. They're just a bit of XML that Excel looks at to call whatever web endpoint is defined in the manifest.
I guess Claude may be useful for finding errors in large Excel workbooks. It may also help beginners learn the more complex Excel functions (which are still pretty easy). But if you are proficient at building Excel models, I don't see much benefit. Excel already has a superb, very efficient UI for entering formulas, ranges, tables, data sources, etc. I'm sceptical that a different UI, especially a text-based one, can improve on this.
I understand the sentiment about a skilled user not needing this, but I think having a little buddy that I can use to offload some menial tasks would be helpful for me to iterate through my models more efficiently; even if the AI is not perfect. As a highly skilled excel user, I admit the software has terrible ergonomics. It would be a productivity boon for me if an AI can help me stay focused on model design vs model implementation.
For some reason, I find that these tools are TERRIBLE at helping someone learn. I suspect it's because turning one on turns the problem-solving part of one's brain off.
It's obviously not the same experience for everyone. (If you are one of those energized while working in a chat window, you might be in a minority, given what we see from the ongoing massacre of brains in education.)
Paraphrasing something I read here: "people don't use ChatGPT to learn more, they use it to study less".
Maybe some folk would be better off.
Hope it's better than what MS is currently shipping as AI. Every time I try to do something, the response is "sorry, I can't do this".
Copilot is getting better - I'm getting fewer of those than I used to - but it's still significantly more stupid than other agents, even when in theory it's using the same model.
On the one hand, most financial companies have a lot of processes in Excel that could be made better with something like Claude.
Banking secrecy laws + customer identifying data + AI tool = No bueno.
I use excel but not for financial modelling, I’ll use it
Gemini already has its hooks in Google Sheets, and to be honest, I've found it very helpful in constructing semi-complicated Excel formulas.
Being able to select a few rows and then use plain language to describe what I want done is a time saver, even though I could probably muddle through the formulas if I needed to.
Last time I tried using Gemini in Google Sheets it hallucinated a bunch of fake data, then gave me a summary that included all that fake data. I'd given it a bunch of transaction data, and asked it to group the records into different categories for budgeting. When asking it to give the largest values in each category, all the values that came back were fake. I'm not sure I'd really trust it to touch a spreadsheet after that.
You should:
- stop using the free plan
- not use Gemini Flash for these tasks
- learn how to do things over time, knowing that all AI models have improved significantly every few months
Or not use it.
I would recommend trying TabTabTab at https://tabtabtab.ai/
It is an entire agent loop. You can ask it to build a multi sheet analysis of your favorite stock and it will. We are seeing a lot of early adopters use it for financial modeling, research automation, and internal reporting tasks that used to take hours.
I forgot to add, you can try TabTabTab, without installing anything as well.
To see something much more powerful on Google Sheets than Gemini for free, you can add "try@tabtabtab.ai" to your sheet, and make a comment tagging "try@tabtabtab.ai" and see it in action.
If that is too much just go to ttt.new!
I have had the opposite experience. I've never had Gemini give me something useful in sheets, and I'm not asking for complicated things. Like "group this data by day" or "give me p50 and p90"
Gemini integrations into Google Workspace feel like they're using Gemini 1.5 Flash; it's so comically bad at understanding and generating.
I have just launched a product (easyanalytica.com) to create dashboards from spreadsheets, and Excel is on my to-do list of formats to support. However, I'm having second thoughts. From the description, though, it seems like this would be more helpful on the modeling side than the presentation side. I guess I'll have to wait until it's publicly available.
Why second thoughts?
Everyone will use Claude if they support it, so why would they use my product? I will have to find some other angle to differentiate.
This could be invaluable for reverse engineering complex workbooks with multiple data sources and hundreds or thousands of formulas.
If it has a concept of data sources and can digest them, sure. Anecdotally, most issues with Excel at my job are caused by data sources being renamed, moved or reformatted, by broken logins, or by insufficient access rights.
George Hotz said there's 5 tiers of AI systems, Tier 1 - Data centers, Tier 2 - fabs, Tier 3 - chip makers, Tier 4 - frontier labs, Tier 5 - Model wrappers. He said Tier 4 is going to eat all the value of Tier 5, and that Tier 5 is worthless. It's looking like that's going to be the case
That is a common refrain by people who have no domain expertise in anything outside of tech.
Spend a few years in an insurance company, a manufacturing plant, or a hospital, and then the assertion that the frontier labs will figure it out appears patently absurd. (After all, it takes humans years to understand just a part of these institutions, and they have good-functioning memory.)
This belief that tier 5 is useless is itself a tell of a vulnerability: the LLMs are advancing fastest in domain-expertise-free generalized technical knowledge; if you have no domain expertise outside of tech, you are most vulnerable to their march of capability, and it is those with domain expertise who will rely increasingly less on those who have nothing to offer but generalized technical knowledge.
yeah but if Anthropic/OpenAI dedicate resources to gaining domain expertise then any tier 5 is dead in the water. For example, they recently hired a bunch of finance professionals to make specialized models for financial modeling. Any startup in that space will be wiped out
I dont think the claim is exactly that tier 5 is useless more that tier 5 synergizes so well with tier 4 that all the popular tier 5 products will eventually be made by the tier 4 companies.
Andrew Ng argued in 2023 (https://www.youtube.com/watch?v=5p248yoa3oE ) that the underlying tiers depend on the app tier's success.
That OpenAI is now apparently striving to become the next big app-layer company could hint at George Hotz being right, but only if the bets work out. I'm glad that there is competition on the frontier-labs tier.
Tier 5 requires domain expertise until we reach AGI or something very different from the latest LLMs.
I don’t think the frontier labs have the bandwidth or domain knowledge (or dare I say skills) to do tier 5 tasks well. Even their chat UIs leave a lot to be desired and that should be their core competency.
George Hotz says a lot of things. I think he's directionally correct but you could apply this argument to tech as a whole. Even outside of AI, there are plenty of niches where domain-specific solutions matter quite a bit but are too small for the big players to focus on.
People were saying the same thing about AWS vs SaaS ("AWS wrappers") a decade ago and none of that came to pass. Same will be true here.
Claude is a model wrapper, no?
Anthropic is a frontier lab, and Claude is a frontier model
Interesting. I found a reference to this in a tweet [1], and it looks to be a podcast. While I'm not extremely knowledgable. I'd put it like this: Tier 1 - fabs, Tier 2 - chip makers, Tier 3 - data centers, Tier 4 - frontier labs, Tier 5 - Model wrappers
However I would think more of elite data centers rather than commodity data centers. That's because I see Tier 4 being deeply involved in their data centers and thinking of buying the chips to feed their data centers. I wouldn't be so inclined to throw in my opinion immediately if I found an article showing this ordering of the tiers, but being a tweet of a podcast it might have just been a rough draft.
1: https://x.com/tbpn/status/1935072881425400016
Been working with Claude Code lately and been pretty impressed. If this works as well, it could be a nice add-on. It's probably a smart market to enter, as Excel is essentially everywhere.
Just like Claude Code allows 1 dev to potentially do the work of 2 or 3, I could see this allowing 1 accountant or operations person to do the work of 2 or 3. Financial savings but human cost
It’s interesting to me that this page talks a lot about “debugging models” etc. I would’ve expected (from the title) this to be going after the average excel user, similar to how chatgpt went after every day people.
I would’ve expected “make a vlookup or pivot table that tells me x” or “make this data look good for a slide deck” to be easier problems to solve.
The issue is that the average Excel user doesn’t quite have the skills to validate and double-check the Excel formulas that Claude would produce, and to correct them if needed. It would be similar to a non-programmer vibe-coding an app. And that’s really not what you want to happen for professionally used Excel sheets.
IMO that is exactly what people want. At my work everyone uses LLMs constantly and the trade off of not perfect information is known. People double check it, etc, but the information search is so much faster even if it finds the right confluence but misquotes it, it still sends me the link.
For easy spreadsheet stuff (which 80% of average white collars workers are doing when using excel) I’d imagine the same approach. Try to do what I want, and even if you’re half wrong the good 50% is still worth it and a better starting point.
Vibe coding an app is like vibe coding a “model in excel”. Sure you could try, but most people just need to vibe code a pivot table
I think actually Anthropic themselves are having trouble with imagining how this could be used. Coders think like coders - they are imagining the primary use case being managing large Excel sheets that are like big programs. In reality most Excel worksheets are more like tiny, one-off programs. More like scripts than applications. AI is very very good at scripts.
I think this is aiming to be Claude Code for people who use Excel as a programming environment.
Yeah now tell the Auditors that the financial spreadsheet we have here has AI touching it left and right. "I did not cook the books I promise it is the AI that made our financials seem better than they actually are trust me bro!", said Joe from Accounting.
R.I.P. global economy
I'm excited to see what national disasters will be caused by auto-generated Excel sheets that nobody on the planet understands. A few selections from past HN threads to prime your imagination:
Thousands of unreported COVID cases: https://news.ycombinator.com/item?id=24689247
Thousands of errors in genetics research papers: https://news.ycombinator.com/item?id=41540950
Wrong winner announced in national election: https://news.ycombinator.com/item?id=36197280
Countries across the world implement counter-productive economic austerity programs: https://en.wikipedia.org/wiki/Growth_in_a_Time_of_Debt#Metho...
Especially combined with the dynamic array formulas that have recently been added (LET, LAMBDA etc). You can have much more going on within each cell now. Think whole temporary data structures. The "evaluate formula" dialog doesn't quite cut it anymore for debugging.
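For readers who haven't met the new functions, a small LET example plus an immediately invoked LAMBDA, both in standard dynamic-array Excel syntax (ranges and names here are hypothetical):

```
=LET(
    sales,  B2:B100,
    growth, 1.05,
    next,   sales * growth,
    SUM(next)
)

=LAMBDA(x, y, SQRT(x^2 + y^2))(3, 4)
```

The first returns the sum of projected sales; the second returns 5. A single cell can now hold a whole named pipeline like this, which is exactly why the "evaluate formula" dialog no longer cuts it for debugging.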
From my experience in the corporate world, I'd trust an Excel file generated / checked by an LLM more than I would one that has organically grown over years in a big corporation, where nobody ever checks (or even can check) anything because it's one big growing pile of technical debt people just accept as working.
Ok, they weren't confident enough to let the model actually edit the spreadsheet. Phew..
Only a matter of time before someone does it though.
How well does change tracking work in Excel... how hard would it be to review LLM changes?
AFAIK there is no 'git for Excel to diff and undo', especially not built-in (aka 'for free' both cost-wise and add-ons/macros not allowed security-wise).
My limited experience has been that it is difficult to keep LLMs from changing random things besides what they're asked to change, which could cause big problems if untrackable in Excel.
I thought there was track changes on all office products. Most Office documents are zip files of XML files and assets, so I'd imagine it would be possible to rollback changes.
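Since .xlsx files really are zip archives of XML, extracting a snapshot of cell -> formula mappings is mechanical; given two snapshots, the diff itself is trivial. A sketch with hand-made dicts standing in for the extraction step:

```python
def diff_cells(old, new):
    """Sketch of a 'git diff for Excel': compare two snapshots of
    cell -> formula mappings and report what an edit (human or LLM)
    actually touched."""
    changed = {c: (old[c], new[c]) for c in old.keys() & new.keys()
               if old[c] != new[c]}
    added   = {c: new[c] for c in new.keys() - old.keys()}
    removed = {c: old[c] for c in old.keys() - new.keys()}
    return changed, added, removed

before = {"A1": "=SUM(B1:B9)", "A2": "=A1*0.2"}
after  = {"A1": "=SUM(B1:B10)", "A2": "=A1*0.2", "A3": "=A1-A2"}

changed, added, removed = diff_cells(before, after)
print(changed)  # {'A1': ('=SUM(B1:B9)', '=SUM(B1:B10)')}
print(added)    # {'A3': '=A1-A2'}
print(removed)  # {}
```

Reviewing an LLM's edit then becomes "show me every cell it touched" rather than hunting cell by cell.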
When I think how easy I can misclick to stuff up a spreadsheet I can't begin to imagine all the subtle ways LLMs will screw them up.
Unlike code, where it's all on display, these formulas are hidden in each cell; you won't see the problem unless you click on the cell, so you'll have a hard time finding the cause.
I wish Gemini could edit more in Google sheets and docs.
Little stuff like splitting text more intelligently or following the formatting seen elsewhere would be very satisfying.
Checkmate, Altman
As an inveterate Excel lover, I can just sense the blinding pain wafting off the legions of accountants, associates, seniors, and tech people who keep the machine spirits placated.
lies, damn lies, statistics, and then Excel deciding cell data types.
If this works very well and reliably, it might not kill programming as such, but it might put a lot of small businesses who do custom software for other small businesses out of work.
The HN bubble might not realize the implications.
If AI turns out to be the powerhouse it is claimed to be, AI's impact will be corporations replacing corporate dependencies upon 'Excel projects' created by self-taught assistants to department managers.
I just want Claude inside of Metabase.
https://www.metabase.com/features/metabot-ai
Cool, but now companies' POs will be like "you must add the Excel export for all the user data!" and, when asked why, it will basically be "so I can do this roundabout query of data for some number in a spreadsheet using AI" (instead of just putting the number or chart directly in the product with a simple DB call).
This could be huge! Very exciting!
Yet more evidence of the bubble burst being imminent. If any of these companies really had some almost-AGI system internally, they wouldn’t be spending any effort making f’ing Excel plugins. Or at the very least, they’d be writing their own Excel because AI is so amazing at coding, right?
Excel is living business knowledge stuck in private SharePoint sites. Tapping into it might kick off a nice data flywheel, not to speak of the nice TAM.
You make a great point. Where are all the complex applications? They haven't been able to create their own office suite or word processor, or really anything aside from a Halloween matching game in JS. You would think we would have some complex application they can point to, but nothing.
The current valuations do not require AGI. They require products like this that will replace scores of people doing computer based grunt work. MSFT is worth $4 trillion off the back of enterprise productivity software, the AI labs just need some of that money.
The fine tuning will continue until we reach AGI.
The fine tuning will continue until we reach the torment nexus, at best
You wouldn't believe the amount of shit that runs on Excel.
Yes. I once interviewed a developer whose previous job was maintaining the .NET application that used an Excel sheet as the brain for decisions about where to drill for oil on the sea floor. No one understood what was in the Excel sheet. It was built by a geologist who was long gone. The engineering team understood the inputs and outputs. That's all they needed to know.
Years ago when I worked for an engineering consulting company we had to work with a similarly complex, opaque Excel spreadsheet from General Electric modeling the operation of a nuclear power plant in exacting detail.
Same deal there -- the original author was a genius and was the only person who knew how it was set up or how it worked.
I think you’re misunderstanding me. This might be something somewhat useful, I don’t know, and I’m not judging it based on that.
What I’m saying is that if you really believed we were 2, maybe 3 years tops from AGI or the singularity or whatever, you would spend zero effort on a domain that 3rd parties are already serving using your own models! An Excel wrapper for an LLM isn’t exactly cutting-edge AI research.
They’re desperate to find something that someone will pay a meaningful amount of money for that even remotely justifies their valuation and continued investment.
I spotted a custom dialog in an Excel spreadsheet in a medical context the other day, I was horrified.
Sic
This. I work in Pharma. Excel and faxes.
A program that can do excel for you is almost AGI
I’ve got some bad news about the prospects of your startup
Learn > Documentation is just a single markdown doc?
Fresh account spam on my HN? Buy an ad somewhere.
Aaaaand it's gone!
"Eschew flamebait. Avoid generic tangents."
https://news.ycombinator.com/newsguidelines.html
Okay. But then you could say the same for a human, isn't your brain just a cloud of matter and electricity that just reacts to senses deterministically?
> isn't your brain just a cloud of matter and electricity that just reacts to senses deterministically?
LLMs are not deterministic.
I'd argue over the short term humans are more deterministic. I ask a human the same question multiple times and I get the same answer. I ask an LLM and each answer could be very different depending on its "temperature".
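The temperature effect being described can be sketched in a few lines. This is an illustrative toy, not any lab's actual decoder: logits are divided by the temperature before a softmax, so low temperature concentrates probability on the top option (near-deterministic greedy choice) while high temperature flattens the distribution and makes answers vary run to run.

```python
import math
import random

def sample_with_temperature(logits, temperature, rng=random):
    """Sample an index from logits after temperature scaling.

    Low temperature -> distribution collapses onto the argmax (near-greedy);
    high temperature -> flatter distribution, more varied samples.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1  # guard against floating-point rounding

logits = [2.0, 1.0, 0.1]
# Near-zero temperature: the top logit dominates, so repeated asks agree.
low_t = {sample_with_temperature(logits, 0.01) for _ in range(100)}
# High temperature: all options keep meaningful probability, so answers differ.
high_t_probs_flatten = True
```

At temperature 0.01 the scaled logits are [200, 100, 10], so essentially all probability mass sits on index 0; at temperature 2 or above the three options become comparable, which is the "very different answers" behavior the comment describes.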
If you ask a human the same question repeatedly, you'll get different answers. I think by the third time you'll get "I already answered that", etc.
We hardly react to things deterministically.
But I agree with the sentiment. It seems it is more important than ever to agree on what it means to understand something.
I'm having a bad day today. I'm 100% certain that today I'll react completely different to any tiny issue compared to how I did yesterday.
OK then. Groks?
I mean - try clicking the Copilot button and see what it can actually do. Last I checked, it told me it couldn't change any of the actual data itself, but it could give you suggestions. Low bar for excellence here.