Microsoft Favors Anthropic over OpenAI for Visual Studio Code

174 points by corvad 7 hours ago

kerpal 6 hours ago

Claude/Anthropic is more focused on productivity (Coding, Spreadsheets, Reports). ChatGPT seems more focused on general-purpose LLM (Research, Cooking, Writing, Image Generation).

Makes sense that MS would partner with Anthropic since their tool-use for productivity (Claude Code) seems superior. I personally rarely code with ChatGPT, almost strictly Claude.

dmurray 5 hours ago

Some people might be surprised that MS would pick the product with the best technological fit rather than the one they already have a deep business and financial relationship with.
Surely Microsoft's expertise these days is in cross-selling passable but not best-in-class products to enterprises who already pay for Microsoft products.
It says something about how they view the AI coding market, or perhaps the level of the gap between Anthropic and OpenAI here, that they've gone the other way.
- dijit 4 hours ago
  
  They are right to be surprised.
  Why is Azure popular? Not on its own merits, it's because there is a pre-existing relationship with Microsoft.
  Why is Teams the most widely used chat tool? Certainly not because it's good.. it is, again, pre-existing business relationships.
  Seems odd for a company that survives (perhaps even thrives) on these kinds of intertwined business reasons to, themselves, understand that they should go for merit instead.
  - mikestorrent 4 hours ago
    
    Yep. Similarly, Microsoft Entra... if you want Office, you're getting it anyway. Might as well use it for SSO, right? And here's your free Teams license... how can you justify paying for Slack when we've a perfectly good chat client at home?
    
    dijit 3 hours ago
    
    I tried for a while to get Entra working with an external identity provider (Google Workspace).
    The other way around worked (Google could use Entra) but it was basically impossible to backend Entra from Google. Weird.
  - vehemenz an hour ago
    
    Except nobody chooses M365 Copilot over ChatGPT or Claude, so clearly the usual reasons aren't working. In this case, improving the product via integration is a last resort.
- thewebguyd 3 hours ago
  
  > It says something about how they view the AI coding market
  I think Microsoft views models as a commodity and they'd rather lean into their strengths as a tool maker, so this is Microsoft putting themselves into a position to make tools around/for any AI/LLM model, not just ones they have a partnership with.
  Honestly I think this sort of agnosticism around AI will work out well for them.
pnathan 5 hours ago

I've been happy with Anthropic models. I also have been using the Google models more, with decent results. The Copilot/OpenAI models don't seem to be as good as a rule of thumb, can't explain exactly why.
Overall, I think Google has a better breadth of knowledge encoded, but Anthropic gets work done better.
mnky9800n 4 hours ago

I like Perplexity's deep research model, which I think is based on DeepSeek. I use that for most kinds of writing, discussion, research, etc., where I need some kind of feedback. Claude seems to go crazy sometimes when you ask it to do the same task. Whereas for coding, Claude Code is obviously better than everything else under the sun.
- SparkyMcUnicorn 18 minutes ago
  
  I decided to give Perplexity another try a few days ago, and it still seems to hallucinate things. Given the exact same tasks/prompts, both Claude and ChatGPT got the facts correct.
_fat_santa 5 hours ago

This has been largely my experience as well. Claude does way better with coding while ChatGPT does better with general questions.
- bobbylarrybobby 4 hours ago
  
  The new gpt-codex-* models are giving Claude Code a serious run for its money IMO. If OpenAI can figure out the Codex CLI UI (better permissions, more back and forth before executing) then I think they will have the better agentic coder.
  - kerpal 4 hours ago
    
    Tried codex but so far feels a lot slower than claude code. Perhaps because I'm on the basic plan?
airstrike 4 hours ago

Are there any open models that compete with Claude in its tool use capabilities for complex tasks?
Feels like an area where we could use more competition...
m_mueller 6 hours ago

GPT-5 is pretty decent nowadays, but Claude 4 Sonnet is superior in most cases. GPT beats it in cost and usable context window when something quite complex comes up to plan top-down.
- CharlieIsAHero 5 hours ago
  
  What do you mean by usable context window? Sonnet 4 is 968k and gpt5 is 368k. Are you saying the context window on sonnet is useless?
  - CuriouslyC 5 hours ago
    
    Sonnet long context performance sucks. https://fiction.live/stories/Fiction-liveBench-Feb-21-2025/o...
    I can confirm Sonnet is good for vibe coding but makes an absolute mess of large and complex codebases, while GPT5 tends to be pretty respectful.
  - m_mueller 5 hours ago
    
    I never implied it's useless. I don't have scientific data to back this up either, this is just my personal "feeling" from a couple hundred hours I've spent working with these models this year: GPT-5 seems a bit better at top-down architectural work, while Sonnet is better at the detail coding level. In terms of usable context window, again from personal experience so far, to me GPT-5 has somewhat of an edge.
    
    613style 5 hours ago
    
    Agreed. My experience is GPT5 is significantly better at large-scale planning & architecture (at least for the kind of stuff I care about which is strongly typed functional systems), and then Sonnet is much better at executing the plan. GPT5 is also better at code reviews and finding subtle mistakes if you prompt it well enough, but not totally reliable. Claude Code fills its context window and re-compacts often enough that I have to plan around it, so I'm surprised it's larger than GPT's.
- boredtofears 5 hours ago
  
  What I find interesting is how much opinions vary on this. Open a different thread and people will seem to have consensus on GPT or Gemini being superior.
  Even the benchmarks don’t seem all that helpful.
  - TuxSH 4 hours ago
    
    Well, last I checked Claude's webchat UI doesn't have LaTeX rendering for output messages which is extremely annoying.
    On the other hand, I wish ChatGPT had GitHub integration in Projects, not just in Codex.
    I've also had Claude Sonnet 4.0 Thinking spew forth incorrect answers many times for complex problems involving some math (sometimes being unable to write a formal proof), whereas ChatGPT 5 Thinking gives me correct answers with a formal proof.
  - kissgyorgy 5 hours ago
    
    I think it depends on the domain. For example, GPT-5 is better for frontend, React code, but struggles with niche things like Nix. Claude's UI designs are not as pretty as GPT-5's.
    
    omneity 5 hours ago
    
    This is also pretty subjective. I’m a power user of both and tend to prefer Claude’s UI about 70-80% of the time.
    I often would use Claude to do a “make it pretty” pass after implementation with GPT-5. I find Claude’s spatial and visual understanding when dealing with frontend to be better.
    I am sure others will have the exact opposite experience.
    
    boredtofears 4 hours ago
    
    This is what I mean - even opinions on domain are wildly different. I've seen people say Claude's React is best.

glimshe 6 hours ago

Anthropic doesn't allow me to use my phone number across my personal and business logins. I simply can't use Claude where I need it, even if I'm willing to pay. I don't understand why they add so much friction when everyone else just allows me to do work.

electric_muse 5 hours ago

This whole “real phone number is your access code to every service” trend is really frustrating.
I had the same experience recently with Ticketmaster, Docusign, and Vercel.
Probably a handful more I forgot.
I believe the main reason is because it prevents fraud.
But I see a deeper motive that phone numbers are more friction to change and therefore our “real” numbers become hard-to-change identity codes that can easily be used to pull tons of info about you.
You give them that number and they immediately can look up your name, addresses, age, and tons of other mined info that was connected to you. Probably credit score, household income, etc.
Phone numbers have tons of “metadata” you provide without really knowing it. Like how the Exif data in a photo may reveal a lot about your location and device.
- derekdahmer 5 hours ago
  
  As someone who implemented phone verification at a company I worked for, it’s 100% for preventing spam signups intending to abuse free tiers. API companies can get huge volumes of fake signups from “multiplexers” who get around free tier limits by spreading their requests across multiple accounts.
  - AlexandrB 4 hours ago
    
    This makes sense for free tiers of products, but if you provide CC info for a paid tier, you shouldn't also have to provide a phone number. One or the other.
    
    moduspol 4 hours ago
    
    I think people can use stolen / one-time use / prepaid / limited purchase size credit cards fairly easily, too. And you might not find out until after they've racked up a non-trivial amount of costs.
    
    xur17 3 hours ago
    
    Then accept stablecoins.
    
    derekdahmer 3 hours ago
    
    Theoretically yes but a few issues:
    - Account creation usually happens before plan selection & payment. Most users start at free, then add a CC later either during on-boarding or after finishing their trial.
    - Virtual credit cards are very easy to create. You can sign up with a credit card with a very low limit and just use the free tier tokens.
  - jiveturkey 2 hours ago
    
    I would caution any reader against generalizing your statement. Just because you used it at your company to limit abuse, and yes that is a lazy approach and 100% what's going on with Anthropic and most API companies, doesn't mean that every company uses phone number gating for this purpose.
    The (probably) most famous example being https://www.eff.org/deeplinks/2019/07/fixed-ftc-orders-faceb...
    And it's not enough to say "well we don't use it for that". One, you can't prove it. And two, far more important, in an information leak, by taking and saving the phone number (necessarily, otherwise there's no account gating feature unless you're just giving fake friction), you expose the user to risk of connecting another dot. I would never give my phone number to some rinky dink company.
    Now that said, I don't use lazy pejoratively. Products must launch.
  - anonym29 4 hours ago
    
    Because SMS verification is so cheap (under a dollar per one-time validation, under $10/mo for ongoing validation), this approach really only makes sense for ultra-low-value services, where e.g. $0.50 per account costs more than the service itself is worth.
    Because of this low value dynamic, there are many techniques that can be used to add "cost" to abusive users while being much less infringing upon user privacy: rate limiting, behavioral analysis, proof-of-work systems, IP restrictions, etc.
    Using privacy-invasive methods to solve problems that could be easily addressed through simple privacy-respecting technical controls suggests unstated ulterior motives around data collection.
    If your service is worth less than $0.50 per account, why are you collecting such invasive data for something so trivial?
    If your service is worth more than $0.50 per account, SMS verification won't stop motivated abusers, so you're using the wrong tool.
    If Reddit, Wikipedia, and early Twitter could handle abuse without phone numbers, why can't you?
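
For readers unfamiliar with the "proof-of-work systems" mentioned above, here is a minimal hashcash-style sketch. It is an illustration only, not how Anubis or any particular service implements it: the challenge format, the `solve`/`verify` names, and the difficulty value are all assumptions. The idea is that the client must burn CPU time finding a nonce, while the server checks the answer with a single hash and never needs a phone number.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # bit_length() is the position of the highest set bit in this byte.
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: one hash is enough to check the client's work."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force nonces until the hash is 'small' enough."""
    for nonce in count():
        if verify(challenge, nonce, difficulty):
            return nonce


# Hypothetical sign-up flow: the server hands out a random challenge string,
# the client returns a nonce, and the server calls verify() before creating
# the account. Each extra bit of difficulty doubles the expected client work.
nonce = solve("signup-challenge-abc123", difficulty=12)
print(verify("signup-challenge-abc123", nonce, difficulty=12))
```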
    
    derekdahmer 3 hours ago
    
    Firstly, I can tell you phone number verification made a very meaningful impact. The cost of abuse can be quite high for services with high marginal costs like AI.
    Second, all those alternatives you described are also not great for user privacy either. One way or another you have to try to associate requests with an individual entity. Each has its own limitations and downsides, so typically multiple methods are used for different scenarios with the hope that all together it's enough of a deterrent.
    Having to do abuse prevention is not great for UX and hurts legitimate conversion, I promise you most companies only do it when they reach a point where abuse has become a real problem and sometimes well after.
    
    anonym29 2 hours ago
    
    >Firstly, I can tell you phone number verification made a very meaningful impact. The cost of abuse can be quite high for services with high marginal costs like AI.
    Nobody has made the argument that it's not a deterrent at all. The core argument is that it's privacy-infringing when it doesn't need to be, and the cost posed to attackers is extremely low. If your business is offering a service at a price below your business' own costs, the business itself is choosing to inflict cost asymmetry upon itself.
    >Second, all those alternatives you described are also not great for user privacy either.
    This is plainly and obviously false at face value. How would blocklisting datacenter IP's, or doing IP-based rate limiting, or a PoW challenge like Anubis be "also not great" for user privacy, particularly when compared to divulging a phone number? Phone numbers are linked to far more commercially available PII than an IP address by itself is, and PoW challenges don't even require you to log IP addresses. Behavioral analysis like blocking more than N sign-ups per minute from IP address X, or blocking headless UA's like curl, or even blocking registrations using email addresses from known temp-mail providers is nowhere remotely close to being as privacy-infringing as requiring phone numbers is.
    The privacy difference between your stated practice and my proposed alternatives isn't a difference of degree; it's a fundamental difference of kind.
    Being generous, this is lazy, corner-cutting engineering that seeks to impose an unknown amount of privacy risk from the perspective of end users by piggybacking off an existing channel that only good-faith users won't forge (phone number), at the possible expense of good-faith users' privacy, rather than implementing a better control.
    Of course, there's no reason to be generous to for-profit corporations - the much more plausible explanation is that your business is data mining your own customers via this PII-linked registration requirement through a coercive ToS that refuses service unless customers provide this information, which is both entirely unnecessary for legitimate users and entirely insufficient to block even a slightly motivated abusive user.
    ...not that you'd ever admit to that practice if you were aware of it happening, or would even necessarily be aware of it happening if you were not a director or officer of the business.
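
To make the "blocking more than N sign-ups per minute from IP address X" idea from this exchange concrete, here is a minimal in-memory sliding-window sketch. The class name, parameters, and example address are hypothetical, and a real deployment would presumably need shared storage and some handling of NAT and proxies; this only shows the core bookkeeping.

```python
from collections import defaultdict, deque


class SignupRateLimiter:
    """Allow at most `max_signups` sign-ups per IP within `window_seconds`."""

    def __init__(self, max_signups: int, window_seconds: float) -> None:
        self.max_signups = max_signups
        self.window_seconds = window_seconds
        # Maps an IP address to the timestamps of its recent sign-ups.
        self._events = defaultdict(deque)

    def allow(self, ip: str, now: float) -> bool:
        events = self._events[ip]
        # Forget sign-ups that have fallen out of the window.
        while events and now - events[0] >= self.window_seconds:
            events.popleft()
        if len(events) >= self.max_signups:
            return False
        events.append(now)
        return True


# Two sign-ups per minute per IP; timestamps are passed in explicitly so the
# behaviour is easy to follow (a real caller would pass time.time()).
limiter = SignupRateLimiter(max_signups=2, window_seconds=60)
for t in (0, 10, 20, 61):
    print(t, limiter.allow("203.0.113.7", now=t))
```

Only timestamps keyed by IP are kept, and they are discarded once they leave the window.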
- anonym29 4 hours ago
  
  Mandatory phone number registration does not and never has prevented fraud.
  Plenty of free VOIP services exist, including SMS reception.
  Even when the free service providers are manually blocklisted, one-time validations can be defeated with private numbers on real networks / providers for under a dollar per validation, and repeated ongoing validations can be performed with rented private numbers on real networks / providers for under ten dollars per month.
  The rent-an-SMS services that enable this are accessible through a web interface that allows connections from Tor, VPNs, etc. - there is no guarantee that the telecom provider's location records of the IMEI tied to that phone number are anywhere close to the end user's real geographic location, so this isn't even helpful for law enforcement purposes where they can subpoena telecom provider records.
  This "phone number required" practice exists for one primary reason: for businesses to track non-fraudulent users, data mine their non-fraudulent users, and monetize the correlated personal information of non-fraudulent users without true informed consent (almost nobody reads ToS's, but many would object to these invasive practices if given a dialogue box that let them accept or decline the privacy infringements but still allowed the user to use the business' service either way).
  Sometimes, they are also used for a secondary reason: to allow the business to cheap out on developer costs by cutting corners on proper, secure MFA validation. No need to implement modern, secure passkeys or RFC-compliant TOTP MFA, FIDO2, U2F when you can just put your users in harm's way by pretending that SMS is a secure channel, rather than easily compromised by even common criminals with SS7 attacks, which are not relegated to nation-state actors like they once were.
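
As a rough indication of how little code the TOTP alternative mentioned above needs, here is a sketch of the RFC 6238 code computation (built on RFC 4226 HOTP) using only the Python standard library. It assumes HMAC-SHA1 and a 30-second step, which are the RFC defaults; secret provisioning, clock-drift windows, and replay protection are deliberately left out, so treat it as an illustration rather than a complete MFA implementation.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226: HMAC the 8-byte counter, then dynamically truncate."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # low nibble of the last byte picks the window
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """RFC 6238: HOTP where the counter is the number of elapsed time steps."""
    return hotp(secret, unix_time // step, digits)


# The server and the user's authenticator app share `secret`; both compute
# the same code for the current 30-second step, and no SMS is involved.
print(totp(b"12345678901234567890", unix_time=59))
```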
  - slipnslider 4 hours ago
    
    >never has prevented fraud.
    Interesting, I've heard otherwise but it was anecdotes. Do you have any data on that?
    > to track non-fraudulent users
    You listed a large number of ways to fake the phone number which is why you believe it doesn't prevent fraud. What is to stop a non-fraudulent user from doing the same thing to prevent the tracking by the company?
    
    anonym29 2 hours ago
    
    >Do you have any data on that?
    The original stated intention of the practice was that "it" [mandatory phone number registration] "prevents fraud" (though this stance was being critiqued by the person who raised it, not defended).
    I'll concede that it probably has stymied some of the most trivial, incompetent fraud attempts made, and possibly reduced a negligible amount of actual fraud, but the idea that it can "prevent" fraud (implying true deterministic blocking, rather than delaying or frustrating) is refutable by the very reasonable assumption that there is almost certainly no company that implements mandatory phone number registration that has or will experience ZERO losses to fraud.
    That said, in fairness, this is an unfalsifiable and unverifiable claim, as to my knowledge, there is nothing resembling a public directory of fraud losses experienced by businesses, and there is no incentive for businesses to admit to fraud losses publicly (they may have tax incentives to report it to the IRS, legal incentives to report it to law enforcement, and publicly traded companies may have regulatory incentives to at least indirectly acknowledge operating losses incurred due to fraud in financial reporting), but that doesn't make the claim itself unreasonable or improbable.
    >What is to stop a non-fraudulent user from doing the same thing to prevent the tracking by the company?
    The argument isn't that mandatory phone registration unavoidably forces privacy infringement upon all users, just that it does infringe upon the privacy of some (I'd suggest a vast majority) of users in practice.
giancarlostoro 5 hours ago

Sounds like you want a Google Voice number or similar service, but now you're spending money for someone else's awful software, and in some cases, some places will flag your number if it's Google Voice and outright refuse to let you in.
- rs186 5 hours ago
  
  ...Like Claude. They don't allow you to use Google voice numbers for verification.
  - giancarlostoro 5 hours ago
    
    I want a "burner" number, but I'm not sure what the best option is, do I buy a crappy phone at Walmart and use that number? What's the bare bottom of the barrel cost for a phone with no mobile data, only SMS?
    
    kayodelycaon 2 hours ago
    
    I have an Ultra Mobile eSIM as a second line on my iPhone. Costs $3/mo and you can buy cards to top it up or add a credit card. My primary eSIM is a regular phone plan.
    It works surprisingly well. I can easily turn off the second line in Settings without removing it.
    I could’ve bought an unlocked phone with cash somewhere and used the SIM in that. They wouldn’t know who I was.
    I didn’t do that because it was inconvenient and it wouldn’t be anonymous once I started using it for SMS authentication.
    
    Yeroc an hour ago
    
    Any VOIP provider with SMS support should do the job. I personally use voip.ms but there are many.
    
    typpilol 3 hours ago
    
    TracFone with minutes at Walmart
    If you load it with 10 dollars in minutes, it'll be good forever. But I'm not sure what TracFone has for an inactive policy
raldi 6 hours ago

When was the last time you tried?
- glimshe 5 hours ago
  
  A month or so ago.
jiveturkey 2 hours ago

get the cheapest possible business phone#. another line on your existing service is generally very cheap, and you can just cancel it after registration.
dathinab 5 hours ago

wait what do they need a phone number for???
- criddell 3 hours ago
  
  I think they do that to make it more difficult for one person to open multiple accounts. You can still do it, you just need another phone.

cwyers 4 hours ago

This is Microsoft subsidizing Claude inference costs -- if you look at how they charge models against your allotment, Gemini, GPT-5 and Claude 4 Sonnet all cost the same, despite Claude 4 Sonnet being more expensive than the other two. Not really sure I understand the economics here, especially since there's not really a clear winner between GPT-5 and Claude 4 Sonnet for coding (if anything I think GPT-5 puts up a better showing).

adonese 2 hours ago

I think Copilot is very aggressive on tokens and context size. I guess that is how the economics work for them.
martinald 3 hours ago

"Sonnet being more expensive than the other two" -> you mean based on public pricing? Microsoft will not be paying retail prices for this.
- cwyers 3 hours ago
  
  I'm sure they don't pay retail on GPT-5 either.
mattalex 4 hours ago

It might be that they pay less for Anthropic depending on how many tokens are generated by each model: total cost is token cost times number of tokens. I haven't checked GPT-5, but it is not impossible that price-wise they might be very comparable if you account for reasoning tokens used.
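
A tiny worked example of that "token cost times number of tokens" point. The prices and token counts below are invented purely for illustration and are not real figures for any model; the sketch only shows how a model with a higher per-token price can end up costing the same per request if the cheaper model emits more (for example, reasoning) tokens.

```python
def total_cost(price_per_million_tokens: float, tokens_generated: int) -> float:
    """Cost of one response: per-token price times tokens actually generated."""
    return price_per_million_tokens * tokens_generated / 1_000_000


# Hypothetical numbers, NOT real pricing:
# model A is pricier per token but answers in fewer tokens,
# model B is cheaper per token but spends extra tokens on reasoning.
cost_a = total_cost(price_per_million_tokens=15.0, tokens_generated=2_000)
cost_b = total_cost(price_per_million_tokens=10.0, tokens_generated=3_000)
print(cost_a, cost_b)
```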
- poslathian 2 hours ago
  
  Is it possible that regardless of what they pay they think Anthropic is negative margin on it?
drewbitt 3 hours ago

> I think GPT-5 puts up a better showing
Would the more casual Copilot audience be OK with gpt-5-high - the model that many say is better than Sonnet - taking significantly longer to respond? Potentially minutes longer. A faster model can make sense as a default
PhantomHour 2 hours ago

> Not really sure I understand the economics here
There is nothing to understand. The point of such subsidies is to turn OPEX into a green line on the stock market.
Especially as Microsoft is currently also in a fight with OpenAI.

paxys 6 hours ago

Claude was the gold standard for coding but I have had a lot of success with GPT-5. Nowadays I pretty much always default to GPT-5.

jbm 3 hours ago

Yeah, likewise. Claude has been going downhill recently for me, while Codex works great. I nearly cancelled my ChatGPT membership until they started providing codex, and now I'm considering if I want to use Pro again.
It's weird though because ChatGPT itself is not particularly better than it was before. Bringing down costs per token probably means they can do more reasoning before coding than Codex does.
Alifatisk 2 hours ago

Yup, GPT-5 took the throne of Sonnet
bwat49 6 hours ago

yeah I've been getting better results with codex (gpt5) vs claude

YmiYugy 4 hours ago

Personally I found gpt-5 to be a bit better than sonnet-4. At least in cursor. Claude is still more reliable and competent at tool calling, but I found gpt-5 to be better in token efficiency and a lot better at instruction following.

kranke155 4 hours ago

This seems to change every 6 months? Just my impression.
- compacct27 3 hours ago
  
  Or less!

verdverm 6 hours ago

Anthropic didn't make the cut in our evaluation (data usage concerns). They have also been the shadiest of the companies

They lost me when they expired my money and then tried to take more without asking

piker 6 hours ago

In some ways it makes sense to pave the way for Claude to protect the brand of VS Code. On the other hand, it’s a bit of a head-scratcher since it seems like VS code was built as a loss-leader to sell Microsoft cloud products. Perhaps enterprise ChatGPT, co-pilot and GitHub can make up the difference even if the community tier favors Claude.

Edit: maybe Cursor forced this and Microsoft is taking its choice to open license VS code on the chin. Will be interesting to see the strategy with Visual Studio going forward.

dawnerd 5 hours ago

Auto feels like a way for them to slightly push people towards paid models more. If they really favored Anthropic, claude would be the included free model, right?

mynameisvlad 4 hours ago

On their roadmap (it's in the linked blog post https://code.visualstudio.com/blogs/2025/09/15/autoModelSele...):
> Let users on a free plan take advantage of the latest models through auto
It also describes how the auto selector works in more detail:
> When using auto model selection, VS Code uses a variable model multiplier based on the automatically selected model. If you are a paid user, auto applies a 10% request discount. For example, if auto selects Sonnet 4, it will be counted as 0.9x of a premium request; if auto selects GPT-5-mini, this counts as 0x because the model is included for paid users. You can see which model and model multiplier are used by hovering over the chat response.
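
The arithmetic in that quote can be sketched as follows. The base multipliers (1x for Sonnet 4, 0x for GPT-5-mini) are inferred from the quoted example alone, and the function and dictionary names are made up here; this is not a VS Code or Copilot API.

```python
# Base premium-request multipliers, inferred from the quoted example only.
BASE_MULTIPLIER = {"Sonnet 4": 1.0, "GPT-5-mini": 0.0}

AUTO_DISCOUNT = 0.10  # 10% request discount for paid users, per the quote


def auto_multiplier(model: str) -> float:
    """Multiplier charged to a paid user when auto selects `model`."""
    return round(BASE_MULTIPLIER[model] * (1 - AUTO_DISCOUNT), 2)


for model in BASE_MULTIPLIER:
    print(model, auto_multiplier(model))
```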

andrewstuart 3 hours ago

ChatGPT seems to be really good at analysis but continues to be bad at coding.

It constantly loses existing code. At the end of an intense coding session you’ll find your 20 feature application now has 3 features. That alone makes it not worth using. I can’t be bothered constantly working to identify and prevent losing existing features.

It is however very powerful for doing debugging and diagnostics with throwaway code.

It’s also useful to analyse stuff with ChatGPT and give that to Gemini or Claude to do the coding.

starstripe 2 hours ago

Doesn't Microsoft own a lot of OpenAI? Did they sell?

alberth 3 hours ago

Embarrassingly dumb question ...

Is Claude Code just a plug-in for an existing editor? Or is it the entire editor itself?

theshrike79 3 hours ago

Claude Code is a CLI tool with light integration with VS Code
never_inline 3 hours ago

www.google.com

bartalama 5 hours ago

Claude Sonnet 4 is the best for generating code for me so far, albeit needing some investment in instruction files and prompt files when using GitHub Copilot.

typpilol 3 hours ago

Use the beast mode chat mode. I use it and it makes a world of difference
I tweaked mine a bit but still.

ChrisArchitect 6 hours ago

Actual post: https://code.visualstudio.com/blogs/2025/09/15/autoModelSele...

dev1ycan 4 hours ago

I use deepseek on the daily, the other day I tried to use claude and was surprised when after like 5 messages (had decent amount of code though) I got "limited"

aksss 2 hours ago

I have to say I prefer Claude for code and diagnosis. ChatGPT in CoPilot often replies with essentially, "these are the things I would be looking at if I wanted to troubleshoot this", to which my muttering frustration is, "yeah obviously - no sht - now do it.", whereas Claude just does it (e.g. looks through the code, provides analysis as to where the problem is or the nature of the problem, and returns a practical and immediately useful response). You could say that's a prompting problem (and you'd be right) but if Claude is lower drag than ChatGPT in this context and others, that's a value proposition.

However, I really do prefer ChatGPT for generalized but as-applied conversation about architecture decisions, pros/cons of different approaches, patterns. Claude can do this too, but I do prefer ChatGPT for this.

daft_pink 6 hours ago

This needs that archive link that bypasses the paywall. I had to read it on my Apple News+ subscription to avoid the paywall.

bgarbiak 6 hours ago

https://archive.is/hO6YV
- daft_pink 6 hours ago
  
  Thanks. Do I just enter the URL at the top of this website to generate an archive link myself for future items?
  - cwkirchner 5 hours ago
    
    You can enter URLs at the top of archive.is (or archive.ph) to see the archived version. Alternatively, there's a bookmarklet that you can use for archive.ph.
    https://mattdeco.github.io/archive-bookmarklet/

gigel82 6 hours ago

I don't think it's Microsoft that favors it. It's likely customers. Claude wipes the floor with all the GPTs in GitHub Copilot (in my experience).