AnotherGoodName 6 hours ago

Has anyone had good luck using LLMs in a more advanced context?

I’ve tried simple things such as ‘explain the general number field sieve giving simple numerical examples’ and had spectacular failures. The numerical examples give a few superficial runs of the sieve with a small number and then ‘well that didn’t work but imagine if it did…’. In other words, it can’t give functional numerical examples of well-established theory, even though you could come up with such examples yourself in five minutes of work (see the sketch below). I can’t imagine it being at all useful in an even more advanced context.
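
For reference, here’s the sort of small worked example I mean. This is my own toy sketch of the congruence-of-squares idea that sieve methods (including GNFS) build on, not GNFS itself, and n = 1649 is just an arbitrary small composite:

    from math import gcd, isqrt

    # Toy illustration of the congruence-of-squares idea behind sieve factoring:
    # find x, y with x^2 ≡ y^2 (mod n) and x ≢ ±y (mod n); then gcd(x - y, n) is
    # a nontrivial factor of n. Real sieves build such pairs by combining many
    # smooth relations; here we just brute-force a small n.
    def congruence_of_squares(n):
        for x in range(isqrt(n) + 1, n):
            r = (x * x) % n
            y = isqrt(r)
            if y * y == r and (x - y) % n != 0 and (x + y) % n != 0:
                return x, y
        return None

    n = 1649
    x, y = congruence_of_squares(n)   # finds 57^2 ≡ 40^2 (mod 1649)
    f = gcd(x - y, n)
    print(n, "=", f, "*", n // f)     # 1649 = 17 * 97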

  • hodgehog11 an hour ago

    As an applied probabilist, you often want to focus on the bigger proof strategy, but end up spending a lot of time with annoying integral computations and long strings of basic manipulations. For example, "compute an integral transform of X", "apply method A to get an asymptotic series expansion of Y", or "derive a concentration inequality for Z". I've found that it is often much faster to get e.g. o3-mini-high to do these first to verify if the answer might help in your proof, like a broader CAS replacement. Then you can go through the working yourself later once the strategy is clear; clean it up, make it rigorous, etc. Overall, this saves me a fair bit of time.
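
    To be concrete, the checking step is often routine CAS work. Here is a minimal sympy sketch of the kind of sanity checks I mean (the Gaussian examples are just illustrative, not from any particular proof):

        import sympy as sp

        x, k = sp.symbols('x k', real=True)
        pdf = sp.exp(-x**2 / 2) / sp.sqrt(2 * sp.pi)   # standard normal density

        # Routine integrals of the kind one delegates and then verifies:
        print(sp.integrate(pdf, (x, -sp.oo, sp.oo)))          # 1 (normalization)
        print(sp.integrate(x**4 * pdf, (x, -sp.oo, sp.oo)))   # 3 (fourth moment)

        # An integral transform, under sympy's default Fourier convention:
        print(sp.fourier_transform(sp.exp(-x**2), x, k))      # sqrt(pi)*exp(-pi**2*k**2)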

    Many still seem to believe that these models are completely unreliable for these sorts of arguments, but I've found the error rate to be around 10% per prompt, about on par with a good student. You get good at checking whether the argument is on track or not, so you can just rewrite the prompt a few times with more detail until a solution looks decent.

    I've never had much success with the types of prompts you describe, and I find comfort in knowing that these LLMs seem (at least at present) unable to solve broader research questions. Often you need to ask much more direct questions, in the style of textbook exercises, so the model doesn't get too "creative" or "lazy".

  • memhole 5 hours ago

    I’ve seen papers that show they can do math. There was a recent-ish one on HN that showed an understanding of addition. I’m not convinced myself. At least the open-weight models don’t seem to grasp integers, and conversions are typically a flop as well. My doubt comes from how abstract mathematics really is: it’s entirely its own kind of terse language and symbolism. Maybe if it got the kind of focus that coding has, there would be better results?

  • neom 4 hours ago

    I know nothing about math, but I asked 4.1 your question and it spat out 14 pages. 14 pages of what, I have no clue: I have dyscalculia. https://s.h4x.club/lluby8bg

pogue 3 hours ago

What about the privacy aspects? They said nothing about whether or not they retain any of your data, store it permanently or use it to train their LLMs going forward.

fdb 8 hours ago

Hmm, it seems Claude Research is only available for the Max plan:

> Research is now available in early beta for Max, Team, and Enterprise plans in the United States, Japan, and Brazil. Simply toggle on the Research setting in chat.