quietbritishjim a day ago

I like the axiomatic definition of entropy. Here's the introduction from Pattern Recognition and Machine Learning by C. Bishop (2006):

> The amount of information can be viewed as the ‘degree of surprise’ on learning the value of x. If we are told that a highly improbable event has just occurred, we will have received more information than if we were told that some very likely event has just occurred, and if we knew that the event was certain to happen we would receive no information. Our measure of information content will therefore depend on the probability distribution p(x), and we therefore look for a quantity h(x) that is a monotonic function of the probability p(x) and that expresses the information content. The form of h(·) can be found by noting that if we have two events x and y that are unrelated, then the information gain from observing both of them should be the sum of the information gained from each of them separately, so that h(x, y) = h(x) + h(y). Two unrelated events will be statistically independent and so p(x, y) = p(x)p(y). From these two relationships, it is easily shown that h(x) must be given by the logarithm of p(x) and so we have h(x) = − log2 p(x).

This is the definition of information for a single probabilistic event. The definition of entropy of a random variable follows from this by just taking the expectation.
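
A minimal sketch of that last step in Python (base-2 logs, so units are bits; the distributions are just illustrative):

```python
import math

def self_information(p: float) -> float:
    """Information content, in bits, of an event with probability p."""
    return -math.log2(p)

def entropy(dist: list[float]) -> float:
    """Entropy of a discrete distribution = expected self-information."""
    return sum(p * self_information(p) for p in dist if p > 0)

print(entropy([0.5, 0.5]))   # fair coin: 1.0 bit
print(entropy([0.9, 0.1]))   # biased coin: ~0.47 bits
print(entropy([1/6] * 6))    # fair die: ~2.58 bits
```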

  • dkislyuk a day ago

    This is a great characterization of self-information. I would add that the `log` term doesn't just conveniently appear to satisfy the additivity axiom; turning products into sums is the exact historical reason the logarithm was invented in the first place. That is, the log function was specifically constructed as a family of functions satisfying f(xy) = f(x) + f(y).

    So, self-information is uniquely defined by (1) assuming that information is a function of the probability of an event, (2) that no information is transmitted for an event that is certain to happen (i.e. f(1) = 0), and (3) that independent information is additive. h(x) = -log p(x) is the only family of functions (up to the choice of logarithm base) that satisfies all of these properties.
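
    A quick numerical check of those properties (base 2 here is just a choice of units):

    ```python
    import math

    h = lambda p: -math.log2(p)   # self-information in bits

    p_x, p_y = 0.25, 0.1          # probabilities of two independent events
    print(h(p_x * p_y))           # joint event: ~5.32 bits
    print(h(p_x) + h(p_y))        # sum of the individual informations: ~5.32 bits
    print(h(1.0))                 # a certain event carries no information: 0.0
    ```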

    • diego898 15 hours ago

      Thanks for this! I read this paper/derivation/justification once in grad school but I can’t now find the reference - do you have one?

  • tshaddox a day ago

    According to my perhaps naive interpretation of that, the "degree of surprise" would depend on at least three things:

    1. the laws of nature (i.e. how accurately do the laws of physics permit measuring the system and how determined are future states based on current states)

    2. one's present understanding of the laws of nature

    3. one's ability to measure the state of a system accurately and compute the predictions in practice

    It strikes me as odd to include 2 and 3 in a definition of "entropy."

    • tmalsburg2 20 hours ago

      OP is talking about information entropy. Nature isn't relevant there.

      • tshaddox 19 hours ago

        Surely laws of nature are still relevant since they (presumably) establish limits on how closely a system can be measured and which physical interactions can be simulated by computers (and how accurately).

  • overu589 17 hours ago

    How can that be axiomatic?

    I offer a coherent, concise dissenting view.

    Information is the removal of uncertainty. If it does not remove uncertainty it is not information. Uncertainty is state unresolved (potential resolves to state through constructive and destructive interference.)

    Entropy is the existential phenomenon of potential distributing over the infinite manifold of negative potential. “Uncertainty.”

    Emergence is a potential outcome greater than the capacity found in the sum of any parts.

    Modern humanity’s erroneous extrapolations:

    - asserting P>=0 without account that in existential reality 0 is the infinite expanse of cosmic void, thus the true mathematical description would be P>=-1

    - confuse heat with entropy. Heat is the ultimate universal expression as heat is a product of all work and all existence is winding down (after all). Entropy directs thermodynamics, thermodynamics is not the extent of entropy.

    - entropy is NOT the number of possible states in a system. Entropy is the distribution of potential; number of states are boundary conditions which uncalculated potential may reconfigure (the “cosmic ray”, or Murphy’s law of component failure.) Existential reality is interference and decay.

    - entropy is not “loss”. Loss is the entropy less work achieved.

    - this business about “in a closed system” is an example of how brilliant minds lie to themselves. No such thing exists anywhere accessible by Man. Even theoretically, there are the principles of decay and the “exogenous” influence of some unperceived factor over a “contained system” or “modeled system”; and one self-deception is for the scientist or engineer to presume these speak for or on behalf of reality.

    Emergence is the potential (the vector space of some capacity) “created” through some system of dynamics (work). “Some” includes the expressive space of all existential or theoretical reality. All emergent potential is “paid for” by burning available potential of some other kind. In nature the natural forces induce work in their extremes. In natural systems these design for the “mitigation of uncertainty” [soft form entropy], aka “intelligence.”

    Entropy is the existential phenomenon of potential distributing over negative potential.

    Information is the removal of uncertainty. If it does not remove uncertainty, it is not information. (And intelligence is the mitigation of uncertainty.)

    Emergence is a potential outcome greater than the capacity found in the sum of any parts.

TexanFeller 2 days ago

I don’t see Sean Carroll’s musings mentioned yet, so repeating my previous comment:

Entropy got a lot more exciting to me after hearing Sean Carroll talk about it. He has a foundational/philosophical bent and likes to point out that there are competing definitions of entropy set on different philosophical foundations, one of them seemingly observer dependent: - https://youtu.be/x9COqqqsFtc?si=cQkfV5IpLC039Cl5 - https://youtu.be/XJ14ZO-e9NY?si=xi8idD5JmQbT5zxN

Leonard Susskind has lots of great talks and books about quantum information and calculating the entropy of black holes which led to a lot of wild new hypotheses.

Stephen Wolfram gave a long talk about the history of the concept of entropy which was pretty good: https://www.youtube.com/live/ocOHxPs1LQ0?si=zvQNsj_FEGbTX2R3

  • infogulch a day ago

    Half a year after that talk Wolfram appeared on a popular podcast [1] to discuss his book on the Second Law of Thermodynamics [2]. That discussion contained the best one-sentence description of entropy I've ever heard:

    > Entropy is the logarithm of the number of states that are consistent with what you know about a system.

    [1]: Mystery of Entropy FINALLY Solved After 50 Years? (Stephen Wolfram) - Machine Learning Street Talk Podcast - https://www.youtube.com/watch?v=dkpDjd2nHgo

    [2]: The Second Law: Resolving the Mystery of the Second Law of Thermodynamics - https://www.amazon.com/Second-Law-Resolving-Mystery-Thermody...
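
    A toy version of that sentence (my own sketch, not from the talk): take N coins where all you know is the total number of heads; the entropy of that macrostate is the log of the number of microstates consistent with it.

    ```python
    from math import comb, log2

    N = 100  # coins

    def macrostate_entropy(heads: int) -> float:
        """log2 of the number of coin configurations consistent with a given head count."""
        return log2(comb(N, heads))

    print(macrostate_entropy(0))    # 0.0: only one consistent microstate
    print(macrostate_entropy(50))   # ~96.3 bits: the most mixed macrostate has the most microstates
    ```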

    • frank20022 a day ago

      By that definition, the entropy of a game of chess decreases with time, because as the game moves on there are fewer possible legal states. Did I get that right?

      • infogulch 20 hours ago

        Sure. Lots of games result in a reduction in game state entropy as the game progresses. Many card games could be described as unnecessarily complicated ways to sort a deck, as an example. When analyzing games wrt the Second Law, consider that "the system" is not simply the current game state, but should at least include captured pieces and human choices.

      • dist-epoch a day ago

        It's about subjective knowledge, not objective.

        So entropy is not related to the number of remaining legal states.

        If I know the seed of a PRNG, the entropy of the numbers it generates is zero for me. If I don't know the seed, it has very high entropy.

        https://www.quantamagazine.org/what-is-entropy-a-measure-of-...
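
        A sketch of the PRNG point: to someone holding the seed the stream is fully reproducible (zero uncertainty), while someone without it can do no better than treating each output as roughly uniform (~2.58 bits per symbol).

        ```python
        import random

        seed = 42                                 # known to me, unknown to you
        rng = random.Random(seed)
        stream = [rng.randrange(6) for _ in range(10)]

        # With the seed I can replay the stream exactly: my uncertainty about it is zero.
        replay_rng = random.Random(seed)
        replay = [replay_rng.randrange(6) for _ in range(10)]
        assert replay == stream

        # Without the seed, each output looks uniform over 0..5: about log2(6) ≈ 2.58 bits each.
        print(stream)
        ```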

        • HelloNurse a day ago

          Low entropy in chess means effective prediction of future moves, which is far less useful than dedicating equivalent effort to choosing a good next move for ourselves.

  • gsf_emergency a day ago

    By Jeeves, it's rentropy!!

    Sean and Stephen are absolutely thoughtful popularizers, but complexity, not entropy, is what they are truly interested in talking about.

    Although it doesn't make complexity less scary, here's something Sean's been working on for more than a decade. The paper seems to be more accessible to the layman than he thinks..

    https://arxiv.org/abs/1405.6903 https://scottaaronson.blog/?p=762

    [When practitioners say "entropy", they mean RELATIVE ENTROPY, which is another can of worms.. rentropy is the one that is observer dependent: "That's Relative as in Relativity". Entropy by itself is simple, blame von Neumann for making it live rent-free]

    https://en.wikipedia.org/wiki/Relative_entropy

    @nyrikki below hints (too softly, imho) at this:

    >You can also approach the property that people often want to communicate when using the term entropy as effective measure 0 sets, null cover, martingales, kolmogorov complexity, compressibility, set shattering, etc...

nihakue 2 days ago

I'm not in any way qualified to have a take here, but I have one anyway:

My understanding is that entropy is a way of quantifying how many different ways a thing could 'actually be' and yet still 'appear to be' how it is. So it is largely a result of an observer's limited ability to perceive / interrogate the 'true' nature of the system in question.

So for example you could observe that a single coin flip is heads, and entropy will help you quantify how many different ways that could have come to pass. e.g. is it a fair coin, a weighted coin, a coin with two head faces, etc. All these possibilities increase the entropy of the system. An arrangement _not_ counted towards the system's entropy is the arrangement where the coin has no heads face, only ever comes up tails, etc.

Related, my intuition about the observation that entropy tends to increase is that it's purely a result of more likely things happening more often on average.

Would be delighted if anyone wanted to correct either of these intuitions.

  • tshaddox 2 days ago

    > My understanding is that entropy is a way of quantifying how many different ways a thing could 'actually be' and yet still 'appear to be' how it is. So it is largely a result of an observer's limited ability to perceive / interrogate the 'true' nature of the system in question.

    When ice cubes in a glass of water slowly melt, and the temperature of the liquid water decreases, where does the limited ability of an observer come into play?

    It seems to me that two things in this scenario are true:

    1) The fundamental physical interactions (i.e. particle collisions) are all time-reversible, and no observer of any one such interaction would be able to tell which direction time is flowing.

    2) The states of the overall system are not time-reversible.

    • CaptainNegative a day ago

      The temperature of an object is a macroscopic property, roughly a function of the kinetic energy of the matter within it, which in a typical cup of water varies substantially from one molecule to the next. Before melting, you could guess a little about the kinetic energy of a given water molecule based on whether it is part of the ice or not; after melting and sufficient time to equilibrate, the location of a particular molecule gives you no additional information for estimating its velocity.

    • dynm a day ago

      It's tricky when you think of a continuous system because the "differential entropy" is different from (and more subtle than) the "entropy". Even if a system is time-reversible, the "measure" of a set of states can change.

      For example: Say I'm at some distance from you, between 0 and 1 km (all equiprobable). Now I switch to being 10x as far away. This is time-reversible, but because the volume of the set of states changed, the differential entropy changes. This is the kind of thing that happens in time-reversible continuous systems that can't happen in time-reversible discrete systems.
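
      A sketch of that example in numbers (natural logs, so units are nats; a uniform distribution on an interval of length L has differential entropy log L):

      ```python
      import math

      def diff_entropy_uniform(length_km: float) -> float:
          """Differential entropy, in nats, of a uniform distribution on an interval of this length."""
          return math.log(length_km)

      print(diff_entropy_uniform(1.0))    # 0.0 nats for Uniform(0, 1 km)
      print(diff_entropy_uniform(10.0))   # ~2.30 nats after the reversible 10x stretch
      print(diff_entropy_uniform(0.1))    # negative: differential entropy can drop below zero
      ```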

      • jdhwosnhw a day ago

        I have yet to see differential entropy used successfully (beyond the purpose it was explicitly constructed for, calculating channel capacity). Similar to your thought experiment is the issue that the differential entropy value depends on your choice of unit system. Fundamentally the issue is that you can't stick a quantity with units into a transcendental function and get meaningful results out.

        • dynm a day ago

          Yeah, it's quite disturbing that the differential entropy (unlike the discrete entropy) depends on the units. Even worse, the differential entropy can be negative!

          Interestingly, the differential KL-divergence (differential cross-entropy - differential entropy) doesn't seem to have any of these problems.

      • im3w1l 21 hours ago

        Isn't that kind of what we want entropy to capture though? If a particle darts off into the distance then in theory it might be time reversible, but in practice it's not so simple. If the particle escapes the gravitational pull, the only way it can come back is if it bumps into some other object and pushes that object away. So things will inevitably spread out more and more creating an arrow of time.

        This can then be related to the big bang, and maybe it could be said that we are all living of the negentropy from that event and the subsequent expansion.

        Getting different entropy values based on choice of units is a very nasty property though. It kinda hints that there is one canonical correct unit (Planck length?)

  • fsckboy 2 days ago

    >purely a result of more likely things happening more often on average

    according to your wording, no. if you have a perfect six sided die (or perfect two sided coin), none/neither of the outcomes are more likely at any point in time... yet something approximating entropy occurs after many repeated trials. what's expected to happen is the average thing even though it's never the most likely thing to happen.

    you want to look at how repeated re-convolution of a function with itself always converges on the same Gaussian function, no matter what the shape of the starting function is (as long as it's not some pathological case, such as an impulse function... but even then, consider the convolution of the impulse function with the Gaussian)
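
    A rough numerical sketch of that (the starting distribution below is arbitrary and lopsided; each convolution adds another independent draw, and the result hugs a Gaussian):

    ```python
    import numpy as np

    p = np.array([0.7, 0.1, 0.0, 0.2])   # an arbitrary, decidedly non-Gaussian distribution
    dist = p.copy()
    for _ in range(30):
        dist = np.convolve(dist, p)      # add one more independent draw
        dist /= dist.sum()               # guard against floating-point drift

    # `dist` is now a long bell-shaped vector; plotting it shows a near-perfect Gaussian.
    print(len(dist), round(dist.max(), 4))
    ```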

  • russdill 2 days ago

    This is based on entropy being closely tied to your knowledge of the system. It's one of many useful definitions of entropy.

  • 867-5309 2 days ago

    > 'actually be' and yet still 'appear to be'

    esse quam videri ("to be, rather than to seem")

asdf_snar a day ago

I throw these quotes by Y. Oono into the mix because they provide viewpoints which are in some tension with those who take the -\sum_x p(x) log p(x) definition of entropy as fundamental.

> Boltzmann’s argument summarized in Exercise of 2.4.11 just derives Shannon’s formula and uses it. A major lesson is that before we use the Shannon formula important physics is over.

> There are folklores in statistical mechanics. For example, in many textbooks ergodic theory and the mechanical foundation of statistical mechanics are discussed even though detailed mathematical explanations may be missing. We must clearly recognize such topics are almost irrelevant to statistical mechanics. We are also brainwashed that statistical mechanics furnishes the foundation of thermodynamics, but we must clearly recognize that without thermodynamics statistical mechanics cannot be formulated. It is a naive idea that microscopic theories are always more fundamental than macroscopic phenomenology.

sources: http://www.yoono.org/download/inst.pdf http://www.yoono.org/download/smhypers12.pdf

xavivives a day ago

Over the last few months, I've been developing an unorthodox perspective on entropy [1] . It defines the phenomenon in much more detail, allowing for a unification of all forms of entropy. It also defines probability through the same lens.

I define both concepts fundamentally in relation to priors and possibilities:

- Entropy is the relationship between priors and ANY possibility, relative to the entire space of possibilities.

- Probability is the relationship between priors and a SPECIFIC possibility, relative to the entire space of possibilities.

The framing of priors and possibilities shows why entropy appears differently across disciplines like statistical mechanics and information theory. Entropy is not merely observer-dependent but prior-dependent, including priors not held by any specific observer but embedded in the framework itself. This helps resolve the apparent contradiction between objective and subjective interpretations of entropy.

It also defines possibilities as constraints imposed on an otherwise unrestricted reality. This framing unifies how possibility spaces are defined across frameworks.

[1]: https://buttondown.com/themeaninggap/archive/a-unified-persp...

  • 3abiton a day ago

    I am curious why the word "entropy" encompasses so many concepts? Wouldn't it have made sense to just give each concept a different word?

    • namaria a day ago

      Yes. There are different concepts called 'entropy', sometimes merely because their mathematical formulation looks very similar.

      It means different things in different contexts and an abstract discussion of the term is essentially meaningless.

      Even discussions within the context of the second law of thermodynamics are often misleading because people ignore much of the context in which the statistical framing of the law was formulated. Formal systems and all that... These are not general descriptions of how nature works, but formal systems definitions that allow for some calculations.

      I find the study of symmetries by Noether much more illuminating in general than trying to generalize conservation laws as observed within certain formal models.

    • prof-dr-ir a day ago

      Whenever there is an entropy, it can be defined as

      S = - sum_n p_n log( p_n )

      where the p_n form a probability distribution: p_n >= 0 for n = 1...W and sum_n p_n = 1. This is always the underlying equation; the only thing that changes is the probability distribution.

glial 2 days ago

One thing that helped me was the realization that, at least as used in the context of information theory, entropy is a property of an individual (typically the person receiving a message) and NOT purely of the system or message itself.

> entropy quantifies uncertainty

This sums it up. Uncertainty is the property of a person and not a system/message. That uncertainty is a function of both a person's model of a system/message and their prior observations.

You and I may have different entropies about the content of the same message. If we're calculating the entropy of dice rolls (where the outcome is the 'message'), and I know the dice are loaded but you don't, my entropy will be lower than yours.
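
A sketch of that loaded-dice point, with made-up numbers: same die, two states of knowledge, two entropies.

```python
from math import log2

def entropy_bits(p):
    return -sum(q * log2(q) for q in p if q > 0)

yours = [1/6] * 6                           # you assume a fair die
mine  = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]      # I know it is loaded toward one face

print(entropy_bits(yours))   # ~2.58 bits
print(entropy_bits(mine))    # ~2.16 bits: lower, because I know more
```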

  • Geee a day ago

    It's both. The system or process has its actual entropy, and the sequence of observations we make has a certain entropy. We can say that "this sequence of numbers has this entropy", which is slightly different from the entropy of the process which created the numbers. For example, when we make more coin tosses, our sequence of observations has an entropy which gets closer and closer to the actual entropy of the coin.

  • ninetyninenine 2 days ago

    Not true. The uncertainty of the dice rolls is not controlled by you. It is the property of the loaded dice itself.

    Here's a better way to put it. If I roll the dice infinite times. The uncertainty of the outcome of the dice will become evident in the distribution of the outcomes of the dice. Whether you or another person is certain or uncertain of this does not indicate anything.

    Now when you realize this you'll start to think about this thing in probability called frequentist vs. Bayesian, and you'll realize that all entropy is, is a consequence of probability, and that the philosophical debate in probability applies to entropy as well because they are one and the same.

    I think the word "entropy" confuses people into thinking it's some other thing when really it's just probability at work.

    • glial 2 days ago

      I concede that my framing was explicitly Bayesian, but with that caveat, it absolutely is true: your uncertainty is a function of your knowledge, which is a model of the world, but is not equivalent to the world itself.

      Suppose I had a coin that only landed on heads. You don't know this and you flip the coin. According to your argument, for the first flip, your entropy about the outcome of the flip is zero. However, you wouldn't be able to tell me which way the coin would land, making your entropy nonzero. This is a contradiction.

      • nyrikki 2 days ago

        To add to this.

        Both the Bayesian vs frequentist interpretations make understanding the problem challenging, as both are powerful interpretations to find the needle in the haystack, when the problem is finding the hay in the haystack.

        A better lens is that a recursive binary sequence (coin flips) is an algorithmically random sequence if and only if it is a Chaitin Omega number.[1]

        Chaitin's number is normal, which is probably easier to understand with decimal digits: for any window size, the distribution of the digits 0-9 will, over time, be the same.

        This is why HALT ≈ open frame ≈ system identification ≈ symbol grounding problems.

        Probabilities are very powerful for problems like the dining philosophers problem or the Byzantine generals problem, but they are still grabbing needles every time they reach into the haystack.

        Pretty much any "almost all" statement is a hay-in-the-haystack problem. For example, almost all real numbers are normal, but we have only found a few.

        We can construct them, say with .101010101... in base 2, .123123123... in base 3, etc., but we can't access them.

        Given access to the true reals, you have 0 percent chance of picking a computable number, rational, etc... but a 100% chance of getting a normal number or 100% chance of getting an uncomputable number.

        Bayesian vs frequentist interpretations allow us to make useful predictions, but they are the map, not the territory.

        Bayesian iid data and frequentist iid random variables play roles exactly analogous to enthalpy, Gibbs free energy, statistical entropy, information-theoretic entropy, Shannon entropy, etc.

        The difference between them is the independent variables that they depend on and the needs of the model they are serving.

        You can also approach the property that people often want to communicate when using the term entropy as effective measure 0 sets, null cover, martingales, kolmogorov complexity, compressibility, set shattering, etc...

        As a lens, null cover is most useful in my mind, as a random real number should not have any "uncommon" properties, or look more like the normal reals.

        This is very different from statistical methods, or any effective usable algorithm/program, which absolutely depend on "uncommon" properties.

        Which is exactly the hay in the hay-in-the-haystack problem: hay is boring.

        [1] https://www.cs.auckland.ac.nz/~cristian/samplepapers/omegast...

    • bloppe 2 days ago

      Probability is subjective though, because macrostates are subjective.

      The notion of probability relies on the notion of repeatability: if you repeat a coin flip infinite times, what proportion of outcomes will be heads, etc. But if you actually repeated the toss exactly the same way every time, say with a finely-tuned coin-flipping machine in a perfectly still environment, you would always get the same result.

      We say that a regular human flipping a coin is a single macrostate that represents infinite microstates (the distribution of trajectories and spins you could potentially impart on the coin). But who decides that? Some subjective observer. Another finely tuned machine could conceivably detect the exact trajectory and spin of the coin as it leaves your thumb and predict with perfect accuracy what the outcome will be. According to that machine, you're not repeating anything. You're doing a new thing every time.

      • canjobear 2 days ago

        Probability is a bunch of numbers that add to 1. Sometimes you can use this to represent subjective beliefs. Sometimes you can use it to represent objectively existing probability distributions. For example, an LLM is a probability distribution on a following token given previous tokens. If two "observers" disagree about an LLM's probability assigned to some token, then only at most one of them can be correct. So the probability is objective.

        • bloppe a day ago

          We're talking about 2 different things. I agree that probability is objective as long as you've already decided on the definition of the macrostate, but that definition is subjective.

          From an LLM's perspective, the macrostate is all the tokens in the context window and nothing more. A different observer may be able to take into account other information, such as the identity and mental state of the author, giving rise to a different distribution. Both of these models can be objectively valid even though they're different, because they rely on different definitions of the macrostate.

          It can be hard to wrap your head around this, but try taking it to the extreme. Let's say there's an omniscient being that knows absolutely everything there is to know about every single atom within a system. To that observer, probability does not exist, because every macrostate represents a single microstate. In order for something to be repeated (which is core to the definition of probability), it must start from the exact same microstate, and thus always have the same outcome.

          You might think that true randomness exists at the quantum level and that means true omniscience is impossible (and thus irrelevant), but that's not provable and, even if it were true, would not invalidate the general point that probabilities are determined by macrostate definition.

          • canjobear a day ago

            Suppose you're training a language model by minimizing cross entropy, and the omniscient being is watching. In each step, your model instantiates some probability distribution, whose gradients are computed. That distribution exists, and is not deterministic to the omniscient entity.

            • bloppe a day ago

              An LLM is given a definition of the macrostate which creates the probability distribution, but a different definition of the macrostate (such as would be known to the omniscient being) would create a different distribution. According to the omniscient entity, the vast majority of long combinations of tokens would have zero probability because nobody will ever write them down in that order. The infinite monkey theorem is misleading in this regard. The odds of producing Shakespeare's works completely randomly before the heat death of the universe are practically zero, even if all the computing power in the world were dedicated to the cause.

            • kgwgk a day ago

              What’s non-deterministic there?

              That “probability distribution” is just a mathematical function assigning numbers to tokens, defined using a model that the person creating the model and the omniscient entity know, applying a set of deterministic mathematical functions to a sequence of observed inputs that the person creating the model and the omniscient entity also know.

        • kgwgk a day ago

          > If two "observers" disagree about an LLM's probability assigned to some token, then only at most one of them can be correct.

          The observer who knows the implementation in detail and the state of the pseudo-random number generator can predict the next token with certainty. (Or almost certainty, if we consider bit-flipping cosmic rays, etc.)

          • canjobear a day ago

            That’s the probability to observe a token given the prompt and the seed. The probability assigned to a token given the prompt alone is a separate thing, which is objectively defined independent of any observer and can be found by reading out the model logits.

            • kgwgk a day ago

              Yes, that’s a purely mathematical abstract concept that exists outside of space and time. The labels “objective” and “subjective” are usually used to talk about probabilities in relation to the physical world.

              • canjobear a day ago

                An LLM distribution exists in the physical world, just as much as this comment does. It didn’t exist before the model was trained. It has relation to the physical world: it assigns probabilities to subword units of text. It has commercial value that it wouldn’t have if its objective probability values were different.

                • kgwgk a day ago

                  > It has relation to the physical world: it assigns probabilities to subword units of text.

                  How is that probability assignment linked to the physical world exactly? In the physical world the computer will produce a token. You rejected before that it was about predicting the token that would be produced.

                  • kgwgk a day ago

                    Or maybe you mean that the probability assignments are not about the output of a particular LLM implementation in the real world but about subword units of text in the wild.

                    In that case how could two different LLMs do different assigments to the same physical world without being wrong? Would they be “objective” but unrelated to the “object”?

    • quietbritishjim a day ago

      You're right it reduces to Bayesian vs frequentist views of probability. But you seem to be taking an adamantly frequentist view yourself.

      Imagine you're not interested in whether a dice is weighted (in fact assume that it is fair in every reasonable sense), but instead you want to know the outcome of a specific roll. What if that roll has already happened, but you haven't seen it? I've cheekily covered up the dice with my hand straight after I rolled it. It's no longer random at all, in at least some philosophical points of view, because its outcome is now 100% determined. If you're only concerned about "the property of the dice itself" are you now only concerned with the property of the roll itself? It's done and dusted. So the entropy of that "random variable" (which only has one outcome, of probability 1) is 0.

      This is actually a valid philosophical point of view. But people who act as though the outcome is still random, and allow themselves to use probability theory as if it hadn't been rolled yet, are going to win a lot more games of chance than those who refuse to.

      Maybe this all seems like a straw man. Have I argued against anything you actually said in your post? Yes I have: your core disagreement with OP's statement "entropy is a property of an individual". You see, when I covered up the dice with my hand, I did see it. So if you take the Bayesian view of probability and allow yourself to consider that dice roll probabilistically, then you and I really do have different views about the probability distribution of that dice roll and therefore the entropy. If I tell a third person, secretly and honestly, that the dice roll is even then they have yet another view of the entropy of the same dice roll! All at the same time and all perfectly valid.
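
      To put rough numbers on the three views (a sketch, in bits, assuming a fair six-sided die):

      ```python
      from math import log2

      print(log2(6))   # ~2.58 bits of uncertainty: told nothing about the hidden roll
      print(log2(3))   # ~1.58 bits: told only that the roll is even
      print(log2(1))   #  0.0  bits: the roller, who saw the outcome
      ```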

  • empath75 2 days ago

    > If we're calculating the entropy of dice rolls (where the outcome is the 'message'), and I know the dice are loaded but you don't, my entropy will be lower than yours.

    That's got nothing to do with entropy being subjective. If 2 people are calculating any property and one of them is making a false assumption, they'll end up with a different (false) conclusion.

    • mitthrowaway2 2 days ago

      What if I told you the dice were loaded, but I didn't tell you which face they were loaded in favor of?

      Then you (presumably) assign a uniform probability over one true assumption and five false assumptions. Which is the sort of situation where subjective entropy seems quite appropriate.

    • glial 2 days ago

      Entropy is based on your model of the world and every model, being a simplification and an estimate, is false.

    • canjobear 2 days ago

      > If 2 people are calculating any property and one of them is making a false assumption, they'll end up with a different (false) conclusion.

      This implies that there is an objectively true conclusion. The true probability is objective.

      • mitthrowaway2 a day ago

        Ok. I rolled a die and the result was 5. What should the true objective probability have been for the outcome of that roll?

  • pharrington 2 days ago

    Are you basically just saying "we're not oracles"?

hatthew a day ago

I'm not sure I understand the distinction between "high-entropy macrostate" and "order". Aren't macrostates just as subjective as order? Let's say my friend's password is 6dVcOgm8. If we have a system whose microstate consists of an arbitrary string of alphanumeric characters, and the system arranges itself in the configuration 6dVcOgm8, then I would describe the macrostate as "random" and "disordered". However, if my friend sees that configuration, they would describe the macrostate as "my password" and "ordered".

If we see another configuration M2JlH8qc, I would say that the macrostate is the same, it's still "random" and "unordered", and my friend would agree. I say that both macrostates are the same: "random and unordered", and there are many microstates that could be called that, so therefore both are microstates representing the same high-entropy macrostate. However, my friend sees the macrostates as different: one is "my password and ordered", and the other is "random and unordered". There is only one microstate that she would describe as "my password", so from her perspective that's a low-entropy macrostate, while she would agree with me that M2JlH8qc represents a high-entropy macrostate.

So while I agree that "order" is subjective, isn't "how many microstates could result in this macrostate" equally subjective? And then wouldn't it be reasonable to use the words "order" and "disorder" to count (in relative terms) how many microstates could result in the macrostate we subjectively observe?

  • vzqx a day ago

    I think you need to rigorously define your macrostates. If your two states are "my friend's password" and "not my friend's password" then the macrostates are perfectly objective. You don't know what macrostate the system is in, but that doesn't change the fact that the system is objectively in one of those two macrostates.

    If you define your macrostates using subjective terms (e.g. "a string that's meaningful to me" or "a string that looks ordered to me") then yeah, your entropy calculations will be subjective.

    • hatthew 16 hours ago

      I guess part of my question is, are there any macrostates that are useful to us that can't be described using more abstract human-subjective terms? If a macrostate can be described using human terms, I'd say the state is somewhat ordered. And if a state can't be described using human terms, then wouldn't it be indistinguishable from "particle soup" and thus not a useful macrostate to talk about?

    • anon84873628 a day ago

      That's better than how I was going to say it:

      In one case you're looking at the system as "alphanumeric string of length N." In another, the system is that plus something like "my friend's opinion on the string".

      Also, as the article says, using "entropy" to mean "order" is not a good practice. "Order" is a subjective concept, and some systems (like oil and water separating) look more "ordered" but still have higher entropy, because there is more going on energetically than we can observe.

IIAOPSW 2 days ago

It's the name for the information bits you don't have.

More elaborately, it's the number of bits needed to fully specify something which is known to be in some broad category of states, but whose exact details are unknown.

karpathy 2 days ago

What I never fully understood is that there is some implicit assumption about the dynamics of the system. So what if some macrostate has more microstates, as far as counting is concerned? We also have to make assumptions about the dynamics, and in particular about some property that encourages mixing.

  • tomnicholas1 2 days ago

    Yes, that assumption is called the Ergodic Hypothesis, and generally justified in undergraduate statistical mechanics courses by proving and appealing to Liouville's theorem.

    [1] https://en.wikipedia.org/wiki/Ergodic_hypothesis

    • vitus 2 days ago

      It's worth noting that there's more than just ergodicity at play, although that's a fundamental requirement. For instance, applying the Pauli Exclusion Principle gives rise to Fermi-Dirac statistics.

      • tomnicholas1 2 days ago

        Isn't that more about enumerating the microstates? The Pauli exclusion principle just ends up forbidding some of the microstates (forbidding a significant fraction of them if you're in the low-temperature regime).

        • vitus 2 days ago

          It is about enumerating the microstates, but in a way that takes into account how the particles interact with each other (aka making assumptions about the dynamics).

          If we didn't take into account any interactions, we'd be unable to do anything with statistical mechanics beyond rederiving the ideal gas law.

  • oh_my_goodness 2 days ago

    In equilibrium we don't have to make an assumption about the dynamics or the mixing. We just expect to see the most probable state when we measure.

    It's interesting to try to show that the time average equals the ensemble average. It's very cool to think about the dynamics. That stuff must be happening. But those extra ideas aren't necessary for applying the equilibrium theory.

tsimionescu a day ago

This goes through all definitions of entropy, except the very first one, which is also the one that is in fact measurable and objective: the variation in entropy is the heat energy that the system exchanges with the environment during a reversible process, divided by the temperature at which the exchange takes place. While tedious, this can be measured, and it doesn't depend on any subjective knowledge about the system. Any two observers will agree on this value, even if one knows all of the details of every single microstate.
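
For reference, the Clausius form being described, with a standard worked example (melting one mole of ice at 0 °C; the latent heat and molar mass values are approximate):

```latex
dS = \frac{\delta Q_{\mathrm{rev}}}{T},
\qquad
\Delta S_{\text{melt}} \approx \frac{334\,\mathrm{J/g} \times 18\,\mathrm{g/mol}}{273\,\mathrm{K}}
\approx 22\ \mathrm{J\,K^{-1}\,mol^{-1}}
```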

marojejian 20 hours ago

This is the best description of entropy and information I've read: https://arxiv.org/abs/1601.06176

Most of all, it highlights the subjective / relative foundations of these concepts.

Entropy and Information only exist relative to a decision about the set of states an observer cares to distinguish.

It also caused me to change my informal definition of entropy from a negative one ("disorder") to a more positive one ("the number of things I might care to know").

The Second Law now tells me that the number of interesting things I don't know about is always increasing!

This thread inspired me to post it here: https://news.ycombinator.com/item?id=43695358

anon84873628 2 days ago

Nitpick in the article conclusion:

>Heat flows from hot to cold because the number of ways in which the system can be non-uniform in temperature is much lower than the number of ways it can be uniform in temperature ...

Should probably say "thermal energy" instead of "temperature" if we want to be really precise with our thermodynamics terms. Temperature is not a direct measure of energy; rather, it is an extensive property describing the relationship between change in energy and change in entropy.

  • kgwgk a day ago

    I think you used “extensive” in the sense of “defined for the whole system and not locally”. It’s true that thermodynamics is about systems at equilibrium.

    • anon84873628 21 hours ago

      I meant to say "intensive" in the physics sense but just brain farted while typing.

      • kgwgk 20 hours ago

        Ah, then I don’t see what’s wrong with “the number of ways in which the system can be non-uniform in temperature is much lower than the number of ways it can be uniform in temperature”. In equilibrium one doesn’t have a gradient of temperature because “…” indeed.

        • anon84873628 18 hours ago

          If you take "temperature" to mean "average kinetic energy of molecules" then it's fine. But that's sort of the same class of simplification as saying "entropy is the amount of disorder".

          • kgwgk 17 hours ago

            I don't follow you. Whatever you take temperature to mean, for an isolated system in equilibrium that intensive thermodynamic property will have the same value everywhere and the entropy of the system will thus be maximized given the constraints.

            If you put two subsystems at different temperatures in thermal contact the combined system will be in equilibrium only when the cold one warms up and the hot one cools down. The increase in the entropy of the first is larger than the decrease in the entropy of the second (because ΔQ/T1 > ΔQ/T2 when T1<T2) and the total entropy increases.

            No kinetic energies of molecules are involved in that phenomenological description of heat flowing from hot to cold.
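
            A sketch with made-up numbers (1 J of heat flowing from a 400 K reservoir to a 300 K one):

            ```python
            Q, T_hot, T_cold = 1.0, 400.0, 300.0   # joules, kelvin

            dS_hot = -Q / T_hot                    # hot reservoir loses entropy: -0.00250 J/K
            dS_cold = Q / T_cold                   # cold reservoir gains more:   +0.00333 J/K
            print(dS_hot + dS_cold)                # total change is positive, as the second law requires
            ```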

brummm 2 days ago

I love that the author clearly describes why saying entropy measures disorder is misleading.

dswilkerson a day ago

Entropy is expected information. That is, given a random variable, if you compute the expected value (the sum of the values weighted by their probability) of the information of an event (the log base 2 of the multiplicative inverse of the probability of the event), you get the formula for entropy.

Here it is explained at length: "An Intuitive Explanation of the Information Entropy of a Random Variable, Or: How to Play Twenty Questions": http://danielwilkerson.com/entropy.html

flanked-evergl a day ago

Not sure what the point of this article is; it seems to focus on confusion which could be cleared up with a simple visit to Wikipedia.

> But I have no idea what entropy is, and from what I find, neither do most other people.

The article does not go on to explain what entropy is, it just tries to explain away some hypothetical claims about entropy which as far as we can tell do hold, and does not explain why, if they were wrong, they do in fact hold.

FilosofumRex a day ago

Boltzmann and Gibbs turn in their graves, every time some information theorist mutilates their beloved entropy. Shanon & Von Neumann were hacking a new theory of communication, not doing real physics and never meant to equate thermodynamic concepts to encoding techniques - but alas now dissertations are written on it.

Entropy can't be a measure of uncertainty, because all the uncertainty is in the probability distribution p(x) - multiplying it with its own logarithm and summing doesn't tell us anything new. If it did, it'd violate quantum physics principles including the Bell inequality and Heisenberg uncertainty.

The article never mentions the simplest and most basic definition of entropy, i.e. its units (kJ/K), nor the 3rd law of thermodynamics, which is the basis for its measurement.

“Every physicist knows what entropy is. Not one can write it down in words.” Clifford Truesdell

  • kgwgk a day ago

    > Entropy can't be a measure of uncertainty

    Gibbs’ entropy is derived from “the probability that an unspecified system of the ensemble (i.e. one of which we only know that it belongs to the ensemble) will lie within the given limits” in phase space. That’s the “coefficient of probability” of the phase, its logarithm is the “index of probability” of the phase, the average of that is the entropy.

    Of course the probability distribution corresponds to the uncertainty. That’s why the entropy is defined from the probability distribution.

    Your claim sounds like saying that the area of a polygon cannot be a measure of its extension because the extension is given by the shape and calculating the area doesn’t tell us anything new.

  • kgwgk a day ago

    > Shanon & Von Neumann were hacking a new theory of communication, not doing real physics

    Maybe I’m misunderstanding the reference to von Neumann but his work on entropy was about physics, not about communication.

    • nanna a day ago

      Think the parent has confused Von Neumann with Wiener. They've also misspelled Shannon.

    • FilosofumRex 14 hours ago

      More precisely, Von Neumann was extending Shannon's information-theoretic entropy to quantum channels, which he restated as S(p) = -Tr(p ln(p)) - again showing that information-theoretic entropy reveals nothing more about a system than its density matrix p.

      • kgwgk 6 hours ago

        It’s quite remarkable that in his 1927 paper “The thermodynamics of quantum-mechanical ensembles” von Neumann was extending the mathematical theory of communication that Shannon - who was 11 at the time - would only publish decades later.

Ono-Sendai a day ago

Anyone else notice how the entropy in the 1000 bouncing balls simulation goes down at some point, thereby violating the second law of thermodynamics? :)

  • thowawatp302 a day ago

    Over long enough scales there is no conservation of energy because the universe does not have temporal symmetry.

gozzoo 2 days ago

The visualisation is great, the topic is interesting and very well explained. Can somebody recommend some other blogs with a similar type of presentation?

fedeb95 a day ago

given all the comments, it turns out that a post on entropy has high entropy.

sysrestartusr a day ago

at some point my take became: if nothing orders the stuff that lies and flies around, any emergent structures that follow the laws of nature eventually break down.

organisms started putting things in places to increase "survivability" and thriving of themselves until the offspring was ready for the job, at which point the offspring started to additionally put things in place for the sake of the "survivability" and thriving of their ancestors ( mostly overlooking their nagging and shortcomings because "love" and because over time, the lessons learned made everything better for all generations ) ...

so entropy is only relevant if all the organisms that can put some things in some place for some reason disappear and the laws of nature run until new organisms emerge. ( which is why I'm always disappointed at leadership and all the fraudulent shit going on ... more pointlessly dead organisms means less heads that can come up with ways to put things together in fun and useful ways ... it's 2025, to whomever it applies: stop clinging to your sabotage-based wannabe supremacy, please, stop corrupting the law, for fucks sake, you rich fucking losers )

vitus 2 days ago

The problem with this explanation (and with many others) is that it misses why we should care about "disorder" or "uncertainty", whether in information theory or statistical mechanics. Yes, we have the arrow of time argument (second law of thermodynamics, etc), and entropy breaks time-symmetry. So what?

The article hints very briefly at this with the discussion of an unequally-weighted die, and how by encoding the most common outcome with a single bit, you can achieve some amount of compression. That's a start, and we've now rediscovered the idea behind Huffman coding. What information theory tells us is that if you consider a sequence of two dice rolls, you can then use even fewer bits on average to describe that outcome, and so on; as you take your block length to infinity, your average number of bits for each roll in the sequence approaches the entropy of the source. (This is Shannon's source coding theorem, and while entropy plays a far greater role in information theory, this is at least a starting point.)
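
A sketch of that block-coding claim for a biased coin with p(heads) = 0.9 (entropy ≈ 0.47 bits per flip): Huffman-coding one flip at a time costs a full bit per flip, but coding blocks of flips drives the average cost toward the entropy. The expected Huffman length below uses the standard identity that it equals the sum of the merged node probabilities.

```python
import heapq
from itertools import product
from math import log2, prod

p = {'H': 0.9, 'T': 0.1}                                   # heavily biased coin
entropy_per_flip = -sum(q * log2(q) for q in p.values())   # ~0.469 bits

def huffman_cost(probs):
    """Expected codeword length (bits per symbol) of a Huffman code for these probabilities."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        total += a + b                 # sum of internal-node probabilities = expected length
        heapq.heappush(heap, a + b)
    return total

for n in (1, 2, 4):                    # encode blocks of n flips at a time
    block_probs = [prod(combo) for combo in product(p.values(), repeat=n)]
    print(f"block size {n}: {huffman_cost(block_probs) / n:.3f} bits per flip")
print(f"entropy bound:  {entropy_per_flip:.3f} bits per flip")
```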

There's something magical about statistical mechanics where various quantities (e.g. energy, temperature, pressure) emerge as a result of taking partial derivatives of this "partition function", and that they turn out to be the same quantities that we've known all along (up to a scaling factor -- in my stat mech class, I recall using k_B * T for temperature, such that we brought everything back to units of energy).

https://en.wikipedia.org/wiki/Partition_function_(statistica...

https://en.wikipedia.org/wiki/Fundamental_thermodynamic_rela...

If you're dealing with a sea of electrons, you might apply the Pauli exclusion principle to derive Fermi-Dirac statistics that underpins all of semiconductor physics; if instead you're dealing with photons which can occupy the same energy state, the same statistical principles lead to Bose-Einstein statistics.

Statistical mechanics is ultimately about taking certain assumptions about how particles interact with each other, scaling up the quantities beyond our ability to model all of the individual particles, and applying statistical approximations to consider the average behavior of the ensemble. The various forms of entropy are building blocks to that end.
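
As a tiny illustration of that derivative machinery (a sketch for the simplest case, a two-level system with energy gap ε, in units where k_B = 1; the numbers are arbitrary):

```python
import numpy as np

eps = 1.0                              # energy gap between the two states
beta = np.linspace(0.1, 5.0, 200)      # inverse temperature 1/T

Z = 1 + np.exp(-beta * eps)            # partition function: sum over the two states
lnZ = np.log(Z)

E = -np.gradient(lnZ, beta)            # average energy from <E> = -d(ln Z)/d(beta)
S = lnZ + beta * E                     # entropy from S = ln Z + beta <E>

print(round(S[0], 3), round(S[-1], 3)) # ~0.69 (= ln 2) at high T, near 0 at low T
```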

alex5207 a day ago

Super read! Thanks for sharing

voidhorse a day ago

I think this is a pretty good introduction but it gets a little bogged down in the binary encoding assumption, which is an extraneous detail. It does help to know why the logarithm is chosen as a measure of information though regardless of base, once you know that "entropy" is straightforward. I'd agree that much of the difficulty arises from the uninformative name and the various mystique it carries.

To try to expand on the information measure part from a more abstract starting point: Consider a probability distribution, some set of probabilities p. We can consider it as indicating our degree of certainty about what will happen. In an equiprobable distribution, e.g. a fair coin flip (1/2, 1/2) there is no skew either which way, we are admitting that we basically have no reason to suspect any particular outcome. Contrarily, in a split like (1/4, 3/4) we are stating that we are more certain that one particular outcome will happen.

If you wanted to come up with a number to represent the amount of uncertainty, it's clear that the number should be higher the closer the distribution is to being completely equiprobable (1/2, 1/2)—complete lack of certainty about the result, and the number should be smallest when we are 100% certain (0, 1).

This means that the function has to invert the order of the probability values (more probable means less surprising), with I(1) = 0 (no uncertainty). The logarithm of the inverse probability, I(p) = log(1/p), to an arbitrary base (selecting a base is just a change of units), has this property, under the convention that I(0) = inf (that is, a totally improbable event carries infinite information—after all, an impossibility occurring would in fact be the ultimate surprise).

Entropy is just the average of this function taken over the probability values (multiply each probability in the distribution by the log of the inverse of the probabilities and sum them). In info theory you also usually assume the probabilities are independent, and so the further condition that I(pq) = I(p) + I(q) is also stipulated.

im3w1l 21 hours ago

So here is an amusing thought experiment I thought of at one point.

Imagine a very high resolution screen. Say a billion by a billion pixels. Each of them can be white, gray or black. What is the lowest entropy possible? Each of the pixels has the same color. How does the screen look? Gray. What is the highest entropy possible? Each pixel has a random color. How does it look from a distance? Gray again.

What does this mean? I have no idea. Maybe nothing.

Also sorry for writing two top level comments, but I just really care about this topic

im3w1l 21 hours ago

As a kid I wanted to invent a perpetuum mobile. From that perspective, entropy is that troublesome property that prevents a perpetuum mobile of the second kind. And any fuzziness or ambiguity in its definition is a glimmer of hope that we may yet find a loophole.

bowsamic a day ago

I didn’t read in depth, but it seems to me on first glance (please correct me if I’m wrong) that, as with all articles on entropy, this explains everything but the classical thermodynamic quantity called entropy, which is 1. the quantity to which all these others are chosen to be related and 2. the one that is by far the most difficult to explain intuitively.

Information and statistical explanations of entropy are very easy. The real question is, what does entropy mean in the original context that it was introduced in, before those later explanations?

nanna a day ago

Yet another take on entropy and information focused on Claude Shannon and lacking even a single mention of Norbert Wiener, even though they invented it simultaneously and evidence suggests Shannon learned the idea from Wiener.

NitroPython 2 days ago

Love the article, my mind is bending but in a good way lol

DadBase 2 days ago

My old prof taught entropy with marbles in a jar and cream in coffee. “Entropy,” he said, “is surprise.” Then he microwaved the coffee until it burst. We understood: the universe favors forgetfulness.

ponty_rick 2 days ago

As a software engineer, I learned what entropy was in computer science when I changed the way that a function was called which caused the system to run out of entropy in production and caused an outage. Heh.

alganet 2 days ago

Nowadays, it seems to be a buzzword to confuse people.

We IT folk should find another word for disorder that increases over time, especially when that disorder has human factors (number of contributors, number of users, etc). It clearly cannot be treated in the same way as in chemistry.

  • soulofmischief 2 days ago

    Maybe you're confused by entropy? It's pretty well established in different domains. There are multiple ways to look at the same phenomenon, because it's ubiquitous and generalized across systems. It comes down to information and uncertainty. The article in question does attempt to explain all of this if you read it.

    • alganet 2 days ago

      Maybe I am.

      The part of the article on information theory is more about mathematics than software. I don't deny there could be some generalization there.

      The problem I see is that this could slip into measuring human actions, which are also a source of uncertainty; although the words fit, in this particular case I think associating it with classical entropy does more harm than good.

      https://en.m.wikipedia.org/wiki/Software_rot

      Entropy as described in this article (software entropy), to me, does not fall under the same generalization. It is a looser use of the word. I used it myself several times, but now people are buzzwording entropy all around, and I think that looser use should be retracted to avoid thinking of humans as numbers or particles.

      • at_a_remove 2 days ago

        You are being fazed by two different, annoying things.

        Even in physics itself, the word "mass" has multiple contexts (inertial, gravitational, and conversion to energy) in which it is used. Einstein made quite a lot of hay out of relating the three. "Entropy," too, has multiple contexts.

        The second thing possibly tripping you up is the tendency for scientific terms to be poorly appropriated into a new context, like "theory." You can fight this but it is a losing battle, so I typically just try to set it aside.

        • alganet 2 days ago

          A borrowed term in another context is not the same as generalization. Read my responses again.

          I think my first comment was clear enough about it without needing to step into semantics.

  • petsfed 2 days ago

    When I use it in an IT (or honestly, any non-physics) context, I typically mean "how many different ways can we do it with the same effective outcome?".

    To wit, "contract entropy": how many different ways can a contractor technically fulfill the terms of the contract, and thus get paid? If your contract has high entropy, then there's a high probability that you'll pay your contractor to not actually achieve what you wanted.

timonoko a day ago

Can we say that thermal entropy is more like a theoretical or statistical concept and not easy to measure?

Grok:

Yes, thermal entropy is largely a theoretical and statistical concept, rooted in the probabilistic behavior of particles in a system, as described by statistical mechanics. It quantifies disorder or the number of possible microstates, which isn't directly measurable like temperature or pressure. Measuring entropy typically involves indirect methods, such as calculating changes based on heat transfer and temperature (e.g., ΔS = q/T for reversible processes), but these rely on idealized assumptions and precise conditions, making direct measurement challenging in practice.