mattnewton a day ago

I worked on a similar system at Google for Gboard, the Google-branded Android keyboard, that we called “federated analytics” - it worked with device-to-device communication and invertible Bloom lookup tables. I’m still not super sure how the Apple system works after reading it, but I don’t see any mention of using data structures like that; instead they are polling the devices themselves, it seems? Does anyone else have more insight into the mechanics? Because that seems super inefficient.

https://research.google/blog/improving-gboard-language-model...
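
For anyone unfamiliar with the data structure mentioned above: an invertible Bloom lookup table (IBLT) lets two parties reconcile sets with communication proportional to the size of the difference, not the sets. A toy sketch over integer keys — illustrative only, not Gboard's actual implementation:

```python
import hashlib

def _cells(key: int, m: int, k: int = 3):
    # k cell indices derived from the key (illustrative hashing scheme)
    return [int(hashlib.sha256(f"{i}:{key}".encode()).hexdigest(), 16) % m
            for i in range(k)]

def _checksum(key: int) -> int:
    return int(hashlib.md5(str(key).encode()).hexdigest(), 16)

class IBLT:
    """Toy invertible Bloom lookup table over integer keys."""
    def __init__(self, m: int = 50):
        self.m = m
        self.count = [0] * m
        self.key_sum = [0] * m   # XOR of keys hashed into each cell
        self.chk_sum = [0] * m   # XOR of key checksums, used to detect "pure" cells

    def insert(self, key: int):
        for i in _cells(key, self.m):
            self.count[i] += 1
            self.key_sum[i] ^= key
            self.chk_sum[i] ^= _checksum(key)

    def delete(self, key: int):
        for i in _cells(key, self.m):
            self.count[i] -= 1
            self.key_sum[i] ^= key
            self.chk_sum[i] ^= _checksum(key)

    def list_entries(self):
        """Peel pure cells (|count| == 1 with a matching checksum) until none remain."""
        added, removed = set(), set()
        progress = True
        while progress:
            progress = False
            for i in range(self.m):
                if abs(self.count[i]) == 1 and self.chk_sum[i] == _checksum(self.key_sum[i]):
                    key = self.key_sum[i]
                    if self.count[i] == 1:
                        added.add(key)
                        self.delete(key)
                    else:
                        removed.add(key)
                        self.insert(key)
                    progress = True
        return added, removed
```

The set-difference trick: one party inserts its keys, the other deletes its own, and peeling recovers exactly the symmetric difference — which is why the table can be tiny even when the sets are huge.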

  • matthewdgreen 19 hours ago

    I went looking for exactly this information the other day. I was surprised to find that it's hard to come up with recent, detailed explanations of what Apple is doing for telemetry collection. When they announced their DP systems back in 2017, they were clearly doing something like Google's RAPPOR [1]. But it's been several years since then and their writeups haven't been updated very much at all [2].

    This is pretty important, because these systems aren't so robust that you can just assume everything is working without review. (See, for example, this paper [3].) Apple should at least document what kinds of data are being collected, and precisely how the collection process works.

    [1] https://static.googleusercontent.com/media/research.google.c...
    [2] https://www.apple.com/privacy/docs/Differential_Privacy_Over...
    [3] https://arxiv.org/pdf/1709.02753

jsenn 2 days ago

> This approach works by randomly polling participating devices for whether they’ve seen a particular fragment, and devices respond anonymously with a noisy signal. By noisy, we mean that devices may provide the true signal of whether a fragment was seen or a randomly selected signal for an alternative fragment or no matches at all. By calibrating how often devices send randomly selected responses, we ensure that hundreds of people using the same term are needed before the word can be discoverable. As a result, Apple only sees commonly used prompts, cannot see the signal associated with any particular device, and does not recover any unique prompts. Furthermore, the signal Apple receives from the device is not associated with an IP address or any ID that could be linked to an Apple Account. This prevents Apple from being able to associate the signal to any particular device.

The way I read this, there's no discovery mechanism here, so Apple has to guess a priori which prompts will be popular. How do they know what queries to send?
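
The mechanism in the quote reads like classic randomized response (local differential privacy). A minimal sketch, where the truth probability and protocol details are hypothetical stand-ins, not Apple's published parameters:

```python
import random

def respond(saw_fragment: bool, p_truth: float = 0.75) -> bool:
    """A device answers truthfully with probability p_truth; otherwise it
    returns a uniformly random yes/no (the 'noisy signal')."""
    if random.random() < p_truth:
        return saw_fragment
    return random.random() < 0.5

def estimate_true_rate(responses, p_truth: float = 0.75) -> float:
    """Server-side debiasing: observed = p_truth * true + (1 - p_truth) * 0.5,
    so the true rate is recoverable in aggregate but not per device."""
    observed = sum(responses) / len(responses)
    return (observed - (1 - p_truth) * 0.5) / p_truth
```

With enough respondents the aggregate converges, but any single "yes" is deniable — which is roughly why hundreds of users must type the same term before it becomes discoverable.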

  • vineyardmike a day ago

    I think they do guess a priori what to query...

    Later in the article, for a different (but similar) feature:

    > To curate a representative set of synthetic emails, we start by creating a large set of synthetic messages on a variety of topics... We then derive a representation, called an embedding, of each synthetic message that captures some of the key dimensions of the message like language, topic, and length. These embeddings are then sent to a small number of user devices that have opted in to Device Analytics.

    It's crazy to think Apple is constantly asking my iPhone if I ever write emails similar to emails about tennis lessons (their example). This feels like the least efficient way to understand users in this context. Especially considering they host an email server!

    • jsenn a day ago

      yeah, the linked paper [1] has more detail--basically they seem to start with a seed set of "class labels" and subcategories (e.g. "restaurant review" + "steak house"). They ask an LLM to generate lots of random texts incorporating those labels. They make a differentially private histogram of embedding similarities from those texts with the private data, then use that histogram to resample the texts, which become the seeds for the next iteration, sort of like a Particle Filter.

      I'm still unclear on how you create that initial set of class labels used to generate the random seed texts, and how sensitive the method is to that initial corpus.

      [1] https://arxiv.org/abs/2403.01749
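
The iteration described above can be sketched end to end. Everything here is a toy stand-in — 2-D points for embeddings, nearest-neighbor votes for similarity, and a Laplace mechanism for the DP histogram — not the paper's actual pipeline:

```python
import math
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def noisy_nn_histogram(device_embeddings, candidate_embeddings, noise_scale=1.0):
    """Each simulated device votes for its nearest candidate embedding; Laplace
    noise is added per bin to mimic a differentially private histogram."""
    counts = [0.0] * len(candidate_embeddings)
    for d in device_embeddings:
        nearest = min(range(len(candidate_embeddings)),
                      key=lambda i: sq_dist(d, candidate_embeddings[i]))
        counts[nearest] += 1
    for i in range(len(counts)):
        # Laplace(0, noise_scale) sample via the inverse CDF
        u = random.uniform(-0.5, 0.5)
        counts[i] -= noise_scale * math.copysign(math.log(max(1 - 2 * abs(u), 1e-12)), u)
    return counts

def resample(candidates, counts, k):
    """Resample candidates in proportion to their clipped noisy counts;
    survivors seed the next round of synthetic generation."""
    weights = [max(c, 0.0) for c in counts]
    return random.choices(candidates, weights=weights, k=k)
```

The particle-filter analogy is that candidates whose neighborhoods match more private data get more weight, so each generation of synthetic texts drifts toward the real distribution without any single text leaving a device.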

  • halJordan 9 hours ago

    No, I think it's fairly well guaranteed that devices are encrypting and then submitting prompts. Homomorphic encryption allows them to do honest-to-god work without decrypting the data. The "fragments" the polled devices are sent are probably some sub-sequence of the encrypted prompt.

    E: I guess I'm wrong, apologies

  • warkdarrior 2 days ago

    You could brute-force it by querying all ~500k English words. With 1.3+ billion iPhone users, about 2,600 users would see any given word, which may be enough to observe trends.
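
A quick back-of-the-envelope on that, using a hypothetical noise parameter (not Apple's published one):

```python
import math

# Numbers from the comment above.
users = 1_300_000_000        # 1.3+ billion iPhone users
vocabulary = 500_000         # ~500k English words

# If each opted-in device were polled about exactly one word, every word
# would be covered by roughly this many devices:
devices_per_word = users // vocabulary   # 2600

# Hypothetical randomized-response truth probability:
p_truth = 0.75
# Upper bound on the standard error of the debiased popularity estimate
# from n noisy yes/no responses (response variance is at most 0.25):
se = math.sqrt(0.25 / devices_per_word) / p_truth   # ~0.013
```

So at ~2,600 responses per word, only words used by a few percent of the polled devices would clear the noise floor — consistent with the "hundreds of people" threshold Apple describes.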

airstrike 2 days ago

> Improving Genmoji

I find it odd that they keep insisting on this to the point that it's the very first example. I'm willing to bet 90% of users don't use genmoji and the 10% who have used it on occasion mostly do it for the lulz at how bizarre the whole thing is.

It seems to me that they don't really have a vision for Apple Intelligence, or at least not a compelling one.

  • LPisGood a day ago

    In the last weeks I have used it for things like very specific drug jokes and a baseball bat.

  • bitpush 2 days ago

    When other companies are curing skin cancer, discovering new proteins, and creating photorealistic images/videos, Apple is... creating Genmojis. lol

    • threeseed 2 days ago

      Apple has done far more for global health with the Apple Watch, Fitness and Health.

      And they have a dedicated app for participating in clinical studies: https://www.apple.com/ios/research-app/

      • bitpush a day ago

        Than Medtronic? Than Astra Zeneca? Than J&J?

        • lurking_swe a day ago

          Sometimes the best medicine is preventative medicine. In other words, leading a healthy, active lifestyle. It's a very Western perspective to think all medical stuff should be resolved with drugs or surgery. And I say this as someone born in the West.

          No need to be so dismissive. Anyway, I do agree those three examples you provided are good ones, and they have made a big difference in healthcare.

    • cheschire 2 days ago

      Apple has always been like this. They are never the first one to cross the first few checkpoints. They watch what the winning-est competitors are all doing and then they try to copy that to win the race overall.

      If they hadn’t saddled themselves with the privacy promises, or if OpenAI were willing to uphold those same promises, then I bet Siri would’ve been wholly replaced by ChatGPT by now.

matt3210 2 days ago

I don't want AI to be part of anything I do unless it's opt-in. When I want to use AI, I'll go use AI. I don't need or want it integrated into my other tools.

I especially don't want it natively on my phone or MacBook unless it's opt-in. The opt-out stuff is soooo frustrating.

lapcat 2 days ago

The article says "opt-in" many times, but my experience as an Apple user, with many devices, is that Apple automatically opts you into analytics, and you have to opt out.

  • threeseed a day ago

    They ask you every time you set up a new device or upgrade the OS whether you want to share analytics or not.

    It is opt-in but you just need to click a single checkbox:

    https://user-images.githubusercontent.com/3705482/142927547-...

    • lapcat a day ago

      That looks to me like Apple opts you in, and you have to opt out.

      • threeseed a day ago

        Yes just clarifying that you are re-asked every time you upgrade your OS.

        So it's not like Apple is just quietly opting you in.

  • LPisGood a day ago

    I just got a new MacBook and I felt reasonably inundated with requests to opt in to things.

billyboar a day ago

Why are they obsessed with genmoji ffs

  • aalimov_ 15 hours ago

    Could be that it’s a popular feature among some portion of their users.

  • specialist a day ago

    Maybe normalizing avatars to prep users for their planned future perfect black-emoji-sun-verse?

martin_drapeau 2 days ago

I often write in Frenglish (French and English). Apple auto-complete gets so confused and is utterly useless. ChatGPT can easily switch from one language to another. I wish the auto-complete had ChatGPT's power.

dkga 2 days ago

That is all very nice, but as an Apple user I think they need to step up their game with respect to user experience. I often need to switch between three languages on iPhone and the Mac, and the keyboard autocorrection and suggestions have become notably worse, not better, especially since they introduced the dual keyboard.

  • klipt a day ago

    FYI the dual keyboard isn't mandatory, you can still add and use single language keyboards.

    I assume the dual keyboard is aimed at people who code switch regularly between two languages in the same message.

w10-1 2 days ago

This sounds pretty bland and meaningless, but is it?

tldr: Privacy protections seem personal, but not collective:

- For short genmoji prompts, respond with false positives so large numbers are required

- For longer writing, generate texts and match their embedding signatures with opted-in samples

i.e., personal privacy is preserved, but one could likely still distinguish populations if not industries and use-cases: social media users vs. students vs. marketers, conservatives vs. progressives, etc. These categories themselves have meaning because they carry useful associations: marketers more likely to do x, conservatives y, etc. And that information is very valuable, unless it's widely known.

No one likes being personally targeted: it's weird to get ads for something you just searched for. But it might also be problematic for society to have groups be characterized, particularly to the extent that the facts are non-obvious (e.g., if marketers decide within a minute vs. developers taking days). To the extent the information is valuable, it's more so if it is private and limited (i.e., preserves the information asymmetry), which means the collectors of that information have an incentive to keep it private.

So even if Apple broadly has the best of intentions, even this data collection creates a moral hazard, a valuable resource that enterprising people can tap. It adds nothing to Apple's bottom line, but could be someone's life's work and salary.

Could it be mitigated by a commitment to publish all their conclusions? (hmm: but the analyses are often borderline insignificant) Not clear.

Bottom line for me: I'm now less worried about losing personal privacy than about technologies for characterizing and manipulating groups of consumers or voters. But it's impossible for Apple to characterize users at scale for their own quality assessment -- and thus to maintain their product excellence -- without doing exactly that.

Oy!