Callers getting topics according to a priority list #42

lbdvt · 2022-02-02T12:07:52Z

A caller may not get the same signal from every topic when selecting an ad. For instance, "Auto insurance" may be more useful than "Vegan Cuisine".

Would it be possible for callers to provide a ranked priority list of topics, for example at a .well-known location, and for the API to return topics, if eligible, according to this priority list?

jkarlin · 2022-02-02T18:12:07Z

I like the idea in spirit, but in practice it runs up against a privacy concern: if different callers on a site receive different topics, those callers can talk to each other and quickly learn far more topics per week for a user than intended.
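To make the concern concrete, here is a minimal sketch of what could happen if each caller received the first eligible topic from its own priority list. The function `topic_for_caller`, the caller names, and the user's topic set are all hypothetical and are not part of the Topics API.

```python
# Hypothetical illustration: names and data are invented for this sketch.
def topic_for_caller(user_top_topics, priority_list):
    """Return the first topic in the caller's priority list that the user has."""
    for topic in priority_list:
        if topic in user_top_topics:
            return topic
    return None

user_top_topics = {"Auto insurance", "Vegan Cuisine", "Running", "Jazz", "Camping"}

# Three callers on the same site, each with a different ranked priority list.
priorities = {
    "caller_a": ["Auto insurance", "Running"],
    "caller_b": ["Vegan Cuisine", "Jazz"],
    "caller_c": ["Camping", "Auto insurance"],
}

received = {c: topic_for_caller(user_top_topics, p) for c, p in priorities.items()}
pooled = set(received.values())  # what the callers learn if they share results
print(received)
print(len(pooled))  # 3 distinct topics learned from one site in one week
```

With a single shared topic per site, the pooled set would contain one topic instead of three.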

You could then imagine, say, that the first caller on the site for the week gets to set a preference and the other callers on the page are stuck with it. But that doesn't seem fair either. So the plan is to choose randomly.
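One way to picture "choose randomly" while still giving every caller on a site the same answer is to derive the choice from the site and the week. The sketch below assumes a per-user secret and a hash-based selection; `topic_for_site`, `user_secret`, and the topic list are hypothetical and are not taken from the proposal text.

```python
import hashlib

def topic_for_site(top_topics, site, epoch, user_secret):
    """Pick one topic per (site, epoch); every caller on the site sees the same one.

    Assumption: the choice is derived from a hash of a per-user secret,
    the site, and the epoch, so it looks random but is stable for the week.
    """
    digest = hashlib.sha256(f"{user_secret}|{site}|{epoch}".encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(top_topics)
    return top_topics[index]

top_topics = ["Auto insurance", "Vegan Cuisine", "Running", "Jazz", "Camping"]
first_caller = topic_for_site(top_topics, "news.example", 5, "per-user-secret")
second_caller = topic_for_site(top_topics, "news.example", 5, "per-user-secret")
print(first_caller == second_caller)  # True: no caller gets to set a preference
```

Because no caller's preference enters the computation, callers on the same page cannot pool different answers.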

lbdvt · 2022-02-03T12:58:23Z

More generally, I'm worried about the signal that can be gained from the topics, and how useful it can be for "advertising based on generic interests".

If, for instance, YouTube, Google, and Facebook call the Topics API on their pages, a very significant portion of users may have "Online Video", "Search Engines", and "Social Network" in their top 5 topics, which I don't see as very helpful for advertising.

What are your thoughts on this?

jkarlin · 2022-02-03T15:33:16Z

I think we ought to explore this issue. As a simple idea, we could weight topics by overall frequency on the web (e.g., find the topics of pages in the HTTP Archive and weight topics inversely by frequency). This would help to overcome the issue you've described.
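As a rough sketch of that idea, the snippet below computes a log inverse-frequency weight per topic from a small corpus of classified pages and applies it to a user's visit counts. The corpus, the `log(N / count)` formula, and the function name are assumptions for illustration; nothing here comes from the HTTP Archive.

```python
import math
from collections import Counter

def inverse_frequency_weights(page_topics):
    """Weight each topic by log(N / count), where N is the number of pages."""
    counts = Counter(t for topics in page_topics for t in set(topics))
    n_pages = len(page_topics)
    return {t: math.log(n_pages / c) for t, c in counts.items()}

# Hypothetical corpus: one list of topics per crawled page.
page_topics = [
    ["Online Video"], ["Online Video"], ["Online Video", "Auto insurance"],
    ["Search Engines"], ["Search Engines"], ["Social Network"],
]
weights = inverse_frequency_weights(page_topics)

# Hypothetical user: more visits to video pages than to insurance pages.
user_visits = {"Online Video": 4, "Auto insurance": 3}
scores = {t: n * weights[t] for t, n in user_visits.items()}
print(sorted(scores, key=scores.get, reverse=True))
```

In this toy corpus the rarer topic outranks the common one even though the user visited it less often.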

There are other concerns that I have as well in picking the top topics. For instance, let's say the user frequently visits pages about two different sports, but neither individual sport has enough to make it a top 5 topic for the week. But combined, they would be. Should the parent in the hierarchy, sports, then be chosen?
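The two-sports case can be sketched with a toy taxonomy. The `PARENT` map, the visit counts, and the choice to add each child's count to its parent are assumptions made only to illustrate the question.

```python
from collections import Counter

PARENT = {"Soccer": "Sports", "Tennis": "Sports"}  # hypothetical two-level taxonomy

def top_k(counts, k=5):
    """Top k topics by count, ties broken alphabetically for determinism."""
    return [t for t, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]

def roll_up(counts):
    """Add each child's count to its direct parent (one level only)."""
    rolled = Counter(counts)
    for topic, n in counts.items():
        parent = PARENT.get(topic)
        if parent is not None:
            rolled[parent] += n
    return rolled

visits = {
    "News": 6, "Online Video": 5, "Search Engines": 5,
    "Social Network": 4, "Shopping": 4, "Soccer": 3, "Tennis": 3,
}
plain = top_k(visits)            # neither sport makes the top 5
rolled = top_k(roll_up(visits))  # "Sports" (3 + 3 = 6) now does
print(plain)
print(rolled)
```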

stguav · 2022-02-04T20:35:31Z

Since #46 was merged, I'm restating some points here. Let me request that we clarify the current proposal on how topics are ranked, or make the uncertainty more explicit.

The main questions from there that are not explicitly covered here:

- How will a user's repeated visits (within the same epoch) to the same site be treated?
- Can the classifier output (non-unit) weights? This would allow for more nuance.
- How will topics' weights (unit or otherwise) be added up?
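These three questions can be made concrete with a small aggregation sketch. It assumes a classifier that returns a weight per topic for each site and sums those weights over the epoch, with a flag for whether repeat visits count. The `aggregate` function, the classifier output, and the site names are hypothetical.

```python
from collections import defaultdict

def aggregate(visits, classifier, count_repeat_visits=False):
    """Sum classifier weights per topic over one epoch of visits."""
    # Either count every visit, or count each site once per epoch.
    sites = visits if count_repeat_visits else list(dict.fromkeys(visits))
    totals = defaultdict(float)
    for site in sites:
        for topic, weight in classifier[site].items():
            totals[topic] += weight
    return dict(totals)

# Hypothetical non-unit classifier output: site -> {topic: weight}.
classifier = {
    "news.example": {"News": 0.7, "Sports": 0.3},
    "cars.example": {"Autos": 1.0},
}
visits = ["news.example", "news.example", "news.example", "cars.example"]

once = aggregate(visits, classifier)
every = aggregate(visits, classifier, count_repeat_visits=True)
print(once)
print(every)
```

Under these made-up numbers the top topic flips from "Autos" to "News" depending on how repeat visits are treated, which is why the choice needs to be spelled out.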

Regarding the taxonomy hierarchy, one convenient way of handling it:

- Whenever a child node is present, include all parent nodes (perhaps with lower weight, see suggestion about TF-IDF, similar to your above comment).
- This also ensures that if a caller has a more granular topic, then the caller also has access to the broader ones.
- The downside is we may end up with users' top 5 topics being very redundant, all having parent-child relationships. (Even without this treatment, this is a concern.) One could consider some crowding/deduping logic (perhaps with noise?). Having a more diverse top 5 should improve the average utility across many calls for many users.
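A minimal sketch of this treatment follows: each topic's score is passed up to its ancestors at a discount, and a greedy pass then skips any topic whose ancestor or descendant is already chosen. The `PARENT` map, the 0.5 discount, and the greedy rule are assumptions for illustration, and the noise mentioned above is left out.

```python
# Hypothetical taxonomy: topic -> parent (None for a root).
PARENT = {"Soccer": "Sports", "Tennis": "Sports", "Sports": None,
          "Autos": None, "News": None}

def ancestors(topic):
    """All ancestors of a topic, nearest first."""
    out = []
    parent = PARENT.get(topic)
    while parent is not None:
        out.append(parent)
        parent = PARENT.get(parent)
    return out

def propagate(scores, discount=0.5):
    """Add each topic's score to its ancestors, discounted once per level."""
    out = dict(scores)
    for topic, score in scores.items():
        factor = discount
        for anc in ancestors(topic):
            out[anc] = out.get(anc, 0.0) + score * factor
            factor *= discount
    return out

def diverse_top_k(scores, k=5):
    """Greedy crowding: skip a topic if an ancestor or descendant is already chosen."""
    chosen = []
    for topic in sorted(scores, key=lambda t: (-scores[t], t)):
        related = set(ancestors(topic))
        if any(c in related or topic in ancestors(c) for c in chosen):
            continue
        chosen.append(topic)
        if len(chosen) == k:
            break
    return chosen

scores = {"Soccer": 4.0, "Tennis": 3.0, "Autos": 2.0, "News": 1.0}
propagated = propagate(scores)  # adds "Sports": 4.0 * 0.5 + 3.0 * 0.5 = 3.5
print(diverse_top_k(propagated, k=3))  # ['Soccer', 'Tennis', 'Autos']
```

Here "Sports" is skipped because "Soccer" is already chosen. A caller that receives "Soccer" can still derive the broader topic from the taxonomy.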

jkarlin · 2022-02-09T14:36:59Z

I agree that keeping hierarchy in mind is likely to trend toward higher-level items, which is a concern. I think the TF-IDF approach has potential. Basically, we'd want to measure the inverse frequency of topics (as opposed to documents) based on user traffic. This does require knowledge of how often users visit various sites and what their topics are. Topics can be derived via the Topics API model. But the traffic data would likely need to come from Chrome's data, which isn't public. That is, unless someone is aware of a good public dataset? I'll look into what can be done. On the bright side, the resulting list of weights would be small (~350) and each topic would be represented by a large number of users. So I think we'd have some pretty solid differential privacy properties with a little bit of noise.
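As a rough sketch of "inverse topic frequency with a little bit of noise", the snippet below adds Laplace noise to per-topic user counts before taking the log inverse frequency. The counts, `epsilon`, and the assumption that one user changes each count by at most 1 are all hypothetical, and this is not a calibrated privacy analysis.

```python
import math
import random

def noisy_inverse_frequency(topic_user_counts, total_users, epsilon=1.0, rng=None):
    """Log inverse topic frequency computed from Laplace-noised user counts."""
    rng = rng or random.Random()
    scale = 1.0 / epsilon  # assumes each user shifts a topic's count by at most 1
    weights = {}
    for topic, count in topic_user_counts.items():
        # The difference of two exponentials with mean `scale` is Laplace(0, scale).
        noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
        noisy_count = max(1.0, count + noise)
        weights[topic] = math.log(total_users / noisy_count)
    return weights

# Hypothetical aggregate: number of users with each topic in a given week.
counts = {"Online Video": 900_000, "Search Engines": 850_000, "Auto insurance": 20_000}
weights = noisy_inverse_frequency(counts, total_users=1_000_000, rng=random.Random(0))
print(sorted(weights, key=weights.get, reverse=True))
```

Because each count is large compared with the noise, the resulting ordering is essentially unchanged by it, which matches the intuition above.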

jkarlin · 2022-08-19T16:02:46Z

I’d like to stick to the topic of initial weighting for each topic based on its value before we go into hierarchical concerns, repeated topics, etc. Those seem like optimizations that should come after we have a better idea of what a topic is actually worth.

So far we’ve discussed using inverse frequency of topics as a proxy for value, but I’d like to see if we can get a more direct idea of commercial value first. Perhaps the IAB Tech Lab can help us out here.

Hey IAB Tech Lab! (@angelinaeng, @bjd326) We're pretty sure that there is room for improvement in how the Topics API weighs the user’s top 5 topics. We'd like to utilize a notion of topic value that represents the opinions of a large body of the industry. Do you have (or might you be interested in creating) some sort of indication of value for each of the topics in your content taxonomy that we could then apply to Topics as well? Even something as simple as a 1-to-5 scale of commercial utility could be a useful foundation. Perhaps a discussion we could have in an upcoming IAB meeting.

jkarlin · 2022-11-17T15:12:13Z

Another option to get at a notion of commercial value is to use Chrome data. Chrome can determine (in many cases) when a user navigates due to an ad click. We could look at (in an aggregated, differentially private form) the topics of the ad landing pages, and note their frequency. More common landing page topics would be deemed more commercially valuable. Obviously this is imperfect (it excludes topics that may gravitate toward brand ads that don't necessarily require clicks, Chrome's heuristics miss some ads, infrequent ad categories could have huge value), but I'm confident that it would be closer to real commercial value than simple inverse topic frequency across all pages.
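A minimal sketch of that proxy: count the topics of ad-click landing pages and use each topic's share as its value. The input data and the function name are hypothetical, and the aggregation and noise a real version would need before release are left out.

```python
from collections import Counter

def landing_page_value(ad_click_landing_topics):
    """Share of ad-click landing-page topic labels per topic, as a value proxy."""
    counts = Counter(t for topics in ad_click_landing_topics for t in topics)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

# Hypothetical data: one list of topics per landing page reached via an ad click.
landing_topics = [
    ["Auto insurance"], ["Auto insurance"], ["Travel"], ["Auto insurance", "Loans"],
]
value = landing_page_value(landing_topics)
print(max(value, key=value.get))  # Auto insurance
```

Unlike inverse frequency across all pages, this favors topics that are common among ad destinations, so it inherits the blind spots listed above, such as brand ads that don't require clicks.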

jkarlin mentioned this issue Feb 4, 2022

Topic Aggregation and Ranking #46

Closed

jkarlin mentioned this issue May 23, 2022

Reach issue #37

Closed

dmarti mentioned this issue Aug 19, 2022

Cover dynamic pricing use case #34

Open

jkarlin mentioned this issue Nov 17, 2022

Improving Topics assignment for better utility #117

Closed
