What topic taxonomy should be used long term? #3

Open
jkarlin opened this issue Jan 21, 2022 · 13 comments

Comments

@jkarlin
Collaborator

jkarlin commented Jan 21, 2022

  • Who should create and maintain it?
  • Eventually it would be good if this was produced externally to the browser and became an industry standard.
  • The taxonomy should be publicly available for transparency.
  • If the number of topics increases, we’ll need to balance that against the ability of sites to observe topics (e.g., with more topics, there is less chance that an ad-tech company has seen the chosen topic in the past).
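
The trade-off in the last bullet can be illustrated with a toy calculation. This is only a sketch under a strong simplifying assumption that does not come from the proposal: each visit on which a caller observes the user carries one topic drawn uniformly and independently from the taxonomy. The function name and the sample sizes are hypothetical.

```python
def chance_topic_was_observed(num_topics: int, observed_visits: int) -> float:
    """Probability that one specific topic is among a caller's observations.

    Toy assumption (not from the proposal): every observed visit carries one
    topic drawn uniformly and independently from `num_topics` topics.
    """
    if num_topics <= 0:
        raise ValueError("num_topics must be positive")
    # Chance that a single observed visit does NOT carry the chosen topic.
    miss_once = 1.0 - 1.0 / num_topics
    # The topic goes unobserved only if every visit misses it.
    return 1.0 - miss_once ** observed_visits


# Hypothetical taxonomy sizes, with the same number of observed visits each time.
for size in (100, 1000, 10000):
    print(size, round(chance_topic_was_observed(size, observed_visits=50), 3))
```

Under this model, the chance falls as the taxonomy grows while the number of observed visits stays fixed, which is the tension the bullet describes. Real browsing is far from uniform, so the numbers are only indicative.
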
@patmmccann

Taxonomy should be detailed on datalabel.org if it is new; it could be a vendor-specific taxonomy (e.g., code 600) or it could be the IAB Audience Taxonomy 1.1 (segtax 4) in OpenRTB.
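
As a rough sketch of what this could look like on the wire, the snippet below builds a user data object tagged with a `segtax` code. The placement of `segtax` under `ext`, the provider name, and the segment IDs are assumptions for illustration, not taken from this thread; only the codes 4 and 600 come from the comment above.

```python
import json

# Codes mentioned in the comment above; what 600 would identify is an assumption.
IAB_AUDIENCE_TAXONOMY_1_1 = 4
VENDOR_SPECIFIC_EXAMPLE = 600


def make_data_object(provider: str, segtax: int, segment_ids: list) -> dict:
    """Build one data object whose segments are labeled with a taxonomy code."""
    return {
        "name": provider,  # hypothetical provider name
        "ext": {"segtax": segtax},  # assumed location of the taxonomy code
        "segment": [{"id": segment_id} for segment_id in segment_ids],
    }


# Hypothetical payload: the provider and segment IDs are placeholders.
user = {
    "data": [
        make_data_object("provider.example", IAB_AUDIENCE_TAXONOMY_1_1, ["101", "202"]),
        make_data_object("provider.example", VENDOR_SPECIFIC_EXAMPLE, ["7"]),
    ]
}
print(json.dumps(user, indent=2))
```

Tagging each data object with its own code would let a receiver interpret segment IDs from different taxonomies side by side.
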

@jdevalk

jdevalk commented Jan 25, 2022

I’d suggest using the ODP’s taxonomy: http://odp.org/

@johnsabella

Thanks Josh for the very interesting API spec. Keeping the taxonomy within the ad-business standards body, IAB Tech Lab, is the best approach for the long term. Even if the taxonomy today is not ideal, getting involved and helping to shape it within that setting will have great benefits for the industry overall.
https://iabtechlab.com/standards/audience-taxonomy/

@JamesFinlayson-zz

Google already has an NLP-based API for topic taxonomies through content categories. This list of 620 categories is pretty comprehensive and has existed since at least 2018 so is, we can assume, pretty battle-tested at this point. The use of NLP removes the need (or opportunity!) for manual assignment and thus the inevitable gaming that would occur around it. As a result, would it make sense, at least to begin with, to lean on that existing list?

@hu0p

hu0p commented Jan 26, 2022

@jdevalk Are you suggesting this as a starting point or are you suggesting the owners of ODP should be maintainers? Their About page is interesting, but it doesn't seem to mention much beyond the origins of the taxonomy and some funding concerns. I don't have any particular objections here. I'm only curious about additional information you might have (particularly about who is behind ODP and their interest in involvement) and any other general supporting thoughts for your suggestion.

@johnsabella Agreed on all points. I spotted this in the proposal and was drawn to it. I particularly like the idea of a relevant third-party standards organization maintaining and improving the taxonomy. However, I'm very curious about the applicability of this list. The bulk of it consists of demographic and purchase intent data, with only 496 interests listed. It sounds like the purchase intent and demographic data couldn't be directly adapted to work within the context of the topics API. @jkarlin, could you speak to this?

@JamesFinlayson, the list you linked appears to have a handful (not enough to completely rule it out by any means) of sensitive topics. Otherwise, it appears to be very similar to the starting list that is currently in this repo. A diff of the two lists reveals a lot of overlap with some rephrasing or careful omission. @jkarlin or someone more informed than me would have to say more, but I wouldn't be surprised if the current starting list is based in part on what you shared.

@jkarlin Apologies if this is mentioned in the proposal (I haven't had a chance to look at it again since yesterday), but would it make sense for the API to include a method to return the full list of up-to-date topics over time?

@bquinn

bquinn commented Jan 27, 2022

Hi @jkarlin and all, FYI we at IPTC maintain a news- and media-specific subject taxonomy (controlled vocabulary), the IPTC Media Topics, at https://cv.iptc.org/newscodes/mediatopic/. It is available as a SKOS vocabulary in various forms of RDF (Turtle, RDF/XML and JSON-LD) in 12 languages and language variants. See the IPTC CV server guidelines for tips on how to download the vocabularies or individual terms using URL query strings or HTTP content negotiation.

I would suggest that whatever taxonomy is used in the long term, it is represented in SKOS or something like it, where each term in the vocabulary has its own URI. This allows for lookup to obtain the name in multiple human languages, lets you specify the hierarchy in machine-readable form, and allows mapping across vocabularies.

For example, we already map most Media Topics terms to Wikidata concept URIs, which then allows for mappings to other subject vocabularies.
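
A minimal sketch of the idea, using plain Python rather than an RDF library: each term is keyed by its own URI, carries labels in several languages, points to a broader term, and lists equivalent terms in other vocabularies. All URIs and labels below are hypothetical placeholders, and the field names only loosely mirror the SKOS properties `prefLabel`, `broader` and `exactMatch`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Concept:
    uri: str                               # the term's own identifier
    pref_label: Dict[str, str]             # language code -> human-readable name
    broader: Optional[str] = None          # URI of the parent term, if any
    exact_match: List[str] = field(default_factory=list)  # URIs in other vocabularies


# Hypothetical two-term vocabulary; none of these URIs are real.
SPORTS = "https://example.org/topics/sports"
CYCLING = "https://example.org/topics/cycling"
VOCAB = {
    SPORTS: Concept(SPORTS, {"en": "Sports", "fr": "Sport"}),
    CYCLING: Concept(
        CYCLING,
        {"en": "Cycling", "fr": "Cyclisme"},
        broader=SPORTS,
        exact_match=["https://other-vocab.example/concepts/cycling"],
    ),
}


def label(uri: str, lang: str, default_lang: str = "en") -> str:
    """Look up a term's name in `lang`, falling back to `default_lang`."""
    labels = VOCAB[uri].pref_label
    return labels.get(lang, labels[default_lang])


def path_to_root(uri: str) -> List[str]:
    """Follow `broader` links upward and return the path from root to `uri`."""
    path = []
    current: Optional[str] = uri
    while current is not None:
        path.append(current)
        current = VOCAB[current].broader
    return list(reversed(path))


print(label(CYCLING, "fr"))
print([label(u, "en") for u in path_to_root(CYCLING)])
```

Because every term has a stable URI, adding a language or a cross-vocabulary mapping changes only the data attached to that URI, not the identifier that callers store.
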

Good luck with your project!

@JamesFinlayson-zz

@hu0p - yes, they are very similar; thanks for the useful diff link. My main point was really in favor of using a library, such as (but not necessarily) the one the Content Categories API provides, to set the topics automatically rather than allowing users to set them. I believe this is important (a) to address @jkarlin's transparency point by allowing anyone to check a site's declared topics, (b) to reduce the potential for topic bloat over time, and (c) to reduce the ease with which the system could otherwise be gamed.

@wayne-innity

To be useful to the ads industry, we would highly appreciate it if Topics could adhere to the taxonomy list published by the IAB (https://www.iab.com/guidelines/content-taxonomy/).

@avuim

avuim commented Sep 12, 2022

To be useful to the ads industry, we would highly appreciate it if Topics could adhere to the taxonomy list published by the IAB (https://www.iab.com/guidelines/content-taxonomy/).

Could not agree more. Besides Chrome's Topics API, there are other browsers and other methods of signaling content taxonomies to the demand side, most of which rely on the IAB Tech Lab Content Taxonomy. Aligning with an industry standard makes sense here. The four tiers of the Content Taxonomy (currently 3.0) provide enough flexibility to decide on the depth and number of topics provided.

@avuim

avuim commented Sep 16, 2022

@lbdvt
Contributor

lbdvt commented Nov 17, 2022

To increase the utility of Topics, and for Topics to better inform marketers about users' buying habits and intent, we suggest using a granular commerce taxonomy such as the Google Product Taxonomy.

@dmarti
Contributor

dmarti commented Nov 17, 2022

@lbdvt Topics API information is provided on the first visit to a new site. More specific lists tend to have a lot of topics that aren't really material that a user would be likely to volunteer when making a first impression. Some examples include:

5824 - Health & Beauty > Personal Care > Oral Care > Denture Cleaners
6562 - Health & Beauty > Personal Care > Ear Care > Ear Wax Removal Kits
7336 - Health & Beauty > Health Care > Medical Tests > HIV Tests
1695 - Cameras & Optics > Optics > Scopes > Weapon Scopes & Sights
5506 - Apparel & Accessories > Clothing > Outerwear > Chaps

(If I show up with those Topics on a job application site, am I more or less likely to get a call back from the hiring manager?)
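
To see how much detail each extra level adds, here is a small sketch that parses the entries listed above and truncates each path to a chosen depth. The `ID - Level > Level > ...` line format is taken from those examples; the function names and the depth of 2 are arbitrary choices for illustration, and truncation by itself does not make a topic non-sensitive.

```python
from typing import List, Tuple


def parse_entry(line: str) -> Tuple[int, List[str]]:
    """Split 'ID - A > B > C' into the numeric ID and its list of levels."""
    entry_id, _, path = line.partition(" - ")
    return int(entry_id), [part.strip() for part in path.split(">")]


def coarsen(levels: List[str], max_depth: int) -> str:
    """Keep only the first `max_depth` levels of a category path."""
    return " > ".join(levels[:max_depth])


# Entries copied from the comment above.
ENTRIES = [
    "5824 - Health & Beauty > Personal Care > Oral Care > Denture Cleaners",
    "6562 - Health & Beauty > Personal Care > Ear Care > Ear Wax Removal Kits",
    "7336 - Health & Beauty > Health Care > Medical Tests > HIV Tests",
    "1695 - Cameras & Optics > Optics > Scopes > Weapon Scopes & Sights",
    "5506 - Apparel & Accessories > Clothing > Outerwear > Chaps",
]

for line in ENTRIES:
    entry_id, levels = parse_entry(line)
    print(entry_id, "->", coarsen(levels, max_depth=2))
```
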

@dmarti
Contributor

dmarti commented Nov 17, 2022

The underlying problem is that any taxonomy that's specific enough for the needs of legit advertisers and trusted ad-supported publishers is too specific for users to share with untrusted sites, or with sites that are trusted in a non-shopping context (you don't share your bass guitar playing or home biochemistry lab interests with the property management site where you're applying for an apartment). One possible way to allow more detailed taxonomies to be used would be a policy on authorized callers: #87
