core: add entity classification of origins to the LHR #14622

alexnj · 2022-12-17T01:18:39Z

Addressing #14440 and the design doc, this PR introduces an `EntityClassification` computed artifact that classifies all first- and third-party entities, both those recognized by third-party-web and unrecognized ones. The classification is exposed in the LHR as a top-level property, `lhr.entities`. This achieves a few important things:

- Switches third-party filtering logic on the reporting side from origin string comparison to proper entity comparison. Now moved to report: use entity classification to filter third-parties #14697.
- Lays down the groundwork for grouping and aggregating reports by entities (report: group third-party entities #14655). This will also support filtering reports by entities in the future. Now moved to report: use entity classification to filter third-parties #14697.
- Partially addresses Expand third-party classification to include unrecognized entities #14440 (everything except "Closing the loop", which requires additional work to reduce, through automation, the number of user-generated reports).
- Implements a computed artifact that audits can use to resolve URLs to entities at audit time (optionally marking audit result items with their entities). Audits that don't resolve entities themselves but include a URL in any form (url, source-location, etc.) will have those URLs resolved during the prepareReportResult phase.
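The classification idea can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the PR's code: `KNOWN_ENTITIES` stands in for the third-party-web database, `rootDomain` is a naive helper, and `classifyEntities` is an invented name. It shows the key point that every URL, recognized or not, ends up mapped to some entity, and the first party is whichever entity owns the main document.

```javascript
// Hypothetical stand-in for the third-party-web database; the real
// entity objects and lookup API are not reproduced here.
const KNOWN_ENTITIES = [
  {name: 'Example Analytics', domains: ['example-analytics.com']},
];

/**
 * Naive root-domain helper that keeps the last two hostname labels.
 * Illustration only: it is wrong for public suffixes such as "co.uk".
 * @param {string} url
 * @return {string}
 */
function rootDomain(url) {
  return new URL(url).hostname.split('.').slice(-2).join('.');
}

/**
 * Maps every URL to an entity, creating an "unrecognized" entity for any
 * root domain that the known-entity list does not cover.
 * @param {string[]} urls
 * @param {string} mainDocumentUrl
 */
function classifyEntities(urls, mainDocumentUrl) {
  /** @type {Map<string, {name: string, domains: string[], isUnrecognized: boolean}>} */
  const unrecognizedByDomain = new Map();
  const entityByUrl = new Map();

  /** @param {string} url */
  function entityForUrl(url) {
    const domain = rootDomain(url);
    const known = KNOWN_ENTITIES.find(entity => entity.domains.includes(domain));
    if (known) return known;
    // Unknown origins get their own entity (one per root domain) instead of
    // being lumped in with the first party.
    let entity = unrecognizedByDomain.get(domain);
    if (!entity) {
      entity = {name: domain, domains: [domain], isUnrecognized: true};
      unrecognizedByDomain.set(domain, entity);
    }
    return entity;
  }

  for (const url of [mainDocumentUrl, ...urls]) {
    entityByUrl.set(url, entityForUrl(url));
  }
  // The first party is whichever entity owns the main document.
  const firstParty = entityByUrl.get(mainDocumentUrl);
  return {entityByUrl, firstParty};
}
```

A real implementation would use a proper public-suffix-aware domain parser; the two-label shortcut only keeps the sketch short.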

This is an alternative implementation to #14614, based on discussion in that PR. It aims for the same outcome as the original PR, but without classifying entities in most of the audits.

I've completed the checked items on this branch:

Additional targets based on the meeting discussion:

- Make entity-classification available as an LHR-level property, and remove it from being an audit.
  - Retain ComputedEntityClassification for any audit that would like to classify entities during the audit.
- Drop item.entity queryability from the LHR JSON.
- Move entity resolution to prepareReportResult (now moved to report: use entity classification to filter third-parties #14697).
- Rename lhr.entityClassification to lhr.entities, and its inner array of entity objects to lhr.entities.list.

Audits that make use of `ComputedEntityClassification` to resolve 1p vs. 3p during audit time:

The following audits previously assumed that all unrecognized third parties were first parties:

- valid-source-maps
- third-party-summary
- third-party-facades (marks entities on items during the audit)
- legacy-javascript
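To illustrate why that assumption mattered, here is a hypothetical simplification of the two policies; the function names are invented and neither is copied from Lighthouse. Before, a URL with no recognized entity was treated as first party; with classification, an unrecognized origin gets its own entity, which compares unequal to the first party.

```javascript
/**
 * Old behavior (simplified): `recognizedEntity` is undefined for origins the
 * third-party database does not know, and such URLs count as first party.
 * @param {object|undefined} recognizedEntity
 * @param {object|undefined} mainEntity
 * @return {boolean}
 */
function isThirdPartyBefore(recognizedEntity, mainEntity) {
  if (!recognizedEntity) return false;
  return recognizedEntity !== mainEntity;
}

/**
 * With entity classification every URL has an entity (possibly an
 * unrecognized one), so the check is a plain identity comparison.
 * @param {object} classifiedEntity
 * @param {object} firstParty
 * @return {boolean}
 */
function isThirdPartyAfter(classifiedEntity, firstParty) {
  return classifiedEntity !== firstParty;
}
```

For a site and a CDN that are both unknown to the database, the old check reports the CDN as first party, while the new one correctly reports it as third party.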

Comparison to #14614: entity-based-3p...entity-based-3p-reportonly

In this document, I've tried to list every audit result that deals with a URL and could use entity classification, but differs in output format and therefore requires special handling. It might not be exhaustive yet, but we can use it to collect these cases and discuss how to handle the differences.

alexnj · 2023-01-23T23:11:25Z

Added to-do item: rename lhr.entityClassification to lhr.entities and the inner array of entity objects to lhr.entities.list

adamraine

Overall this LGTM with a few nits + the rename

core/runner.js

core/test/computed/entity-classification-test.js

connorjclark

LGTM once the property has been renamed.

brendankenny

This is great! A couple of comments but otherwise LGTM

core/runner.js

core/computed/entity-classification.js

core/audits/third-party-facades.js

brendankenny · 2023-01-24T01:13:42Z

core/computed/entity-classification.js

```diff
+     * @return {boolean}
+     */
+    function isFirstParty(url) {
+      return entityByUrl.get(url) === firstParty;
```


`entityByUrl` not having a URL should be a bug; do you foresee such a scenario?

It definitely feels like there should be a fallback or a loud error for this case, e.g. a caller asking "if I made this request, would it have been first or third party?". I don't think we do this anywhere today, but it's very possible someone could add such a call in the future without noticing that this just always returns false for that case.

An `if (!entityByUrl.get(url)) throw new Error('...')` equivalent would probably be fine to keep things simple until a fallback is actually needed.
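A minimal sketch of the suggested guard, assuming `entityByUrl` is a `Map` from URL to entity object and `firstParty` is the main document's entity. `makeIsFirstParty` is a hypothetical wrapper and the error message is illustrative, not taken from the PR.

```javascript
/**
 * Hypothetical wrapper; `entityByUrl` and `firstParty` are assumed to come
 * from an earlier classification step.
 * @param {Map<string, object>} entityByUrl
 * @param {object} firstParty
 * @return {(url: string) => boolean}
 */
function makeIsFirstParty(entityByUrl, firstParty) {
  return function isFirstParty(url) {
    const entity = entityByUrl.get(url);
    // Fail loudly rather than silently returning false for a URL that was
    // never classified (e.g. a request that was not actually made).
    if (!entity) {
      throw new Error(`No entity classification found for URL: ${url}`);
    }
    return entity === firstParty;
  };
}
```

Throwing keeps the behavior obvious to future callers; a fallback that classifies unseen URLs on demand could replace the throw later if such a use case appears.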

core/test/audits/byte-efficiency/legacy-javascript-test.js

brendankenny · 2023-01-24T01:17:25Z

core/test/audits/third-party-summary-test.js

```diff
@@ -17,7 +17,7 @@ describe('Third party summary', () => {
    const artifacts = {
      devtoolsLogs: {defaultPass: pwaDevtoolsLog},
      traces: {defaultPass: pwaTrace},
-      URL: {finalDisplayedUrl: 'https://pwa-rocks.com'},
+      URL: {finalDisplayedUrl: 'https://pwa.rocks'},
```


what happened here?

That seemed to be an incorrect entry: the progressive-app-m60.json trace has no hostname matching pwa-rocks.com.
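Why a wrong `finalDisplayedUrl` matters, as a hypothetical sketch: if first-party status is derived from the main document's domain, a hostname that never appears among the trace's requests leaves no request classified as first party. `rootDomain` and `countFirstPartyRequests` are invented helpers with a naive two-label domain rule, not Lighthouse code.

```javascript
/**
 * Naive root domain: last two hostname labels (illustration only).
 * @param {string} url
 * @return {string}
 */
function rootDomain(url) {
  return new URL(url).hostname.split('.').slice(-2).join('.');
}

/**
 * Counts requests that share the main document's root domain.
 * @param {string} finalDisplayedUrl
 * @param {string[]} requestUrls
 * @return {number}
 */
function countFirstPartyRequests(finalDisplayedUrl, requestUrls) {
  const firstPartyDomain = rootDomain(finalDisplayedUrl);
  return requestUrls.filter(url => rootDomain(url) === firstPartyDomain).length;
}
```

For example, with a made-up request list of `https://pwa.rocks/`, `https://pwa.rocks/sw.js`, and `https://fonts.example.net/font.woff2`, passing `https://pwa.rocks` counts two first-party requests, while `https://pwa-rocks.com` counts none.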

brendankenny · 2023-01-24T01:21:38Z

types/artifacts.d.ts

```diff
+    isUnrecognized?: boolean;
+  }
+
+  interface EntityClassification {
```


Is there any way this can be moved into the file itself? Type dumping grounds make me sad, and since this isn't an artifact but a live function's return value, it seems better to define it alongside the source.

This is cross-referenced across several audits; is there a cleaner way to share it? LH.Artifacts.* is where shared types for computed artifacts and audits are defined (e.g. LH.Artifacts.NetworkRequest).
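One way to follow the "define it with the source" suggestion, sketched under assumptions: the type is declared with a JSDoc `@typedef` next to the function that builds it, and other modules reference it through TypeScript's JSDoc `import()` type syntax. The property names and `buildClassification` are hypothetical, not the shape used in the PR.

```javascript
// Another module could reference this type (path is illustrative) with:
//   /** @typedef {import('./entity-classification.js').EntityClassification} EntityClassification */

/**
 * Hypothetical shape, declared beside the code that produces it.
 * @typedef EntityClassification
 * @property {Map<string, object>} entityByUrl
 * @property {object} [firstParty]
 * @property {(url: string) => boolean} isFirstParty
 */

/**
 * @param {Map<string, object>} entityByUrl
 * @param {object} [firstParty]
 * @return {EntityClassification}
 */
function buildClassification(entityByUrl, firstParty) {
  return {
    entityByUrl,
    firstParty,
    // Identity comparison against the main document's entity.
    isFirstParty: url => entityByUrl.get(url) === firstParty,
  };
}
```

The trade-off is that consumers must import the type from an implementation file rather than from a global namespace such as LH.Artifacts.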

core/computed/entity-classification.js

alexnj added 30 commits October 20, 2022 15:14

First cut of entity classification computed artifact.

7833354

some cleanup on the types

0a154eb

Refactor resource-summary audit with third party classification

3212aa6

Refactor unused-javascript to output entity and is-3p flag

c71a8ab

Refactor 3p audits to depend on computed entity classification.

14163bd

Expose entity classification to LH report via a hidden audit.

46d706c

Refine groupBy feature to be more concise.

b5caa4f

Replace domains with homepage

8689903

Revert the LHR grouping changes, after discussing the design with team.

5b67f2e

Add name based lookup to entity classification audit result.

0abc053

Refactor third-party filter to base on entity-classification

170adcb

Attach entity classification to ByteEfficiencyAudit.

5e18f75

Classify bootup-time (Reduce JavaScript execution time) audit

f0674c7

Classify long-tasks (Avoid long main-thread tasks) audit

69725b7

Classify uses-long-cache-ttl audit.

ab5d847

Mark sub-items with entity as well

04bfff8

Fix the regression caused to third-party-summary audit

e8c1c7e

Classify uses-rel-preconnect audit.

0a87d78

Classify all ViolationAudit derived audits.

d953829

Classify no-unload-listeners audit

aea71bf

Classify total-byte-weight audit

cb16f02

Merge remote-tracking branch 'origin/main' into entity-based-3p

1e73c13

Updated components/CSS

e79f44b

Merge remote-tracking branch 'origin/main' into entity-based-3p

6f80513

Some cleanup

d299ae8

Classify valid-source-maps audit

f462dc8

Classify legacy-javascript audit

9189494

Classify all audits that depend on makeOpportunityDetails call

f1d9893

Merge remote-tracking branch 'origin/main' into entity-based-3p

1ad67ac

Explicitly name the lookup tables.

0230896

vercel bot deployed to Preview January 23, 2023 18:20

alexnj added a commit that referenced this pull request Jan 23, 2023

Collect all report related changes from #14622

e8760bb

connorjclark reviewed Jan 23, 2023

core/lib/entity-classification.js (outdated, resolved)

Move lib/entity-classification to runner.js

1f7dbfa

vercel bot deployed to Preview January 23, 2023 20:50

rename import of entity-classification

e0fd279

vercel bot deployed to Preview January 23, 2023 20:55

Merge remote-tracking branch 'origin/main' into entity-based-3p-reportonly

f2518f1

vercel bot deployed to Preview January 23, 2023 20:59

alexnj added the 10.0 label Jan 23, 2023

connorjclark reviewed Jan 23, 2023

types/lhr/lhr.d.ts (outdated, resolved)

Update types/lhr/lhr.d.ts

6cf290c

Co-authored-by: Connor Clark <[email protected]>

vercel bot deployed to Preview January 23, 2023 22:49

adamraine approved these changes Jan 23, 2023

core/runner.js (outdated, resolved)

core/test/computed/entity-classification-test.js (resolved)

alexnj mentioned this pull request Jan 24, 2023

rename entityClassification to entities in LHR. #14702

Merged

connorjclark approved these changes Jan 24, 2023

brendankenny approved these changes Jan 24, 2023

rename entityClassification to entities in LHR. (#14702)

3e45f69

* rename entityClassification to entities in LHR.
* update comment
* Update proto def with LhrEntity similar to LhrCategory

vercel bot deployed to Preview January 24, 2023 17:33

alexnj mentioned this pull request Jan 24, 2023

Cleanup core/lib/third-party-web.js and tests after entity classification #14705

Open

Review changes

5429e4b

vercel bot deployed to Preview January 24, 2023 23:34

alexnj added 2 commits January 24, 2023 15:37

missed review change

1588f50

Merge remote-tracking branch 'origin/main' into entity-based-3p-reportonly

05a40f6

vercel bot deployed to Preview January 24, 2023 23:38

Add a testcase for invalid/non-existent url

c65733d

vercel bot deployed to Preview January 24, 2023 23:51

alexnj merged commit 97ab394 into main Jan 25, 2023

alexnj deleted the entity-based-3p-reportonly branch January 25, 2023 00:16
