Skip to content

core: add entity classification of origins to the LHR #14622

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 132 commits into from
Jan 25, 2023

Conversation

alexnj
Copy link
Member

@alexnj alexnj commented Dec 17, 2022

Addressing #14440 and the design doc, this PR introduces EntityClassification computed artifact that classifies all 1st and 3rd party entities (recognized by third-party-web and unrecognized). The classification is exposed in LHR as a top-level property LHR.entities. By and large, this achieves a few important things:

  1. Switches third-party filtering logic on the reporting side from origin string comparison to proper entity comparison. Now moved to report: use entity classification to filter third-parties #14697.
  2. Lays down the necessary groundwork for grouping and aggregating reports by entities (report: group third-party entities #14655). This will also support filtering reports by entities, in future. Now moved to report: use entity classification to filter third-parties #14697.
  3. Partially addresses Expand third-party classification to include unrecognized entities #14440 (Everything minus "Closing the loop", which requires additional work to reduce the amount of user-generated reports, through automation).
  4. Implements a computed artifact that audits can use to resolve urls -> entities during audit time (optionally marking audit result items with their entities during audit time). Audits that don't resolve entities during audit, but feature a url in any form (url, or source-location, etc.) will be resolved during prepareReportResult phase.

This is an alternate implementation to #14614, based off discussion on the PR. This will attempt to do the same outcome as the original PR, but without classifying entities during most of the audits.

I've completed the checked items on this branch:

  • Branch off new_audit: entity-classification #14614, to continue receiving PR-feedback changes I make there to entity-classification parts.
  • Reverse-patched all changes except entity-classification audit itself (and tests), and the the audits that require to find 1st/3rd parties during audit (see below for a list).
  • Brought back the LUTs (Look Up Tables), so the report side could convert URL origins and entity names to URLs without too many iterations over entities. Their use is probably better justified in this branch.
  • Implemented a _computeTableItemEntities function in DetailsRenderer that (Now moved to report: use entity classification to filter third-parties #14697),
    • looks for TableColumnHeading.valueType === 'url' and picks up the key to identify a URL to lookup entity for. This should cover all audits that include a url in any column (checks only the first level items).
    • skip any audit that has already marked item.entity during audit.
    • deal with audits that have differences in output and requires special handling.
      • source-location in item.source level.
        • no-unload-listeners
        • geolocation-on-start
        • no-document-write
        • notification-on-start
        • uses-passive-event-listeners
        • deprecations
      • origin corresponds to url but marked as valueType: text
        • network-rtt
          • Skipped handling this as this is a hidden audit.
        • network-server-latency
          • Skipped handling this as this is a hidden audit.

Additional targets based on the meeting discussion:

  • Make entity-classification available as an LHR level property, and remove it from being an audit.
    • Retain ComputedEntityClassification for any audit that would like to classify entities during audit.
  • Drop item.entity query-ability from LHR JSON.
  • Move entity resolution to prepareReportResult (Now moved to report: use entity classification to filter third-parties #14697).
  • Rename lhr.entityClassification to lhr.entities and lhr.entities.list.

Audits that make use of ComputedEntityClassification to resolve 1p vs. 3p during audit time:

The following audits previously assumed that all unrecognized third-parties are first-parties.

  • valid-sourcemaps
  • third-party-summary
  • third-party-facades (Marks entities on items during audit)
  • legacy-javascript

Comparison to 14614: entity-based-3p...entity-based-3p-reportonly

I've tried to document all the audit results that deal with a URL and could use entity classification, but have differences in their output format, and requires special handling, in this document. It might not be exhaustive yet, but we could use it to collect them and discuss how to handle those differences.

alexnj added 30 commits October 20, 2022 15:14
Co-authored-by: Connor Clark <[email protected]>
@alexnj
Copy link
Member Author

alexnj commented Jan 23, 2023

Added to-do item: rename lhr.entityClassification to lhr.entities and the inner array of entity objects to lhr.entities.list

Copy link
Member

@adamraine adamraine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this LGTM with a few nits + the rename

Copy link
Collaborator

@connorjclark connorjclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM once the property has been renamed.

Copy link
Contributor

@brendankenny brendankenny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! A couple of comments but otherwise LGTM

* @return {boolean}
*/
function isFirstParty(url) {
return entityByUrl.get(url) === firstParty;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

entityByUrl not having a URL should be a bug, do you foresee a scenario?

it definitely feels like there should be a fallback or a loud error for this case, e.g. if I made this request, would it have been first/third party. I don't think there's anywhere where we do this, but I think it's very possible someone wouldn't notice if they added it in the future and this just always returns false for that case.

A if (!entityByUrl.get(url)) throw new Error('...') equivalent would probably be fine to keep things simple until it's actually needed.

@@ -17,7 +17,7 @@ describe('Third party summary', () => {
const artifacts = {
devtoolsLogs: {defaultPass: pwaDevtoolsLog},
traces: {defaultPass: pwaTrace},
URL: {finalDisplayedUrl: 'https://pwa-rocks.com'},
URL: {finalDisplayedUrl: 'https://pwa.rocks'},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happened here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seemed to be an incorrect entry, per progressive-app-m60.json trace which has no hostname matching pwa-rocks.com.

isUnrecognized?: boolean;
}

interface EntityClassification {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any way this can be moved into the file itself? Type dumping grounds make me sad, and since this isn't an artifact but a live function return value, seems better to have defined with the source?

Copy link
Member Author

@alexnj alexnj Jan 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is cross referenced across several audits and is there a cleaner way to share? LH.Artifacts.* is where all computed audits that require sharing is defined. Example: LH.Artifacts.NetworkRequest

* rename entityClassification to entities in LHR.

* update comment

* Update proto def with LhrEntity similar to LhrCategory
@alexnj alexnj merged commit 97ab394 into main Jan 25, 2023
@alexnj alexnj deleted the entity-based-3p-reportonly branch January 25, 2023 00:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants