Closed Bug 1449739 Opened 6 years ago Closed 6 years ago

Investigate possible bug in Profile Creation Date calculation

Categories

(Toolkit :: Telemetry, enhancement, P1)

enhancement

Tracking

()

RESOLVED FIXED

People

(Reporter: frank, Assigned: janerik)

References

Details

Attachments

(1 obsolete file)

We're seeing profile creation dates that differ significantly from min(submission_date_s3) for clients. Essentially, 10% of clients are showing more than a year of difference between the two; 5% are showing 3.5 years.

This seems more extreme than clock skew alone. Note this is all done on post-pingsender data, so latency should be low as well.

Detailed analysis by Ben Miroglio here: https://gist.github.com/benmiroglio/91ade009ddc81aee687870f3d67af24f

Take note especially of the last two plots, which show the CDF of (min(submission_date_s3) - PCD).
Running the numbers in STMO: https://sql.telemetry.mozilla.org/queries/52284/source#139394

For new profile pings since 2018-02-01:
- 10% of clients reported a profile_creation_date 150 days or more before the submission_date
- 5% of clients reported a profile_creation_date 1126 days or more before the submission_date

Notice that it is heavily biased for before submission_date.
Addendum (pings not clients):

For new profile pings since 2018-02-01:
- 10% of pings reported a profile_creation_date 150 days or more before the submission_date
- 5% of pings reported a profile_creation_date 1126 days or more before the submission_date
We use ProfileAge.jsm (which, aside from broad refactors, has been unchanged for years) to calculate the age of the profile. It uses the creation date (or modification date if creation date is not present) of files present in the profile dir.
Priority: -- → P2
I wonder how much this might interact with clock skew...

Investigation required. Clocks are hard.
(In reply to Chris H-C :chutten from comment #4)
> I wonder how much this might interact with clock skew...
> 
> Investigation required. Clocks are hard.

It's not Clock Skew. 99.29% of clients report a clock skew of less than a day [0], and 99.6% within 2 days [1].

That makes sense, because Clock skew would (presumably) be evenly distributed over the interval [2], but the PCD/submission_date offset is heavily skewed.

[0] https://sql.telemetry.mozilla.org/queries/52348/source#139551
[1] https://sql.telemetry.mozilla.org/queries/52348/source#139535
[2] Citation Needed, though the above plots do seem to support that
 Ugh, Fix: "99.29% of new-profile pings" _NOT_ clients.
Assignee: nobody → jrediger
Bit of follow-up now that this is assigned: this does seem to be affecting every OS, but not equally. For example, Darwin and Linux both report 85% with 0 or 1 days of difference, but Windows is 36 days to get to 85%. The key here is that the initial break-point is different for each OS, with a different height, but the long tail looks relatively similar.

https://sql.telemetry.mozilla.org/queries/52416/source#139660
I have looked into that trying to find an explanation or any bugs in the code.
So far I did not come up with a root cause for this.
I will list the things I found out:

* ProfileAge.jsm. The code tries to read a timestamp from `times.json` or, if not present, iterate through all files in the profile directory to find the oldest (by creation date/modification date). This date is stored in `times.json`.
* I looked for the first occurence of this problem and it looks like it's always been present since the introduction of the `new-profile` ping a year ago
* My first hunch was that the `new-profile` ping is sent when users upgrade their browser to a version that enables that ping. However, that case explicitely does not trigger a `new-profile` ping[1]. I verified that we have a test for that[2] and that the test works as expected (though it does block, not fail when the wrong behaviour is encountered).
* :chutten brought up the issue that ProfileAge might start from the wrong directory (i.e. the parent of the current profile). I checked that in the code and ProfileAge uses the current profile's directory as expected [3] (TelemetryEnvironment invokes that with `profile = null`). I also verified that locally, logging which files are taken into account.

[1]: https://searchfox.org/mozilla-central/source/toolkit/components/telemetry/TelemetrySession.jsm#1967-1969
[2]: https://searchfox.org/mozilla-central/source/toolkit/components/telemetry/tests/unit/test_TelemetryController.js#575-576
[3]: https://searchfox.org/mozilla-central/source/toolkit/modules/ProfileAge.jsm#19

As listed above, I did not find the root cause. The code looks correct in those places.
The only current explanation would be that somehow very old files end up in the profile directory (or the filesystem reports old stuff).

However, there would be a simple fix going on from here:
Instead of scanning all profile files for the oldest one and then storing that state we can fill the `times.json` when a new profile is created directly.
See Also: → 1453982
See Also: → 1456264
Priority: P2 → P1
We now have some data [1] showing that we indeed encounter really old files during a profile directory scan.
I finally found some time to trace how Firefox create profiles. We already have code writing a times.json file with the current timestamp[2]. This code is not triggered if the profile directory already exists (or `-profile path/to/profile` was used on the command line).

We can avoid scanning if we just store a timestamp early. Choosing the current time during startup of Firefox seems to be as good as any other date (and that only happens if there's not already a times.json).
I attached a patch implementing this (We probably need review from some module owner of xre and profile if we want to go forward with this).



[1]: https://mzl.la/2H8eibn
[2]: https://searchfox.org/mozilla-central/rev/38bcf897f1fa19c1eba441a611cf309482e0d6e5/toolkit/profile/nsToolkitProfileService.cpp#803-807
Comment on attachment 8982519 [details]
Bug 1449739 - Write times.json (if missing) after selecting a profile.

https://reviewboard.mozilla.org/r/248488/#review254802

f+

On every start of Firefox, write (if it does not exist) a times.json file to the current profile's directory using the current time as the content? Seems legit. 

Weird clients can still break this: maybe a client doesn't have write access to their own profile directory and so times.json will continue to not exist? Maybe the profile dir is overwritten frequently and the times.json will march forward in time? 

But weird clients will always be able to break things. How do the number of profileage scans compare to the number of new profiles?

Will you remove non-times.json logic from ProfileAge in this bug as well? (and its associated data collection)
Attachment #8982519 - Flags: review?(chutten)
Depends on: 1466518
Attachment #8982519 - Attachment is obsolete: true
Attachment #8982519 - Flags: review?(gfritzsche)
After a discussion with :gfritzsche I move over the implementation to bug 1466518, looping in the pipeline team once I have this written up more clearly.
Resolution: INCOMPLETE → FIXED
The underlying issue is not fully fixed yet, but the investigation of it is.
We now have another probe as well as documentation coming.
As a follow-up we should make a decision on how to deal with the out-of-date data eventually.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: