Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Startup failure/freeze when additional discontinuity tags in alternate audio/text tracks #323

Open
1 task
scottfennell opened this issue Apr 14, 2023 · 12 comments
Assignees
Labels

Comments

@scottfennell
Copy link

Media3 Version

ExoPlayer 2.17.1

Devices that reproduce the issue

All devices

Devices that do not reproduce the issue

No response

Reproducible in the demo app?

No

Reproduction steps

Configure the player to disallow chunkless startup: setAllowChunklessPreparation(false)

Details
Media must contain a track that that has isMasterTimestampSource set to false during the preparation step and contains 1 or more additional #EXT-X-DISCONTINUITY tags before the first sample chunk.

This scenario causes the timestampAdjuster provided for that discontinuity sequence to be provided for only to a single track. That HlsMediaChunk is unable to initialize the timestampAdjuster causing the call : timestampAdjuster.sharedInitializeOrWait(isMasterTimestampSource, startTimeUs); to block loading until the timestampAdjuster initialization can be completed.

Since no other tracks contain the same discontinuity sequence and will not be provided with the same timestampAdjuster as the waiting HlsMediaChunk's - loading will never complete.

Expected result

Playback starts

Actual result

Playback is stuck in loading state

Media

https://storage.googleapis.com/mlbapp-streaming-resources/big_buck_bunny_disc/x36xhzz.m3u8

Bug Report

  • You will email the zip file produced by adb bugreport to [email protected] after filing this issue.
@marcbaechinger
Copy link
Contributor

marcbaechinger commented Apr 18, 2023

I can repro this with 2.17.1 where startup hangs in STATE_BUFFERING but not with v2.18.0 and later. I have tried with the GitHub tag r2.18.0 and on tip of tree in the internal source tree and I can not repro with a recent version.

May I ask you to try this out with a recent release of ExoPlayer? Either with the demo app or your own app?

I did repro in the demo app with 2.17.1 and then checked out tag r2.18.0. You can do the same with the most recent release if that's easier than upgrading your app:

  1. Check out the repo
git clone https://github.com/androidx/media.git
  1. Change PlayerActivity
  • On L276 set the media source factory to use an HlsMediaSourfce.Factory with chunkless preparation disabled.
ExoPlayer.Builder playerBuilder =
      new ExoPlayer.Builder(/* context= */ this)
          .setMediaSourceFactory(
              new HlsMediaSource.Factory(dataSourceFactory)
                  .setAllowChunklessPreparation(false));
  • on L286 comment out the call to configurePlayerWithServerSideAdsLoader() to avoid a null pointer.

  • on L391 comment out the call to releaseServerSideAdsLoader() to avoid a null pointer.

  1. Add your media to media.exolist.json:
{
    "name": "#323",
    "uri": "https://storage.googleapis.com/mlbapp-streaming-resources/big_buck_bunny_disc/x36xhzz.m3u8"
},

I have applied these changes above for 2.17.1, 2.18.0 and tip of tree and I can repro on 2.17.1 only.

@scottfennell
Copy link
Author

I was able to test 2.18.5 and that did work when the additional #EXT-X-DISCONTINUITY exists within an alternative audio stream. However I am still seeing a similar problem when the discontinuity is in a text track. The text track issue can still occur on startup if there are mismatching discontinuity tags but it can also occur during playback.

I created an example here: https://storage.googleapis.com/mlbapp-streaming-resources/big_buck_bunny_disc/bunny_text.m3u8

This has a couple discontinuity tags about 2 min in to the text track and will start playing fine - but will not allow playback at any point past the text's discontinuity tags. I tested this on 2.18.5 (without the chunkless preparation change)

@marcbaechinger
Copy link
Contributor

Thanks for the stream. I can repro this with the current version as you describe.

@marcbaechinger
Copy link
Contributor

marcbaechinger commented May 3, 2023

The stream that you shared above, includes subtitle renditions that have discontinuities that are not present in the video or audio renditions. When encountering such a discontinuity, the player/timestamp adjuster waits until the adjustment in the main track to be adjusted. If this does not happen, the player falls into the buffering state that you observe and is stalling playback.

I can repro this behavior. That's not much news for you, it's what you mentioned above.

This behavior is in place in ExoPlayer since a good while and worked in the sense that we never got complaints that I am aware of. I was wondering why because it obviously stalls playback and we would have people tell us.

I think the reason for not hearing such complaints is that the HLS Authoring Guide from Apple who owns the spec says:

8.15. All variants and renditions MUST have discontinuities at the same points in time.

https://developer.apple.com/documentation/http_live_streaming/http_live_streaming_hls_authoring_specification_for_apple_devices

I understand that you created such a stream for testing purposes by adding the discontinuities in the text rendition. I haven't tested on Apple devices or with other players but it seems to me that the implementation in ExoPlayer is following the authoring specification. Given this I was wondering what the reason for having such streams are. Personally I wouldn't go beyond the authoring guidelines from Apple when producing content because it may not work with some players and player makers may point to that spec regarding this.

Can you give me some context on why you are attempting to create such content?

@scottfennell
Copy link
Author

This is from our media technology team:

For context on stream authoring this is just a test that is meant to mimic a live scenario. We had an issue with live streams, which had rolling window playlist and network congestion in our data center that caused us to produce streams with mismatched discontinuity sequences across all variants. The only control we have in this scenario is configuring the encoders output buffer, which may very well overflow and lead to this mismatch if network outage is too long.

While this mismatched discontinuity sequence is an edge case for us we found that for this use case apple and web devices were able to maintain consistent playback and switch variants to find the proper segment with the proper sequence number.

@marcbaechinger
Copy link
Contributor

Thanks for the context.

In general I think its a good strategy to stick to the authoring guidelines for us as a player. This allows players to make changes in the future and not to be locked into something specific to partners content or suddenly not supporting it anymore with an update in the future.

However, I also think that if this is a function of the exceptional network conditions described above, playback shouldn't silently stall without the player or app being aware of it. If the player would throw an exception in this case, the app can decide to restart playback from the live edge and continue playback that way. Hence what we can do is to throw an Exception when playback stalls for too long. The resulting playback exception for live stream can then be caught and the live stream can be restarted.

@scottfennell
Copy link
Author

I agree with that solution - an error would be much more helpful to at least get an understanding of what is wrong allow that to report the error so we can react to the failure.

@marcbaechinger
Copy link
Contributor

Thanks for the feedback. We look into this and update the issue here when a fix lands. It's planned to include this in the next release. Thanks again for reporting!

tof-tof pushed a commit to google/ExoPlayer that referenced this issue Jun 11, 2023
Add `HlsMediaSource.Factory.setTimestampAdjusterInitializationTimeoutMs(long)` to set the timeout for the loading thread to wait for the `TimestampAdjuster` to initialize. If the initialization doesn't complete before the timeout, a `PlaybackException` is thrown to avoid the playback endless stalling. The timeout is set to zero by default.

This can avoid HLS playback endlessly stalls when manifest has missing discontinuities. According to the HLS spec, all variants and renditions have discontinuities at the same points in time. If not, the one with discontinuities will have a new `TimestampAdjuster` not shared by the others. When the loading thread of that variant is waiting for the other threads to initialize the timestamp and hits the timeout, the playback will stall.

Issue: androidx/media#323

#minor-release

PiperOrigin-RevId: 539108886
tof-tof pushed a commit that referenced this issue Jun 11, 2023
Add `HlsMediaSource.Factory.setTimestampAdjusterInitializationTimeoutMs(long)` to set the timeout for the loading thread to wait for the `TimestampAdjuster` to initialize. If the initialization doesn't complete before the timeout, a `PlaybackException` is thrown to avoid the playback endless stalling. The timeout is set to zero by default.

This can avoid HLS playback endlessly stalls when manifest has missing discontinuities. According to the HLS spec, all variants and renditions have discontinuities at the same points in time. If not, the one with discontinuities will have a new `TimestampAdjuster` not shared by the others. When the loading thread of that variant is waiting for the other threads to initialize the timestamp and hits the timeout, the playback will stall.

Issue: #323

#minor-release

PiperOrigin-RevId: 539108886
tof-tof pushed a commit that referenced this issue Jun 12, 2023
Add `HlsMediaSource.Factory.setTimestampAdjusterInitializationTimeoutMs(long)` to set the timeout for the loading thread to wait for the `TimestampAdjuster` to initialize. If the initialization doesn't complete before the timeout, a `PlaybackException` is thrown to avoid the playback endless stalling. The timeout is set to zero by default.

This can avoid HLS playback endlessly stalls when manifest has missing discontinuities. According to the HLS spec, all variants and renditions have discontinuities at the same points in time. If not, the one with discontinuities will have a new `TimestampAdjuster` not shared by the others. When the loading thread of that variant is waiting for the other threads to initialize the timestamp and hits the timeout, the playback will stall.

Issue: #323

#minor-release

PiperOrigin-RevId: 539108886
(cherry picked from commit db3e662)
tof-tof pushed a commit to google/ExoPlayer that referenced this issue Jun 14, 2023
Add `HlsMediaSource.Factory.setTimestampAdjusterInitializationTimeoutMs(long)` to set the timeout for the loading thread to wait for the `TimestampAdjuster` to initialize. If the initialization doesn't complete before the timeout, a `PlaybackException` is thrown to avoid the playback endless stalling. The timeout is set to zero by default.

This can avoid HLS playback endlessly stalls when manifest has missing discontinuities. According to the HLS spec, all variants and renditions have discontinuities at the same points in time. If not, the one with discontinuities will have a new `TimestampAdjuster` not shared by the others. When the loading thread of that variant is waiting for the other threads to initialize the timestamp and hits the timeout, the playback will stall.

Issue: androidx/media#323

#minor-release

PiperOrigin-RevId: 539108886
(cherry picked from commit 4eb56cf)
@stevemayhew
Copy link
Contributor

stevemayhew commented Mar 30, 2024

@marcbaechinger this is related to an issue that has been on my plate for years now :-( google/ExoPlayer#8952

There are some live packagers in the wild that independently package segments from transcoded streams that can dropout (UDP streams, or just raw un-conditioned TS streams). They have a variety of methods that fall under the HLS "standard" but do not play to the guidelines (similar in the DASH world to meeting the ISO standard but falling short of the DASH-IF Guidelines). These produce fun things like:

  1. Jumps in the playlist timeline marked only by a PROGRAM-DATE-TIME jump (Pseudo GAP)
  2. EXT-X-DISCONTINUITY randomly in one or more playlists
  3. Large jumps in the media time (PTS or CTS) un-marked by any EXT-X-DISCONTINUITY

Oliver and I went back and forth a lot on this, and I have come to his camp, these are media problems the player should not try to fix.

That said, freezing playback (either stuck in ready or stuck in buffering) is not good either. We'll be pulling in the solution that @tof-tof put in for our version when this happens (retrying "fixes" the freeze, and allows logging the issue).

Might be time to visit un-commenting this code as well:

// TODO: Uncomment this to reject samples with unexpected timestamps. See

Perhaps with a settable threshold that defaults to infinite (effectively then the same behavior as the commented out code has). The issue is marked closed: google/ExoPlayer#7030 But I'm sure if I dig up the pro-bowl video from the issue it will still hang AndroidX (FWIW AvPlayer mostly just looses AV sync on these, an even worse behavior but at least one that is not blamed on the player)

Love your thoughts on this?

@tonihei
Copy link
Collaborator

tonihei commented Apr 2, 2024

@stevemayhew

these are media problems the player should not try to fix

Still agreeing to this part. It really comes down to: how can we detect the problem and what the is the right reaction to it. Am I right in assuming that the code stayed commented because it was unclear how to automatically handle the error? (see google/ExoPlayer#7030 (comment)). If not, please remind me why we left it commented out as I can't seem to find the right comment trail for it.

Perhaps with a settable threshold that defaults to infinite

Do you know a good reason to make the actual threshold configurable vs a simple on-off switch? If we leave it as a threshold configuration, it's unclear to other apps what value makes sense for the purpose. A simple boolean throwAtUnexpectedDiscontinuities (or similar) might be more readable and understandable. The default can be off initially, or we can consider turning it on by default if it feels safe enough.

@stevemayhew
Copy link
Contributor

@stevemayhew

these are media problems the player should not try to fix

Still agreeing to this part. It really comes down to: how can we detect the problem and what the is the right reaction to it. Am I right in assuming that the code stayed commented because it was unclear how to automatically handle the error?

@tonihei IMO report it as an error (a specific PlaybackException and allow the client to recover (enough info to seek past the failed position if any then issue a prepare() ). This allows logging it operationally and reporting media/origin issues to the source origin vendor.

Oliver was pushing for silent automatic recovery, so that is why it stalled as commented out code.

Automatic recovery is tricky, especially if the issue occurs mid-segment. I experimented with it a bit and determined it would take recovery at the timestamp adjustment level, hence the relationship to this issue: google/ExoPlayer#8952

I have an asset still I can share (the Pro Bowl recording) that "plays" in Safari (but it looses A/V sync) but hangs in ExoPlayer

Perhaps with a settable threshold that defaults to infinite

Do you know a good reason to make the actual threshold configurable vs a simple on-off switch?

I think an on-off is fine, the simple threshold is enough time to exceed a buffering span (50 seconds for VOD, much less for live) as one of the worst manifestations of the issue is the player will stall in buffering mode forever (if timestamp mismatch between audio and video exceeds the buffer window). This is because of the way the value for SequenceableLoader.getBufferedPositionUs() is determined assumes clean adjusted timestamps.

Happy to work with a member of my team to clean this up to some release form if we can agree on a design, I think we should:

  1. Promote the exception (UnexpectedSampleTimestampException) to a documented common level (It can happen in DASH too, if the origin vendor does not follow DASH-IF Restricted Timing Model and allows gaps in their media timeline
  2. Use the boolean as you suggested, we can choose a pretty large tolerance for the timestamp, as the stall occurs if it exceeds the buffer threshold

@tonihei
Copy link
Collaborator

tonihei commented Apr 4, 2024

Automatic recovery is tricky, especially if the issue occurs mid-segment

Given how much we need to reset in this case, it would probably involve setting the ExoPlaybackException.isRecoverable flag and let the core player retry automatically. We already do this with some renderer errors, but not with source errors that are meant to be retried lower down the stack. This involves adding the right core logic for it, but this can be done later after the error already exists.

Promote the exception (UnexpectedSampleTimestampException) to a documented common level (It can happen in DASH too, if the origin vendor does not follow DASH-IF Restricted Timing Model and allows gaps in their media timeline

Sounds good to me. Having a dedicated error type allows to identify this problem explicitly for manual retry if needed.

Use the boolean as you suggested, we can choose a pretty large tolerance for the timestamp, as the stall occurs if it exceeds the buffer threshold

That's interesting in that a large tolerance only helps to detect 'stuck' cases and a lower tolerance would be closer to a correctness check that kicks in even if playback could recover itself. Am I right in saying that the current commented-out thresholds are actually better because they detect malformed media directly? As long as it's an opt-in for now, we can use the lower thresholds. If we get around to add an automatic recovery attempt, we can think about making the setting opt-out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants