Failure on some Android 11 devices when switching from secure decoder to a different decoder #8696

itsjamie · 2021-03-09T21:27:31Z

Reproduction Steps

Set SimpleExoPlayer to play a MediaItem configured to play using Widevine, with M3U8 mimetype.
Email coming with replacements for <snip> values.

Exoplayer version: 2.13.2
Android Version: 11

Devices

Samsung S20 FE 5G (Qualcomm and Exynos chipsets both repro)
Samsung Notes - (tablet)
Samsung Galaxy Note - (phone)
Samsung Galaxy Note Ultra OS

Basic Reproduction Case:

val player = SimpleExoPlayer.Builder(applicationContext).build()
val mediaItem = MediaItem.Builder()
            .setUri(<snip>)
            .setMimeType(MimeTypes.APPLICATION_M3U8)
            .setDrmUuid(C.WIDEVINE_UUID)
            .setDrmLicenseUri(<snip>)
            .setDrmLicenseRequestHeaders(mutableMapOf(
                <snip>
            ))
            .build()

player.addMediaItem(mediaItem)
player.prepare()
player.play()

The content piece is a 45-second snippet of a live playlist setup as VOD.
It is 30 seconds of the content which is Widevine CTR encrypted content with a 15-second ad. On the switch to the ad, there is a discontinuity and switch to unencrypted content.

On the failure case:

I/ACodec: [OMX.Exynos.avc.dec] Now Loaded
D/SurfaceUtils: connecting to surface 0x71e621ca70, reason connectToSurface
E/SurfaceUtils: Failed to connect to surface 0x71e621ca70, err -22
E/MediaCodec: nativeWindowConnect returned an error: Invalid argument (-22)

When it does not fail, there is no failure to connect the surface with the code -22.

This does not occur on all Android 11 devices. And even on the Samsung devices we have seen it occur on, not all of them have a 100% reproduction rate. On a device with a high reproduction rate, we've also discovered that setting a logging breakpoint in SynchronousMediaCodecAdapter on the codec.configure line eliminates the issue.

Additionally, calling setDrmSessionForClearPeriods(true) on the MediaItem does resolve the error. But raises the question of what other issues this will potentially cause.

Content to follow in email, along with complete Logcat output from both successful and failure cases.

The text was updated successfully, but these errors were encountered:

itsjamie · 2021-03-09T21:58:59Z

Email with test content has been sent.

icbaker · 2021-03-10T09:16:36Z

To be honest this doesn't sound like an ExoPlayer issue - it seems more likely related to the device's implementation of secure/insecure video decoders.

Additionally, calling setDrmSessionForClearPeriods(true) on the MediaItem does resolve the error. But raises the question of what other issues this will potentially cause.

This setting was introduced for exactly this use-case - of clear ads in encrypted content leading to decoder switches, which resulted in a black screen being shown for a short time on some devices during the switch between ads & content (this also mostly affects Samsung devices iirc).

The only negative impact of using this setting I can think of is that it will use an additional DRM session on the device (which are a limited device-wide resource), but it's quite unlikely that you will end up constrained by session count.

If that setting resolves the issue for you then from ExoPlayer's perspective that's working-as-intended as a workaround for a device-specific issue. For the underlying issue itself I think you should raise the issue directly with the device OEMs.

itsjamie · 2021-03-10T14:05:30Z

@icbaker while it does present as a device-specific issue, isn't this highlighting a race condition in Exoplayer's access of system resources that happens to work now with the majority of vendors, but due to some misalignment in the code below, is on the flip side of the condition for Samsung?

I posit this because of the fact that adding the logging breakpoint in Android Studio resolves the issue, and the setup of the unsecured decoder session works as intended and doesn't throw an IllegalArgumentException when calling into the native_configure method.

From my own research yesterday, the error code of -22 seems to indicate that there are attempting to be two producers hooked up to the same Surface. Seeming to indicate a race between the release of the original secure session, and then the configure and attachment of the unsecured context.

I'll see if there is potentially a way we can get this in front of Samsung to cover the bases. It does feel like they changed something in the native MediaCodec timing of release and reattachment given the fact it is only impacting their devices.

icbaker · 2021-03-10T14:54:00Z

I understand how fixing it by setting a breakpoint points towards a race condition somewhere. I agree that indicates a timing-related issue, probably caused by the interaction of multiple threads.

I'm not sure how that could be in ExoPlayer though, because our control flow ensures that we release a decoder before acquiring the next one, and we do these operations from the same thread.

It seems plausible there is a race condition, but it's in Samsung's implementation of MediaCodec or similar APIs. I guess that means MediaCodec#release() is allowed to return before it's completely cleaned up and disconnected from a surface?

fpitters · 2021-03-11T13:14:52Z

We are seeing the same (or quite close) on One Plus 8T (11). On our case setDrmSessionForClearTypes doesn't seem to help.

Combining a widevine stream with some of the IMA tags:

      {
        "name": "Test with clear periods",
        "uri": "https://storage.googleapis.com/wvmedia/cenc/h264/tears/tears.mpd",
        "drm_scheme": "widevine",
        "drm_license_uri": "https://proxy.uat.widevine.com/proxy?provider=widevine_test",
        "drm_session_for_clear_content": true,
        "ad_tag_uri": "https://pubads.g.doubleclick.net/gampad/ads?sz=640x480&iu=/124319096/external/ad_rule_samples&ciu_szs=300x250&ad_rule=1&impl=s&gdfp_req=1&env=vp&output=vmap&unviewed_position_start=1&cust_params=deployment%3Ddevsite%26sample_ar%3Dpremidpost&cmsid=496&vid=short_onecue&correlator="
      },
      {
        "name": "Test without clear periods",
        "uri": "https://storage.googleapis.com/wvmedia/cenc/h264/tears/tears.mpd",
        "drm_scheme": "widevine",
        "drm_license_uri": "https://proxy.uat.widevine.com/proxy?provider=widevine_test",
        "ad_tag_uri": "https://pubads.g.doubleclick.net/gampad/ads?sz=640x480&iu=/124319096/external/ad_rule_samples&ciu_szs=300x250&ad_rule=1&impl=s&gdfp_req=1&env=vp&output=vmap&unviewed_position_start=1&cust_params=deployment%3Ddevsite%26sample_ar%3Dpremidpost&cmsid=496&vid=short_onecue&correlator="
      },

We get on both cases the same SurfaceUtils, MediaCodec error.
When using drm_session_for_clear_content it fails right after the preroll, and when not using the flag it failst when starting the first midroll.

I haven't been able to reproduce on any emulator or a Xiaomi Note 5 (Android 9).
I'll share the bug report by email in case it helps.

icbaker · 2021-03-11T17:54:29Z

Thanks for the report @fpitters - especially calling out that drm_session_for_clear_content doesn't resolve the issue.

In the logcat of your bug report it seems the secure decoder is used for the pre-roll (expected) then released at the end of the pre-roll (unexpected) and it's the subsequent reacquire/reconfigure to play the content that throws the error you're seeing:

// Acquire at the start of the pre-roll
D EventLogger: videoDecoderInitialized [eventTime=2.05, mediaPos=0.00, window=0, period=0, adGroup=0, ad=0, OMX.qcom.video.decoder.avc.secure]
// Release at the start of the content (end of the pre-roll)
D EventLogger: videoDecoderReleased [eventTime=12.14, mediaPos=0.00, window=0, period=0, OMX.qcom.video.decoder.avc.secure]
// Then the error is thrown

I tried reproducing this on a Pixel 3a XL and although I don't see any crashes I do see the same pattern of acquires/releases at the transition from pre-roll to content (but not from content to mid-roll or mid-roll to content) (this is with drm_session_for_clear_content: true:

// Pre-roll starts
D/EventLogger: videoDecoderInitialized [eventTime=1.35, mediaPos=0.00, window=0, period=0, adGroup=0, ad=0, c2.qti.avc.decoder.secure]
// Pre-roll ends
D/EventLogger: videoDecoderReleased [eventTime=11.76, mediaPos=0.00, window=0, period=0, c2.qti.avc.decoder.secure]
D/EventLogger: videoDecoderInitialized [eventTime=11.89, mediaPos=0.00, window=0, period=0, c2.qti.avc.decoder.secure]
// Content plays, then mid-roll, then content with no decoder changes (expected)

So I think that's an ExoPlayer bug - we should just keep the same decoder throughout.

As mentioned in the reply above the crashes seem like they might be an artefact of quickly (?) switching the same surface from one decoder to another - and it seems likely that part is device specific. I'll try and get hold of the devices you've both reported to see if I can reproduce the crashes.

icbaker · 2021-03-11T18:19:24Z

Ah I think I've found the cause of the 'spurious' codec re-initialization when switching from pre-roll to content.

The pre-roll resolution and inputSize is smaller than the content. Which means we need to re-initialize the codec, and it's triggered by this logic here (we enter both ifs):

ExoPlayer/library/core/src/main/java/com/google/android/exoplayer2/video/MediaCodecVideoRenderer.java

Lines 631 to 637 in 4364b91

 @DecoderDiscardReasons int discardReasons = evaluation.discardReasons; 

 if (newFormat.width > codecMaxValues.width || newFormat.height > codecMaxValues.height) { 

 discardReasons |= DISCARD_REASON_VIDEO_MAX_RESOLUTION_EXCEEDED; 

 } 

 if (getMaxInputSize(codecInfo, newFormat) > codecMaxValues.inputSize) { 

 discardReasons |= DISCARD_REASON_MAX_INPUT_SIZE_EXCEEDED; 

 }

ExoPlayer doesn't know the content resolution in advance, so it starts by configuring the codec only large enough to play the ad - and then needs to reconfigure. This explains why we don't see any re-initialization at mid-roll, because the codec that was large enough to play the content can also play the ad and then be re-used.

So I think I retract my statement above that this part is an ExoPlayer bug - I'm not sure we can do any better than trigger a decoder re-initialization at this transition.

itsjamie · 2021-03-11T21:59:46Z

Yeah, for us, we no longer saw the codec reinitialization, because our resolutions matched, and the format profile was within the bandwidth max size limitations. Even though there was a profile switch between the clear and secure content.

So, to that point though, this is one of those gotcha's I was wondering about and will pass along to the ad operations folks. Very important for these devices then that we keep the renditions in lockstep :)

Still looking for the best place to report/provide the reproduction to Samsung.

icbaker · 2021-06-23T16:54:19Z

I got hold of a OnePlus AC2003 because it was reported as repro-ing this problem reliably in #9095.

On Android 10 I saw no problems, I switched from clear -> secure -> clear decoders every second for several minutes with no problems.

I updated the device to Android 11 and immediately saw the problem (on the first secure -> clear transition).

icbaker · 2021-06-24T14:24:23Z

This is also tracked by [internal b/191966399].

icbaker · 2021-08-09T12:23:13Z

Adding some more info from some brief investigations, and responding to @cdongieux's request for additional workaround suggestions in #9250 (comment):

I experimented with a couple of hacks in MediaCodecVideoRenderer and found they both resolve this issue. We'd like to understand exactly what's going wrong inside the framework before submitting any workaround like this to the library, but you're free to make the changes to a local copy of the library (or subclass MediaCodecVideoRenderer). In both cases I overrode MediaCodecRenderer#releaseCodec in MediaCodecVideoRenderer:

Based on @itsjamie's observations above that it's a timing issue I tried just adding naive Thread.sleep() calls after MediaCodec#release() and before we try and set the surface onto the new codec. In my overriding implementation of MediaCodecVideoRenderer#releaseCodec() I called super.releaseCodec() and then Thread.sleep(60). For the content I was using I found that 60ms was long enough, but 45ms wasn't. Obviously this is really hacky, and hard-coding sleep times like this isn't really a reliable or production-ready approach. I include it here mainly for completeness.
Instead of sleeping after the release, I tried setting a dummy surface before the release. MediaCodecVideoRenderer already has a dummySurface field we can use (it might be null and need initializing). Calling getCodec().setOutputSurface(dummySurface) before super.releaseCodec() resolved the issue for me.

stevemayhew · 2021-09-10T17:29:42Z

@icbaker We have seen something similar to this issue on both Broadcom and AmLogic streamer platforms. Happy to help if we can. For us the issue happens when moving from trick-play tracks (which, in this case, are not encrypted) to regular widevine encrypted tracks, the screen flashes to black for a frame before resuming playback. The hack we added that "fixes" it is to comment out the reset in onDisabled:

@Override
  protected void onDisabled() {
     inputFormat = null;
     if (sourceDrmSession != null || codecDrmSession != null) {
       // TODO: Do something better with this case.
       //onReset();
       flushOrReleaseCodec();

I think @ojw28 was looking at a way to be less drastic in shutting down the CODEC on track changes, but not sure where that work is.

icbaker · 2021-09-13T11:03:55Z

@stevemayhew Just to clarify: Do you just see a 'black flash' or a complete playback failure?

I agree the 'black flash' issue is similar, but this issue is specifically tracking a playback failure when switching a surface between codecs. The 'black flash' is tracked by #8842.

The hack we added that "fixes" it is to comment out the reset in onDisabled:
@Override
  protected void onDisabled() {
     inputFormat = null;
     if (sourceDrmSession != null || codecDrmSession != null) {
       // TODO: Do something better with this case.
       //onReset();
       flushOrReleaseCodec();
I think @ojw28 was looking at a way to be less drastic in shutting down the CODEC on track changes, but not sure where that work is.

We ended up basically doing the same in 2ab0e1d which is included in 2.14.1. Note that if you want to cherry-pick that change you'll want to take 037f24a too (which is needed for complicated reasons related to ExoPlayer#setForegroundMode).

It's been observed that some devices fail when releasing a secure codec attached to a surface and immediately trying to create a new codec (secure or insecure) attached to the same surface. This change catches all exceptions thrown during codec creation, sleeps for a short time, and then retries the codec creation. This is observed to fix the problem (we believe this is because it allows enough time for some background part of the previous codec release operation to complete). This change should have no effect on the control flow when codec creation succeeds first time. It will introduce a slight delay when creating the preferred codec fails (while we sleep and retry), which will either delay propagating a permanent error or attempting to initialize a fallback decoder. We can't avoid the extra delay to instantiating the fallback decoder because we can't know whether we expect the second attempt to create the preferred decoder to succeed or fail. The benefit to always retrying the preferred decoder creation (fixing playback failures) outweighs the unfortunate additional delay to instantiating fallback decoders. Issue: google/ExoPlayer#8696 #minor-release PiperOrigin-RevId: 414671743

It's been observed that some devices fail when releasing a secure codec attached to a surface and immediately trying to create a new codec (secure or insecure) attached to the same surface. This change catches all exceptions thrown during codec creation, sleeps for a short time, and then retries the codec creation. This is observed to fix the problem (we believe this is because it allows enough time for some background part of the previous codec release operation to complete). This change should have no effect on the control flow when codec creation succeeds first time. It will introduce a slight delay when creating the preferred codec fails (while we sleep and retry), which will either delay propagating a permanent error or attempting to initialize a fallback decoder. We can't avoid the extra delay to instantiating the fallback decoder because we can't know whether we expect the second attempt to create the preferred decoder to succeed or fail. The benefit to always retrying the preferred decoder creation (fixing playback failures) outweighs the unfortunate additional delay to instantiating fallback decoders. Issue: #8696 #minor-release PiperOrigin-RevId: 414671743

icbaker · 2021-12-08T10:03:05Z

The commit above works around this issue by retrying a failed decoder creation after a short delay.

It's been observed that some devices fail when releasing a secure codec attached to a surface and immediately trying to create a new codec (secure or insecure) attached to the same surface. This change catches all exceptions thrown during codec creation, sleeps for a short time, and then retries the codec creation. This is observed to fix the problem (we believe this is because it allows enough time for some background part of the previous codec release operation to complete). This change should have no effect on the control flow when codec creation succeeds first time. It will introduce a slight delay when creating the preferred codec fails (while we sleep and retry), which will either delay propagating a permanent error or attempting to initialize a fallback decoder. We can't avoid the extra delay to instantiating the fallback decoder because we can't know whether we expect the second attempt to create the preferred decoder to succeed or fail. The benefit to always retrying the preferred decoder creation (fixing playback failures) outweighs the unfortunate additional delay to instantiating fallback decoders. Issue: google#8696 PiperOrigin-RevId: 414671743

itsjamie added bug needs triage labels Mar 9, 2021

icbaker self-assigned this Mar 10, 2021

icbaker added question bug: device specific and removed bug needs triage question labels Mar 10, 2021

icbaker mentioned this issue May 18, 2021

Secure decoder is re-initialized if renderer is disabled during a transition (e.g. due to seeking or a discontinuity) #8842

Closed

icbaker mentioned this issue Jun 22, 2021

Decoder init failed error when transitioning from DRM protected content to clear content in a playlist #9095

Closed

icbaker changed the title ~~Failure on Samsung devices: Secure -> Non-secure content switch (HLS)~~ Failure on Samsung and OnePlus devices: Secure -> Non-secure content switch (HLS) Jun 23, 2021

icbaker mentioned this issue Aug 3, 2021

DecoderInitializationException: AVC to HEVC switch issue on some devices #9250

Closed

icbaker changed the title ~~Failure on Samsung and OnePlus devices: Secure -> Non-secure content switch (HLS)~~ Failure on some Android 11 devices when switching from secure decoder to a different decoder Aug 4, 2021

icbaker mentioned this issue Sep 1, 2021

Decoder issue while playing MidRoll Ads with Exoplayer 2.14 in Android11 #9358

Closed

icbaker mentioned this issue Sep 28, 2021

E/MediaCodecVideoRenderer: Video codec error #9476

Closed

icbaker mentioned this issue Oct 28, 2021

ExoPlayer crashes on DRM protected contents with IMA ads #9617

Closed

icbaker closed this as completed Dec 8, 2021

icbaker mentioned this issue Jan 12, 2022

VMAP Midrolls raise a decoder exception on some devices (more details below) when main content is Widevine protected #9855

Closed

4 tasks

icbaker mentioned this issue Feb 2, 2022

Mid-roll ad playback fails with Video codec error #9923

Closed

google locked and limited conversation to collaborators Feb 7, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Failure on some Android 11 devices when switching from secure decoder to a different decoder #8696

Failure on some Android 11 devices when switching from secure decoder to a different decoder #8696

itsjamie commented Mar 9, 2021 •

edited

itsjamie commented Mar 9, 2021

icbaker commented Mar 10, 2021

itsjamie commented Mar 10, 2021 •

edited

icbaker commented Mar 10, 2021

fpitters commented Mar 11, 2021

icbaker commented Mar 11, 2021 •

edited

icbaker commented Mar 11, 2021

itsjamie commented Mar 11, 2021

icbaker commented Jun 23, 2021

icbaker commented Jun 24, 2021

icbaker commented Aug 9, 2021

stevemayhew commented Sep 10, 2021

icbaker commented Sep 13, 2021

icbaker commented Dec 8, 2021

Failure on some Android 11 devices when switching from secure decoder to a different decoder #8696

Failure on some Android 11 devices when switching from secure decoder to a different decoder #8696

Comments

itsjamie commented Mar 9, 2021 • edited

Reproduction Steps

Devices

itsjamie commented Mar 9, 2021

icbaker commented Mar 10, 2021

itsjamie commented Mar 10, 2021 • edited

icbaker commented Mar 10, 2021

fpitters commented Mar 11, 2021

icbaker commented Mar 11, 2021 • edited

icbaker commented Mar 11, 2021

itsjamie commented Mar 11, 2021

icbaker commented Jun 23, 2021

icbaker commented Jun 24, 2021

icbaker commented Aug 9, 2021

stevemayhew commented Sep 10, 2021

icbaker commented Sep 13, 2021

icbaker commented Dec 8, 2021

itsjamie commented Mar 9, 2021 •

edited

itsjamie commented Mar 10, 2021 •

edited

icbaker commented Mar 11, 2021 •

edited