
Dockerd mtu configuration option does not work on Windows Docker #35683

georgyturevich opened this issue Dec 3, 2017 · 25 comments

Description

It is not possible to change the default MTU value used inside new containers.

Steps to reproduce the issue:

  1. Change the mtu value in your daemon.json as described at https://docs.docker.com/engine/reference/commandline/dockerd/, for example to 1398 (see the sketch after these steps)
  2. Restart the host
  3. Start new container
  4. Run netsh inside the container: docker exec mtutest powershell netsh interface ipv4 show interfaces
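
For reference, step 1 looks like this on a default install (a minimal sketch assuming the standard config path and no pre-existing daemon.json; if the file already exists, merge the key in instead):

# Write a daemon.json containing just the mtu key, then restart the service
# (or the whole host, per step 2).
Stop-Service docker
Set-Content -Path "$env:ProgramData\docker\config\daemon.json" -Value '{ "mtu": 1398 }'
Start-Service docker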

Describe the results you received:

I see that the MTU of the "vEthernet ..." interface is still 1500.

Describe the results you expected:

I expected it to be 1398.

I did not find anywhere in the codebase where Docker uses this configuration option on Windows, so most likely it is a documentation error? But I am not 100% sure; maybe I missed something.

Does anybody know how else we can change the default MTU value that is set on every container's internal vEthernet interface?

Thanks!

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:

Client:
 Version:      17.10.0-ce
 API version:  1.33
 Go version:   go1.8.3
 Git commit:   f4ffd25
 Built:        Tue Oct 17 19:00:02 2017
 OS/Arch:      windows/amd64

Server:
 Version:      17.10.0-ce
 API version:  1.33 (minimum version 1.24)
 Go version:   go1.8.3
 Git commit:   f4ffd25
 Built:        Tue Oct 17 19:09:12 2017
 OS/Arch:      windows/amd64
 Experimental: false

Output of docker info:

Containers: 14
 Running: 13
 Paused: 0
 Stopped: 1
Images: 94
Server Version: 17.10.0-ce
Storage Driver: windowsfilter
 Windows:
Logging Driver: json-file
Plugins:
 Volume: local
 Network: ics l2bridge l2tunnel nat null overlay transparent
 Log: awslogs etwlogs fluentd json-file logentries splunk syslog
Swarm: inactive
Default Isolation: process
Kernel Version: 10.0 14393 (14393.1770.amd64fre.rs1_release.170917-1700)
Operating System: Windows Server 2016 Datacenter
OSType: windows
Architecture: x86_64
CPUs: 64
Total Memory: 488GiB
Name: ...
ID: ...
Docker Root Dir: E:\docker_storage_1_13
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: -1
 Goroutines: 109
 System Time: 2017-12-03T21:44:18.6526331+01:00
 EventsListeners: 1
Username: aureadev
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

croemmich commented Jan 23, 2018

I'm seeing the same thing on Windows Server Core 1709 with Docker 17.06.2-ee-6. Oddly, once I set the MTU on the interface in one container, future containers seem to get the new MTU as well.
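
Concretely, that one-time change looks like this when run inside a container (a sketch; the alias matches the netsh output later in this thread, and 1400 is illustrative):

# Set the vNIC MTU once, inside any running container; new containers then
# appeared to pick up the same value.
netsh interface ipv4 set subinterface "vEthernet (Ethernet)" mtu=1400 store=persistent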


croemmich commented Feb 2, 2018

As a workaround, I've created two helper functions that change the MTUs dynamically on the host:
https://bitbucket.org/myriadmobile/powershell/src/HEAD/helpers/system.ps1?at=master&fileviewer=file-view-default

Monitor-MTU creates a scheduled task that runs every minute, monitors interfaces whose names begin with a given string, and updates their MTU.
Monitor-Hyper-V-Switch-MTUs creates a scheduled task that triggers on Event Log entries for Hyper-V switch changes (generated when containers and builds are spun up) and sets the MTU on the vNIC before anything happens in the container.

Simply call:

(new-object Net.WebClient).DownloadString("https://bitbucket.org/myriadmobile/powershell/raw/HEAD/helpers/system.ps1") | iex
Monitor-MTU -InterfaceStartsWith 'vEthernet' -MTU 1410
Monitor-Hyper-V-Switch-MTUs 1410

I'd recommend forking the scripts if you plan on using them in production, as they may change.
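
In case that link ever moves, the gist of Monitor-MTU is a minute-interval scheduled task along these lines (a simplified sketch, not the real script; the path and the 1410 value are illustrative):

# Save a script that clamps oversized vEthernet MTUs...
New-Item -ItemType Directory -Force C:\scripts | Out-Null
@'
Get-NetIPInterface -AddressFamily IPv4 |
    Where-Object { $_.InterfaceAlias -like 'vEthernet*' -and $_.NlMtu -gt 1410 } |
    Set-NetIPInterface -NlMtuBytes 1410
'@ | Set-Content C:\scripts\clamp-mtu.ps1

# ...then register a SYSTEM task that re-runs it every minute.
$action  = New-ScheduledTaskAction -Execute 'powershell.exe' -Argument '-NoProfile -ExecutionPolicy Bypass -File C:\scripts\clamp-mtu.ps1'
$trigger = New-ScheduledTaskTrigger -Once -At (Get-Date) -RepetitionInterval (New-TimeSpan -Minutes 1)
Register-ScheduledTask -TaskName 'MonitorMTU' -User 'SYSTEM' -Action $action -Trigger $trigger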


pjh commented Jul 26, 2018

I can confirm that this is still not working on Windows Server version 1803: setting the mtu in the docker config file does not affect the MTU of the "vEthernet (Ethernet)" interface in the container.

PS C:\Windows\system32> docker run microsoft/windowsservercore:1803 powershell.exe "netsh interface ipv4 show interfaces"

Idx     Met         MTU          State                Name
---  ----------  ----------  ------------  ---------------------------
 16          75  4294967295  connected     Loopback Pseudo-Interface 2
 17        5000        1500  connected     vEthernet (Ethernet)

PS C:\Windows\system32> Stop-Service docker
PS C:\Windows\system32> $docker_config = "$env:programdata\docker\config\daemon.json"
PS C:\Windows\system32> if (Test-Path $docker_config) { `
>>   echo "$docker_config already exists, manually insert `"mtu`" : `"1460`"" `
>> } else { `
>>   New-Item $docker_config; `
>>   Set-Content $docker_config "{`n  `"mtu`" : 1460`n}" `
>> }

    Directory: C:\ProgramData\docker\config

Mode                LastWriteTime         Length Name
----                -------------         ------ ----
-a----        7/26/2018  10:38 PM              0 daemon.json

PS C:\Windows\system32> Get-Content $docker_config
{
  "mtu" : 1460
}
PS C:\Windows\system32> Start-Service docker
PS C:\Windows\system32> docker run microsoft/windowsservercore:1803 powershell.exe "netsh interface ipv4 show interfaces"

Idx     Met         MTU          State                Name
---  ----------  ----------  ------------  ---------------------------
 16          75  4294967295  connected     Loopback Pseudo-Interface 2
 17        5000        1500  connected     vEthernet (Ethernet)

Restarting the server does not help either.

Microsoft's own documentation page includes "mtu" in its list of configuration options that are applicable to Docker on Windows. Is this a Docker bug or a Windows bug?


pjh commented Jul 26, 2018

Hi @JMesser81 @PatrickLang @kallie-b @dineshgovindasamy, this MTU issue is a huge pain point for users of Windows containers on Google Compute Engine, which only supports an MTU of 1460. I noted similar disruptive MTU behavior over a year ago.

If someone could take a look at this and let us know whether it is a Windows problem or a Docker problem, that would be a good start. Thanks!


pjh commented Jul 26, 2018

I added a uservoice idea for this problem: https://windowsserver.uservoice.com/forums/304624-containers/suggestions/34942090-allow-configuring-a-default-mtu-for-all-containers. Anyone reading this, please add your vote!


croemmich commented Jul 26, 2018

@pjh I believe it's a Docker issue, mostly... I glanced through the daemon code when we ran into the issue a while ago and noticed explicit MTU handling in the Linux code path, but not in the Windows path. I'm not that familiar with Windows, but I believe most of Docker's networking functionality wraps Windows HNS, so I wouldn't be surprised if changes need to be made there as well.

P.S. we're running on Google Cloud, hence the issue. The MTU adjustment script I posted above, or something similar, should probably be shipped in the Windows for Containers images. We've been running the script for months in our base compute images and it's been rock solid. It has allowed us to build and run stock images without ever having to worry about container MTUs.


pjh commented Jul 26, 2018

Thanks @croemmich. Could you point out the relevant code for this? I've never looked at the Docker code...

@croemmich

@pjh

// Linux-only MTU handling in the daemon's driver options (per the comment above, there is no Windows counterpart):
netlabel.DriverMTU: strconv.Itoa(config.Mtu),

@PatrickLang

I talked with @daschott and @dineshgovindasamy about this today. There isn't an API to set the MTU at container vNIC creation time. You should be able to get the Windows interface index (I think here: https://github.com/Microsoft/hcsshim/blob/4a468a6f7ae547974bc32911395c51fb1862b7df/internal/hns/hnsendpoint.go#L12 - correct me if I'm wrong, David/Dinesh), then use netsh interface ipv4 set subinterface or Set-NetAdapterAdvancedProperty.


coryan commented Aug 1, 2018

@PatrickLang I think it is Set-NetIPInterface, not Set-NetAdapterAdvancedProperty, as in:

Set-NetIPInterface -InterfaceIndex (Get-NetIPInterface -NlMtuBytes 1500).IfIndex -NlMtuBytes 1460

Note that this is super inconvenient when doing a docker build. The MTU seems to be reset after each RUN command in the Dockerfile.


pjh commented Aug 7, 2018

Thanks @PatrickLang. Are you saying that it's possible to set the MTU of the container endpoint interface (not sure if I'm describing that accurately) on the host once, and then the vNIC in every container that connects to that host endpoint will use the same MTU?

If that works then it would be helpful (I'll try it out shortly), but a global MTU configuration option would be much easier and more "persistent", i.e. it wouldn't have to be configured again if a new host endpoint is created.

If there isn't currently an API to set the MTU at the container vNIC creation time then it means that https://docs.microsoft.com/en-us/virtualization/windowscontainers/manage-docker/configure-docker-daemon is wrong in listing "mtu" as an option that the Windows docker configuration supports. Combined with the fact that packet fragmentation doesn't seem to work correctly with container vNICs (docker/for-win #1144), this seems to be a significant blocker for a lot of Windows container users.

@daschott @dineshgovindasamy


PatrickLang commented Aug 7, 2018

Here's how OVN-Kubernetes worked around the MTU issue (ovn-org/ovn-kubernetes#361) and the DNS search suffix (ovn-org/ovn-kubernetes#346).

Both of those are Moby options that are ignored on Windows.


pjh commented Aug 8, 2018

@croemmich do you have a current link for the monitoring agents you noted a few months ago? The bitbucket link from Feb. 2 is returning 404 now. I'm curious to see how you're monitoring the event log for vswitch events :)


pjh commented Aug 8, 2018

I've confirmed that Set-NetIPInterface can be invoked on the host to update the MTU of a running container, e.g.:

Set-NetIPInterface -IncludeAllCompartments -InterfaceAlias "vEthernet (Ethernet) 2" -NlMtuBytes 1234

It seems plausible that the moby code that creates Windows containers could be changed to: 1) get the new container's vEthernet interface name/ID, then 2) immediately invoke the Set-NetIPInterface command/API to change the container interface's MTU, before returning or letting anything else happen in the container.
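
Done by hand from the host, those two steps look roughly like this (an illustrative sketch only; 1460 is the value my platform needs):

# Enumerate vEthernet interfaces across all network compartments and clamp
# any that still carry the 1500 default.
Get-NetIPInterface -IncludeAllCompartments -AddressFamily IPv4 |
    Where-Object { $_.InterfaceAlias -like 'vEthernet*' -and $_.NlMtu -gt 1460 } |
    Set-NetIPInterface -NlMtuBytes 1460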

Would this be a good idea? If anyone has pointers to where in the moby code these changes could be made, that would be helpful...

@daschott

If you are using VFP, you should be able to set the MTU on a per-interface basis with:
netsh int ipv4 set interface <interface_id> mtu=<mtu>
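
For example, inside the container (this sketch assumes a single vEthernet interface; the 1450 target is illustrative):

# Resolve the vEthernet interface index, then set its MTU with netsh.
$id = (Get-NetIPInterface -AddressFamily IPv4 |
       Where-Object InterfaceAlias -like 'vEthernet*').InterfaceIndex
netsh int ipv4 set interface $id mtu=1450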


jefflill commented Dec 5, 2018

It looks like Docker Desktop 18.09.0-ce now honors the daemon.json mtu setting, at least for Linux containers. I set it to 1400 and then used tcpdump to watch some traffic, and I was seeing mss 1360 in the output (1360 = 1400 minus 40 bytes of IP and TCP headers, so the setting is taking effect).

I haven't tried this on Windows Server or Windows containers, so perhaps that's still broken.


davidmohar commented Dec 13, 2018

@jefflill I tested with Docker 18.09.0-ce and Windows Containers and it seems that it's not honoring the setting in daemon.json. I've changed the daemon file, removed all containers, restarted my computer and re-created containers. Default MTU is still 1500.

@jefflill

@davidmohar: I'm not sure if this is actually working correctly myself. I changed the MTU to 1400 in daemon.json and then ran ip addr in an Alpine container on one of the cluster nodes:

root@worker-0:~# docker run -it --rm alpine ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
36: eth0@if37: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1400 qdisc noqueue state UP
    link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.2/16 brd 172.17.255.255 scope global eth0
       valid_lft forever preferred_lft forever

As you can see, this reports MTU=1400 for eth0, so I figured this was working.

The interesting thing is that when you connect the container to a new Docker network, the MTU is set back to 1500:

root@worker-0:~# docker network create foo-net
9f396dadceedac15aaee838e60b31120a4c43fa9c39aa3ecca37772bd61992a1

root@worker-0:~# docker run -it --rm --network foo-net alpine ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
39: eth0@if40: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP
    link/ether 02:42:ac:13:00:02 brd ff:ff:ff:ff:ff:ff
    inet 172.19.0.2/16 brd 172.19.255.255 scope global eth0
       valid_lft forever preferred_lft forever

It appears that the daemon.json MTU setting does not apply to Docker networks. You need to set this explicitly:

root@worker-0:~# docker network create --opt com.docker.network.driver.mtu=1400 bar-net
d23f6c6fe6796fd14683d4d63ff72e56dd5977d8057d7acb0fdd7a105675c6d8

root@worker-0:~# docker run -it --rm --network bar-net alpine ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
42: eth0@if43: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1400 qdisc noqueue state UP
    link/ether 02:42:ac:14:00:02 brd ff:ff:ff:ff:ff:ff
    inet 172.20.0.2/16 brd 172.20.255.255 scope global eth0
       valid_lft forever preferred_lft forever

In this example, I created a new network that explicitly set the MTU to 1400 and you can see that reported in the container.

This seems kind of broken: when the MTU matters, you have to explicitly update any docker network related scripts and compose/stack files to specify the MTU, rather than having it inherited from the daemon config, which is a pain when deploying to environments that require different MTUs.

I just attended KubeCon 2018 this week and I'm moving on to stock Kubernetes now. Swarm looks like a dead end, and Kubernetes has matured a lot since the last time I looked at it; hopefully it won't have issues like these.


ChrML commented May 24, 2019

This is still an issue on Windows Server 2019 version 1809 running Windows containers. When using Docker on a cloud platform such as OpenStack, which has a maximum MTU of 1450, this causes packet-drop issues, such as "nuget restore" failing to download packages.

This is quite a pain to manage, and it requires workarounds that compromise the idea of Docker images being independent of host configuration...


ghost commented May 24, 2019

Same issue here on Windows Server 1803 with all updates, running Docker EE 18.09.6. Setting mtu in daemon.json does not work.


ghost commented May 24, 2019

Setting the mtu on an overlay network with "com.docker.network.driver.mtu": "1400" also does not work.


pjh commented Jun 19, 2019

While poking around on my Windows Kubernetes nodes I stumbled upon a sort-of workaround for this issue that seems to reduce all container MTUs by 50 bytes (sufficient for my platform, obviously not a general solution). I've shared a script of commands with my annotations here.

Basically, adding a new HNS network with type L2Bridge seems to reduce the default MTU of all new container interfaces to 1450, even if those containers are not connected to the new HNS network. The HNS network is created using the New-HnsNetwork command from hns.psm1. Presumably the 50-byte reduction is to leave room for some sort of encapsulation within HNS internally, although the behavior of the specific networks and interfaces is a bit mystifying to me.
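
For anyone who wants to try it, creating the extra network looks roughly like this (a sketch: it assumes the hns.psm1 helper module from Microsoft's SDN repository, and the name and address range are arbitrary):

# Load the HNS helper module, then create an L2Bridge network; its mere
# existence appeared to lower new container MTUs to 1450.
(new-object Net.WebClient).DownloadString("https://raw.githubusercontent.com/microsoft/SDN/master/Kubernetes/windows/hns.psm1") | iex
New-HnsNetwork -Type L2Bridge -Name "mtu-workaround" -AddressPrefix "192.168.250.0/24" -Gateway "192.168.250.1"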

I'm going to see if I can get some Microsoft folks to comment on what I'm seeing. cc @dineshgovindasamy, @madhanrm


roderickgreen commented Sep 23, 2020

I also have this problem when the host is on a VPN. Here's the workaround I settled on until this is eventually fixed. It uses an AtStartup scheduled task in the container to set the MTU to 1400. This works on at least the Server Core images I've tried.

RUN $action = New-ScheduledTaskAction -Execute "netsh" -Argument 'interface ipv4 set subinterface \"Ethernet\" mtu=1400'; `
    $trigger = New-ScheduledTaskTrigger -AtStartup; `
    Register-ScheduledTask -TaskName 'SetMtu' -User 'SYSTEM' -Action $action -Trigger $trigger;

lzhecheng added a commit to lzhecheng/antrea that referenced this issue Apr 28, 2021
Container MTU should be properly configured for better performance.
MTU cannot be configured with HNSEndpoint or API on Windows.
https://github.com/Microsoft/hcsshim/blob/4a468a6f7ae547974bc32911395c51fb1862b7df/internal/hns/hnsendpoint.go#L12
moby/moby#35683
@johan-smits

Are there any workarounds available?
