On the XWiki project, we use TestContainers to run our functional Selenium tests inside Docker containers, in order to test various configurations.
We had been struggling with flickering tests caused by infrastructure issues, and it took us a long time to find the root cause. For example, from time to time TestContainers failed to start the Ryuk Docker container and we couldn't understand why. We also had plenty of other issues that were hard to track down.
After investigating, we found that the main cause was the way we had set up iptables on our CI agents.
These agent machines have network interfaces exposed to the internet (eth0, for example), and to be safe, our infra admin had blocked all incoming and outgoing network traffic by default. The iptables config looked like this:
:INPUT DROP [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT DROP [0:0]
-A INPUT -i lo -j ACCEPT
-A INPUT -i br0 -j ACCEPT
-A INPUT -i tap0 -j ACCEPT
-A INPUT -i eth0 ... specific rules
-A OUTPUT -o eth0 ... specific rules
-A OUTPUT -o lo -j ACCEPT
-A OUTPUT -o br0 -j ACCEPT
-A OUTPUT -o tap0 -j ACCEPT
Note that since we blocked all interfaces by default, we needed explicit rules to allow traffic on some of them (lo, br0 and tap0 in our case).
The problem is that Docker doesn't add any rules to the INPUT/OUTPUT chains! It only adds NAT rules and FORWARD chain rules to iptables (see Docker and iptables for more details). It also creates new network interfaces such as docker0 and br*. Since by default we dropped INPUT/OUTPUT traffic on every interface that wasn't explicitly allowed, connections between the containers and the host were blocked in many cases.
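One way to spot this kind of problem is to compare the interfaces that actually exist on an agent with the ones that have an explicit ACCEPT rule. The sketch below assumes a Linux host and uses Python's standard `socket.if_nameindex()` to list interfaces; the `ALLOWED` set mirrors the lo, br0 and tap0 rules above, and the helper name `uncovered_interfaces` is hypothetical, not part of any tool we used.

```python
import socket

# Interfaces with a blanket "-j ACCEPT" rule in the original INPUT/OUTPUT chains.
ALLOWED = {"lo", "br0", "tap0"}


def uncovered_interfaces(interfaces, allowed):
    """Return the sorted names of interfaces that no blanket ACCEPT rule covers.

    With a DROP policy on INPUT and OUTPUT, any traffic on these interfaces that
    is not matched by another rule falls through to the policy and is dropped.
    """
    return sorted(name for name in interfaces if name not in allowed)


if __name__ == "__main__":
    try:
        # socket.if_nameindex() returns (index, name) pairs for the host's interfaces.
        names = [name for _, name in socket.if_nameindex()]
    except OSError as exc:  # interface enumeration can fail in restricted environments
        print(f"cannot list interfaces: {exc}")
        names = []
    for name in uncovered_interfaces(names, ALLOWED):
        print(f"{name}: no blanket ACCEPT rule, falls through to the DROP policy")
```

On an agent like ours, eth0 is also reported, which is expected since it is governed by its own specific rules; the interesting entries are the ones Docker created, such as docker0 and the br* bridges.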
So we changed it to:
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -i eth0 ... specific rules
-A OUTPUT -o eth0 ... specific rules
-A INPUT -i eth0 -j DROP
-A OUTPUT -o eth0 -j DROP
This config lets all internal interfaces accept incoming and outgoing connections (INPUT/OUTPUT), while making sure that eth0 only exposes the minimum to the internet.
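To see why the two configs behave so differently, it can help to model how iptables walks a chain: rules are checked in order, the first matching rule with an ACCEPT or DROP target decides, and the chain policy only applies when no rule matched. The Python sketch below is a deliberately simplified model, not iptables itself: it matches only on interface name and port and ignores connection tracking, protocols and addresses. The elided "... specific rules" for eth0 are replaced by a single hypothetical rule allowing SSH on port 22, purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Rule:
    """Simplified rule: match an interface name and, optionally, a TCP port."""
    iface: str
    target: str                 # "ACCEPT" or "DROP"
    port: Optional[int] = None  # None means "any port"

    def matches(self, iface: str, port: int) -> bool:
        return self.iface == iface and (self.port is None or self.port == port)


@dataclass(frozen=True)
class Chain:
    """Simplified built-in chain: ordered rules plus a default policy."""
    policy: str
    rules: Tuple[Rule, ...]

    def verdict(self, iface: str, port: int) -> str:
        # The first matching rule decides; the policy only applies if none match.
        for rule in self.rules:
            if rule.matches(iface, port):
                return rule.target
        return self.policy


# Hypothetical stand-in for the elided "... specific rules" on eth0: allow SSH only.
ETH0_SPECIFIC = (Rule("eth0", "ACCEPT", port=22),)

# Original INPUT chain: DROP policy, blanket ACCEPT only for lo, br0 and tap0.
OLD_INPUT = Chain(
    "DROP",
    (Rule("lo", "ACCEPT"), Rule("br0", "ACCEPT"), Rule("tap0", "ACCEPT")) + ETH0_SPECIFIC,
)

# New INPUT chain: ACCEPT policy, eth0 specific rules, then a catch-all DROP on eth0.
NEW_INPUT = Chain("ACCEPT", ETH0_SPECIFIC + (Rule("eth0", "DROP"),))

if __name__ == "__main__":
    # docker0 stands for any interface Docker creates; 8080 is an arbitrary example port.
    for iface, port in [("docker0", 8080), ("eth0", 22), ("eth0", 8080)]:
        old = OLD_INPUT.verdict(iface, port)
        new = NEW_INPUT.verdict(iface, port)
        print(f"{iface}:{port}  old={old}  new={new}")
```

Running it shows docker0 traffic going from DROP to ACCEPT, while eth0 behaves the same under both configs (SSH accepted, everything else dropped). The OUTPUT chain follows the same pattern, matching on the outgoing interface (`-o`) instead.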
This solved our Docker network issues.