<@738151527344111666> really need some help here. ...
# netmaker
a
@jolly-london-20127 really need some help here. I created the netclient as a StatefulSet with a PVC, but after deploying the clients across my nodes, the peers cannot handshake with each other. Only the master node can handshake with all the client nodes; client-to-client communication is not working in a default mesh network.
j
need more information: do you have logs, and are the nodes healthy in the UI?
what image are you using for netclient / what version?
a
Netclient version 0.14.6 the latest one. The nodes are healthy and all the other pods running inside those nodes are working properly. But I do not have any logs. I will collect it right now.
Netmaker version also 0.14.6
Using netclient image only. Not the netclient-go
[netclient] joining network

2022-07-31 15:10:32 joining vpn at api.mydomain.com:443 

2022-07-31 15:10:34 node netclient-0 is using port 31825 

2022-07-31 15:10:35 starting wireguard 

2022-07-31 15:10:37 certificates/key saved  

2022-07-31 15:10:38 sent a node update to server for node netclient-0 ,  5d293a69-1173-45d9-82d2-5f385b891e6f 

2022-07-31 15:10:40 error running command: systemctl restart netclient.service 

2022-07-31 15:10:40  

Starting netclient daemon

2022-07-31 15:10:40 checking for netclient updates... 

2022-07-31 15:10:40 finished updates 

2022-07-31 15:10:40 error running command: sysctl -w  net.ipv6.conf.all.forwarding=1 

2022-07-31 15:10:40 sysctl: error setting key 'net.ipv6.conf.all.forwarding': Read-only file system 

2022-07-31 15:10:40 exit status 1 

2022-07-31 15:10:40 initializing network vpn 

2022/07/31 15:10:40 WARNING: Error encountered setting ipv6 forwarding. You may want to investigate this.

2022-07-31 15:10:40 started daemon for server  broker.mydomain.com 

2022-07-31 15:10:40 netclient daemon started for server:  broker.mydomain.com 

2022-07-31 15:10:40 subscribed to node updates for node netclient-0 update/vpn/5d293a69-1173-45d9-82d2-5f385b891e6f 

2022-07-31 15:10:40 subscribed to peer updates for node netclient-0 peers/vpn/5d293a69-1173-45d9-82d2-5f385b891e6f 

[netclient] 2022-07-31 15:10:40 received peer update for node netclient-0 vpn 

2022-07-31 15:10:40 local port has changed from  0  to  31825 

2022-07-31 15:10:42 sent a node update to server for node netclient-0 ,  5d293a69-1173-45d9-82d2-5f385b891e6f 

2022-07-31 15:11:31 received peer update for node netclient-0 vpn 

2022-07-31 15:11:37 received peer update for node netclient-0 vpn 

2022-07-31 15:11:42 received peer update for node netclient-0 vpn 

2022-07-31 15:12:41 checkin for vpn complete
the logs look fine to me. btw I am not using ipv6 and the ipv6 module is not loaded by my node kernel. Both netclients have similar logs to this, which seems reasonable. Both also show as healthy in the netmaker UI.
But still cannot ping each other.
j
it looks like ipforwarding is not working, which can be an issue
also, if you are working with the normal netclient, make sure you have wireguard installed on your nodes
did you turn privileged mode off? if so, you likely need to run an init container to get ipforwarding running
a
my nodes all have wireguard enabled with the kernel wireguard system module loaded.
yes privileged mode is off. So, that's the problem I suppose. Checking the init container method now.
I turned off privileged mode and only added these capabilities to the containers ->
securityContext:
  capabilities:
    add:
    - NET_ADMIN
    - NET_RAW
    - SYS_MODULE
@jolly-london-20127 what ip forwarding rules does the netclient require? Is it similar to the netmaker server? Should I just use the same init container commands?
j
yes the same init container should work
question about your setup, you have turned off privileged mode, do you only want the netclient nodes to contact each other, within the cluster, or should external traffic be able to reach it?
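For readers following along, the init container approach discussed above could be sketched roughly like this. It is only an illustration under stated assumptions: the container name `init-sysctl`, the `busybox` image, and the `gravitl/netclient:v0.14.6` image tag are not taken from this thread, and only IPv4 forwarding is set because the poster is not using IPv6. The idea is that a short-lived privileged init container sets the sysctl in the pod's network namespace, so the long-running netclient container can stay unprivileged with the capabilities listed earlier.

```yaml
# Hypothetical fragment of the netclient StatefulSet pod template.
spec:
  template:
    spec:
      initContainers:
      - name: init-sysctl                  # hypothetical name
        image: busybox                     # assumption: any small image that ships sysctl
        command: ["/bin/sh", "-c"]
        args:
        - sysctl -w net.ipv4.ip_forward=1  # IPv4 forwarding only
        securityContext:
          privileged: true                 # only the init container runs privileged
      containers:
      - name: netclient
        image: gravitl/netclient:v0.14.6   # assumed image name and tag
        securityContext:
          capabilities:
            add:
            - NET_ADMIN
            - NET_RAW
            - SYS_MODULE
```

Init containers share the pod's network namespace, which is why the forwarding setting would still be in effect when the netclient container starts.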
a
want external clients to reach my network actually. I want the netclient nodes to contact each other and also some ext clients on the network to access different ingress nodes and use egress node for egressing.
j
ok, so im not sure how exactly the external access is going to work without host-level access. I believe you may need to use a NodePort like we do for the netmaker server
a
[netclient] joining network

[netclient] 2022-08-01 18:41:16 joining vpn at api.mydomain.com:443 

[netclient] 2022-08-01 18:41:18 node netclient-0 is using port 31825 

[netclient] 2022-08-01 18:41:20 starting wireguard 

[netclient] 2022-08-01 18:41:22 certificates/key saved  

[netclient] 2022-08-01 18:41:23 sent a node update to server for node netclient-0 ,  9cf1b361-3eb7-4b05-9cc1-bca538b2e3b1 

[netclient] 2022-08-01 18:41:25 error running command: systemctl restart netclient.service 

[netclient] 2022-08-01 18:41:25  

[netclient] Starting netclient daemon

[netclient] 2022-08-01 18:41:25 checking for netclient updates... 

[netclient] 2022-08-01 18:41:25 finished updates 

[netclient] 2022-08-01 18:41:25 initializing network vpn 

[netclient] 2022-08-01 18:41:25 started daemon for server  broker.mydomain.com 

[netclient] 2022-08-01 18:41:25 netclient daemon started for server:  broker.mydomain.com 

[netclient] 2022-08-01 18:41:25 subscribed to node updates for node netclient-0 update/vpn/9cf1b361-3eb7-4b05-9cc1-bca538b2e3b1 

[netclient] 2022-08-01 18:41:25 subscribed to peer updates for node netclient-0 peers/vpn/9cf1b361-3eb7-4b05-9cc1-bca538b2e3b1 

[netclient] 2022-08-01 18:41:25 received peer update for node netclient-0 vpn 

[netclient] 2022-08-01 18:41:25 local port has changed from  0  to  31825 

[netclient] 2022-08-01 18:41:26 sent a node update to server for node netclient-0 ,  9cf1b361-3eb7-4b05-9cc1-bca538b2e3b1 

[netclient] 2022-08-01 18:42:15 received peer update for node netclient-0 vpn 

[netclient] 2022-08-01 18:42:25 checkin for vpn complete 

[netclient] 2022-08-01 18:42:26 received peer update for node netclient-0 vpn 

[netclient] 2022-08-01 18:43:26 checkin for vpn complete 

[netclient] 2022-08-01 18:43:39 received peer update for node netclient-0 vpn 

[netclient] 2022-08-01 18:43:53 received peer update for node netclient-0 vpn 

[netclient] 2022-08-01 18:44:26 checkin for vpn complete
these are the new netclient logs with privileged mode on. the netclients still cannot communicate with each other or the server.
I am very confused about this also. If I use nodeports, which ports should I use? The 31821-31830 nodeports are all taken by the netmaker server.
the main goal is to contain the netclient networks inside the container, because my kubernetes cluster is already using the flannel wireguard backend, which has its own networks at the node/host level.
j
ooooh, is the netmaker server deployed on the same cluster as the netclients??
a
yes
I know it is not recommended but I had to. I have no other choice available. I deployed the netmaker server on the k8s master server.
j
ok, this could be the issue, though I am not certain
take a look at "wg show" on your nodes, and look at the endpoints
what is it using for the endpoint values: public IP of node, pod IP, or something else?
a
endpoints show the public ip of the nodes
including the port
j
ok, is udp hole punching on or off?
a
udp hole punching is ON in my network from the netmaker UI
j
what port are the nodes using, random port, or something like 51821 /51822
a
31825 which I set in the network using UI
the netclient.sh explicitly uses the udpholepunch no flag, as far as I can see
j
yeah, we may want to try it with that turned on
try going to each node and switching on "dynamic port" in the UI
a
ok trying it now
the ports have randomized on the nodes but still ping is not working.
j
ok, it sounds like your firewall is too restrictive for this to work with external interfaces by default, I would try using the internal interface
usually this is selected automatically but can be difficult depending on the environment
a
my firewall is fully open. I do not have any restrictions on the firewall in any node actually.
j
strange, I am testing on 2 clusters right now from different cloud providers and do not have an issue with node-node communications
if the public ip and port looks correct, typically the only thing that can get in the way is the firewall
a
that's very weird indeed. I have one network running which uses the master as both ingress and egress so that I can access the cluster pod network through it. And I am currently using ext clients to access my pod network without any problem.
j
is this on a different cluster?
a
same cluster
j
so you have a different network that is functioning correctly, but this one is not?
a
So, in the same kubernetes cluster, I deployed the netmaker server on the master node and created a network there. Network 1 in the netmaker UI has my master server selected as both ingress and egress, and I have ext clients that connect to this network and reach the egress range, which is my pod network. Then I created a second network for VPN purposes but have not set any egress or ingress on it yet. I deployed the netclients across my geo-distributed nodes, but those nodes cannot communicate on network 2. So yes, network 1, which I created for accessing the pod network, is functioning properly, but network 2, which I created for VPN, is not working at all. Note that network 1 has only one node, the master node, and I am trying to include more nodes on network 2.
j
ok, I understand
at the very least, the nodes in network 2 should be able to reach the netmaker-1 node, given it should have the nodeport set up
are the nodes healthy in the UI?
a
nodes are healthy in the UI but still cannot reach the master node
should I recreate the network freshly with a new access token and try this whole thing again?
j
I am not sure; if the nodes cant even reach the master node, and the endpoints are correct, it sounds like there is something else going on
oh also...you are using kernel wireguard right?
a
yes kernel wireguard. netclient image. not the netclient-go
j
ok, have you tried with netclient-go?
a
I cannot because my nodes are arm. netclient-go does not have an arm image
j
do you have an arm machine where you can build the image?
my best guess right now is the issue is with the container attempting to use kernel wireguard
if we can try with userspace, we can at least isolate the issue
a
I have my own github actions runner for arm. But I have no clue how to build the image actually. Do you have a workflow file ready? Then i can build it and push it.
how do I build the netclient-go? I have no clue about this actually.
j
actually...we already build it multiarch, so it should run on arm...
a
what is the link to the dockerhub image?
j
a
it only shows amd64
the workflow file doesnt actually build for arm as I can see
j
ahhh yes I see that, I'll put in a PR
in the meantime, try building your own arm version to run
a
alright
trying it now
btw just to clarify: the netclients can ping the master server node right now. All of them can ping the master node, but not each other.
also the master server can ping all the other nodes properly. no issue with master node to netclient node communication.
Is this still a wireguard kernel problem? I think it might not be
j
yeah if they can ping the master then it shouldn't be a wireguard problem
a
I am so confused man. UDP hole punching is also on. why can't the netclient nodes communicate with each other?
ACL is also default
j
the only other thing i can think is because netmaker is on the cluster with them
a
why would it create any problem?
they are on different networks anyway
j
for udp hole punching, the server determines the correct port and sends to nodes to connect to the server
if the nodes connect to server over a local interface, that port can be incorrect
a
last time when I had hostnetwork on and I tested, it worked you know
j
hmmm so it worked initially, and just doesn't work after switching off hostnetwork?
a
yes
j
why can you not use host network? I think this is probably necessary
a
just to be safe. last time I made some of my nodes unreachable by doing hostnetwork and egress. I just want to isolate the networks
j
if you need to connect from ext clients, the traffic will have to come in over host network anyway
a
you got a point
but then how is traffic mapped to the netmaker server node? it doesn't have hostnetwork enabled
j
nodeport
this is the last thing you can try
a
how do I use nodeport then? which nodeport value do the netclients use?
j
you will turn off udp hole punching and set the port equal to the node port
and create a nodeport service like we do for the server, but for a single port
a
got it. trying it right now
j
---
apiVersion: v1
kind: Service
metadata:
  labels:
  name: 'netmaker-wireguard'
spec:
  externalTrafficPolicy: Local
  type: NodePort
  ports:
  - port: 31851
    nodePort: 31851
    protocol: UDP
  selector:
    app: 'netclient'
try this, and set netclient ports to 31851
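For the Service above to route anything, the netclient pods need a label matching its `app: 'netclient'` selector and must listen on the same UDP port. A rough sketch of the matching StatefulSet fragment follows; the StatefulSet name, the `serviceName`, and the image tag are assumptions for illustration, not values given in this thread.

```yaml
# Hypothetical StatefulSet fragment matching the NodePort Service above.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: netclient                # hypothetical name
spec:
  serviceName: netclient         # hypothetical headless service name
  selector:
    matchLabels:
      app: netclient
  template:
    metadata:
      labels:
        app: netclient           # must match the Service selector
    spec:
      containers:
      - name: netclient
        image: gravitl/netclient:v0.14.6   # assumed image name and tag
        ports:
        - containerPort: 31851   # same value as the Service port and nodePort
          protocol: UDP
```

Because the Service sets `externalTrafficPolicy: Local`, traffic arriving at a node's port 31851 is only delivered to a matching pod on that same node, so this layout presumably expects one netclient pod per node.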
a
ok
@jolly-london-20127 problem solved with this. thanks a lot for the help man.
when I run wg show on the nodes, the client ports show a random port, even though I set all clients to use that nodeport. is that expected or something I should look into further?
btw I turned off UDP hole punching on the network.
and the nodes still show random ports at the endpoint
j
unfortunately for now, you can't switch a network between udp hole punching on/off. I don't think changing the network setting will change the existing nodes, so you have to turn off "dynamic port" individually on the nodes
but they are still connecting even with random port?
a
yes even with random port they are connecting. so should I change the udp hole punch setting first and then deploy the clients?
even when I turned off dynamic port on all nodes, they are still using a random port
j
hmmm did you manually input the port after updating and submit?
a
yes I tried manually updating the ports and also did netclient pull explicitly
Also, if I create a VPN using these nodes by egressing the public network subnets, the vpn is very slow. it takes too long to load every website. any idea why this might happen?
j
hmmm i just tried on mine and it seems to update fine
maybe check the logs, see if there's any errors connecting to the server
a
checking it now
j
wireguard-go performance depends on resource availability, so could be the pod's resource limits
also, depending on how it is set up, it may be making many hops to reach the egress node
for example, from one pod on the network to go out through another pod, it needs to go from pod1 --> node1 --> node2 --> pod2 --> node2 --> internet
a
I understand
One network is working now properly. But if I try to add another network from the pod with netclient join then the added network doesn't work. No ping to master or any other nodes.
j
you will need another nodeport
a
@jolly-london-20127 I have another 3 nodeports defined on the same service. Still it is not working actually.
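For reference, a Service carrying one NodePort per netclient network could be sketched like this. The port names and the second port value 31852 are assumptions for illustration; the thread does not say which extra ports were defined. Kubernetes requires every port to have a name once a Service has more than one port, and each network's netclient port would still need to be set to its own nodePort value with dynamic port turned off.

```yaml
# Hypothetical multi-port variant of the NodePort Service shared earlier.
apiVersion: v1
kind: Service
metadata:
  name: netmaker-wireguard
spec:
  type: NodePort
  externalTrafficPolicy: Local
  selector:
    app: netclient
  ports:
  - name: wg-net-1     # hypothetical name; names are required with multiple ports
    port: 31851
    nodePort: 31851
    protocol: UDP
  - name: wg-net-2     # hypothetical name
    port: 31852        # assumed value for the second network
    nodePort: 31852
    protocol: UDP
```

This only shows the shape of the manifest; it is not a confirmed fix for the issue reported in the last message.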