#client

average-helicopter-96869

06/02/2022, 7:32 PM
Error "broker port is blank" after upgrading from 0.14.1 to 0.14.2
Wondering if I missed a step in the upgrade? I updated my netmaker and UI Docker images to v0.14.2, and all my 0.14.1 clients were still happy.
So then I upgraded one netclient to 0.14.2, and that one could not check in, giving this error:
```
netclient[59819]: [netclient] 2022-06-02 14:25:12 error publishing ping, mq setup error error: broker port is blank
```

jolly-london-20127

06/02/2022, 7:33 PM
It should fall back and retrieve the port if you give it a few minutes.

bored-island-21407

06/02/2022, 7:33 PM
you may have to wait for a couple of minutes for the client to do a pull

average-helicopter-96869

06/02/2022, 7:33 PM
I noticed that `/etc/netclient/config/netconfig-main` had this block:
```yaml
server:
    corednsaddr: ""
    apihost: ""
    apiport: ""
    clientmode: ""
    dnsmode: ""
    version: v0.14.2
    mqport: ""
    server: broker.netmaker.MYDOMAIN
```
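The block above leaves `apihost` and `mqport` empty. As a quick way to spot the same state on other nodes, here is a minimal Python sketch that takes the `server` section as an already-parsed dict (parsing the YAML itself, e.g. with a third-party YAML library, is left out) and reports which keys are blank. The helper name `blank_server_fields` and the list of keys it checks are assumptions for illustration, not anything netclient defines.

```python
from typing import List, Mapping, Sequence

# Keys this thread shows a working 0.14.2 node populating. Chosen for
# illustration only; this list is not taken from netclient's source.
REQUIRED_KEYS = ("apihost", "mqport", "server")


def blank_server_fields(server: Mapping[str, object],
                        required: Sequence[str] = REQUIRED_KEYS) -> List[str]:
    """Return the required keys that are missing or empty in a server section."""
    return [key for key in required if not str(server.get(key, "")).strip()]


if __name__ == "__main__":
    # The server section of the broken 0.14.2 node above, as a dict.
    broken = {
        "corednsaddr": "",
        "apihost": "",
        "apiport": "",
        "clientmode": "",
        "dnsmode": "",
        "version": "v0.14.2",
        "mqport": "",
        "server": "broker.netmaker.MYDOMAIN",
    }
    print(blank_server_fields(broken))  # ['apihost', 'mqport']
```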

bored-island-21407

06/02/2022, 7:33 PM
or you could do a manual pull

average-helicopter-96869

06/02/2022, 7:34 PM
Hmm... OK, I'll try another node to verify that. On this node I worked around it by manually setting the port.
Hmmm:
```
# netclient pull -v
[netclient] 2022-06-02 19:41:46 No network selected. Running Pull for all networks.
[netclient] 2022-06-02 19:41:46 Error pulling network config for network:  family
 Post "https:///api/nodes/adm/family/authenticate": http: no Host in request URL
[netclient] 2022-06-02 19:41:46 Error pulling network config for network:  main
 Post "https:///api/nodes/adm/main/authenticate": http: no Host in request URL
[netclient] 2022-06-02 19:41:46 register at https:///api/server/register
[netclient] 2022-06-02 19:41:47 restarting netclient.service
[netclient] 2022-06-02 19:41:48 reset network and peer configs
```
And my systemd logs for the netclient unit:
```
Jun 02 19:43:53 tunnel netclient[143179]: [netclient] 2022-06-02 19:43:53 initializing network main
Jun 02 19:43:53 tunnel netclient[143179]: [netclient] 2022-06-02 19:43:53 netclient daemon started for server:  broker.netmaker.MYDOMAIN
Jun 02 19:43:53 tunnel netclient[143179]: 2022/06/02 19:43:53 could not read client cert/key tls: private key does not match public key
```

bored-island-21407

06/02/2022, 7:48 PM
Interesting that your certs/key got out of sync. The brute-force way to recover: on the server, delete the files in `/root/certs` and restart the Docker containers.

average-helicopter-96869

06/02/2022, 7:48 PM
BTW, I can totally work around this. I'm just trying to highlight the issue and see if there are "best practice" steps for the upgrade and/or the move from port 8883 to 443.

jolly-london-20127

06/02/2022, 7:48 PM
@bored-island-21407 I wonder if the change I made to retrieving the broker address resets the whole server section of the config. That would explain it,
but I don't think I did that...

bored-island-21407

06/02/2022, 7:49 PM
no ...

average-helicopter-96869

06/02/2022, 7:49 PM
@jolly-london-20127 That sounds promising, because here's a node that's still on 14.1:
```yaml
server:
    corednsaddr: ""
    accesskey: SOMESTUFF
    server: broker.netmaker.MYDOMAIN
    api: api.netmaker.MYDOMAIN:443
```

bored-island-21407

06/02/2022, 7:50 PM
When I was testing the upgrade scenario prior to release this morning, I did the same steps. It took a while, but the node eventually recovered.

average-helicopter-96869

06/02/2022, 7:52 PM
I'm going to downgrade a client and try to reproduce this.

jolly-london-20127

06/02/2022, 7:58 PM
It may be an issue of doing too much to it before it has a chance to recover automatically.
When you attempt to reproduce, please try leaving the client alone for ~5 minutes to see if it's able to reset its configs automatically.
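Leaving the client alone for a fixed window and then checking whether it has healed can be scripted as a simple poll-with-deadline loop. A minimal sketch, assuming you supply your own `check` callable (for example, a hypothetical one that re-reads the netclient config and reports whether `apihost` is non-empty); `wait_for_recovery` and its defaults (300 s total, 10 s between checks) are illustrative, not part of netclient. The clock and sleep functions are injectable so the loop can be exercised without actually waiting.

```python
import time
from typing import Callable


def wait_for_recovery(check: Callable[[], bool],
                      timeout: float = 300.0,
                      interval: float = 10.0,
                      clock: Callable[[], float] = time.monotonic,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `check` until it returns True or `timeout` seconds have elapsed.

    Returns True if the check passed, False if the deadline was reached.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

With the defaults this covers the ~5 minute window suggested above; a monotonic clock is used so that wall-clock adjustments during the wait do not shorten or extend it.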

average-helicopter-96869

06/02/2022, 8:03 PM
Will do. I'm keeping one node in a known-good state on 14.1, another in my broken state on 14.2, and I'm reverting a broken one to 14.1.
The revert was successful, though I had to manually put back the `server: api:` field as it was before. Then a `netclient pull` was good, and that node is communicating fine with the broker again on 8883.

bored-island-21407

06/02/2022, 8:10 PM
What OS are your nodes running?

average-helicopter-96869

06/02/2022, 8:11 PM
So, to recap the steps here. My docker-compose was on 0.14.1 and thus did not have an MQ_PORT set.
1) Upgrade the netmaker and UI images in docker-compose to 0.14.2 (do NOT set up MQTT over Traefik; still just using the bare 8883 port, still no MQ_PORT).
2) (Linux node) `systemctl stop netclient`
3) (Linux node) Download https://github.com/gravitl/netmaker/releases/download/v0.14.2/netclient with wget to `/sbin/netclient`, then `chmod 755 /sbin/netclient`.
4) (Linux node) `systemctl restart netclient; journalctl -f -u netclient`
Now I'm watching.
Most are Linux (Ubuntu 22.04 servers and one Fedora desktop), plus one Windows machine, but I've only been troubleshooting on Linux because it's easier for me.
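The node-side part of this recap (steps 2 to 4) is a fixed command sequence, so it can be assembled per release. A minimal sketch that only builds the command strings and runs nothing; the helper name `upgrade_commands` is hypothetical, the release URL pattern is copied from step 3, and `wget -O` is used to write the download to the target path. Step 1 (updating the server images in docker-compose) is not covered.

```python
from typing import List

# URL pattern copied from step 3 of the recap.
RELEASE_URL = "https://github.com/gravitl/netmaker/releases/download/{version}/netclient"


def upgrade_commands(version: str, dest: str = "/sbin/netclient") -> List[str]:
    """Return the node-side upgrade commands from the recap, in order.

    Nothing is executed here; the strings are meant to be reviewed and then
    run by hand (as root) on the node.
    """
    url = RELEASE_URL.format(version=version)
    return [
        "systemctl stop netclient",
        f"wget -O {dest} {url}",
        f"chmod 755 {dest}",
        "systemctl restart netclient",
        "journalctl -f -u netclient",
    ]


if __name__ == "__main__":
    for command in upgrade_commands("v0.14.2"):
        print(command)
```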

bored-island-21407

06/02/2022, 8:13 PM
🥂 You did see my avatar, right?

average-helicopter-96869

06/02/2022, 8:13 PM
Yes 🙂 I was ashamed to tell you about the Windows box.
OK, I don't think this will recover.
Pasting logs and config to show why.

bored-island-21407

06/02/2022, 8:15 PM
Did you add an MQ_PORT to the env for netmaker?

average-helicopter-96869

06/02/2022, 8:15 PM
no

bored-island-21407

06/02/2022, 8:16 PM
but it should default to 8883 if it isn't set
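The fallback described here can be sketched as follows. This is not Netmaker's server code: `effective_mq_port` is a hypothetical helper, and treating a blank `MQ_PORT` the same as an unset one is an assumption.

```python
import os
from typing import Mapping

DEFAULT_MQ_PORT = "8883"  # default port stated in this thread


def effective_mq_port(env: Mapping[str, str] = os.environ) -> str:
    """Return MQ_PORT from the environment, falling back to 8883 if unset or blank."""
    return env.get("MQ_PORT", "").strip() or DEFAULT_MQ_PORT
```

Passing a plain dict instead of `os.environ` makes it easy to check the behaviour for a compose `environment:` block where `MQ_PORT` is commented out.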

average-helicopter-96869

06/02/2022, 8:16 PM
That's what I figured. I'm effectively trying to do what was in the announcement:
> If you'd like to keep your existing Caddy proxy, you can just update the images to 0.14.2 and run as-is (with port 8883).
```
Jun 02 15:10:50 MYNODE systemd[1]: Started netclient.service - Netclient Daemon.
Jun 02 15:10:50 MYNODE netclient[71487]: [netclient] 2022-06-02 15:10:50 initializing network family
Jun 02 15:10:50 MYNODE netclient[71487]: [netclient] 2022-06-02 15:10:50 started daemon for server  broker.netmaker.MYDOMAIN
Jun 02 15:10:50 MYNODE netclient[71487]: [netclient] 2022-06-02 15:10:50 netclient daemon started for server:  broker.netmaker.MYDOMAIN
Jun 02 15:11:20 MYNODE netclient[71487]: [netclient] 2022-06-02 15:11:20 unable to connect to broker, retrying ...
Jun 02 15:11:20 MYNODE netclient[71487]: [netclient] 2022-06-02 15:11:20 unable to connect to broker error: broker port is blank
Jun 02 15:11:50 MYNODE netclient[71487]: [netclient] 2022-06-02 15:11:50 local port has changed from  42624  to  41916
Jun 02 15:12:20 MYNODE netclient[71487]: [netclient] 2022-06-02 15:12:20 unable to connect to broker, retrying ...
Jun 02 15:12:20 MYNODE netclient[71487]: [netclient] 2022-06-02 15:12:20 could not publish local port change
Jun 02 15:12:50 MYNODE netclient[71487]: [netclient] 2022-06-02 15:12:50 unable to connect to broker, retrying ...
Jun 02 15:12:50 MYNODE netclient[71487]: [netclient] 2022-06-02 15:12:50 error publishing ping, mq setup error error: broker port is blank
Jun 02 15:12:50 MYNODE netclient[71487]: [netclient] 2022-06-02 15:12:50 running pull on family to reconnect
Jun 02 15:12:50 MYNODE netclient[71487]: [netclient] 2022-06-02 15:12:50 could not run pull on family, error: Post "https:///api/nodes/adm/family/authenticate": http: no Host in request URL
Jun 02 15:12:50 MYNODE netclient[71487]: [netclient] 2022-06-02 15:12:50 checkin for family complete
```
I'm not sure exactly when, but sometime in that period my `netconfig-family` changed from having
```yaml
server:
    corednsaddr: ""
    accesskey: ""
    server: broker.netmaker.MYDOMAIN
    api: api.netmaker.MYDOMAIN:443
```
to:
```yaml
server:
    corednsaddr: ""
    apihost: ""
    apiport: ""
    clientmode: ""
    dnsmode: ""
    version: ""
    mqport: ""
    server: broker.netmaker.MYDOMAIN
```
which is why there's no API hostname to pull from, so pulls fail.
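The failing URLs in the logs above follow directly from that blank value: the host part of the URL comes from the API address, so an empty `apihost` leaves nothing between `https://` and the path. A minimal sketch reproducing the URL shape seen in the logs; `authenticate_url` is a hypothetical helper, not netclient's code. Go's `net/http` client refuses a request whose URL has no host with exactly the `http: no Host in request URL` error shown above.

```python
from urllib.parse import urlsplit


def authenticate_url(apihost: str, network: str) -> str:
    """Build a URL in the same shape as the authenticate calls in the logs."""
    return f"https://{apihost}/api/nodes/adm/{network}/authenticate"


if __name__ == "__main__":
    broken = authenticate_url("", "family")
    print(broken)                         # https:///api/nodes/adm/family/authenticate
    print(repr(urlsplit(broken).netloc))  # ''
    working = authenticate_url("api.netmaker.MYDOMAIN:443", "family")
    print(urlsplit(working).netloc)       # api.netmaker.MYDOMAIN:443
```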
I'm going to try setting `MQ_PORT: "8883"` in my docker-compose and then re-attempt the upgrade.
Same results.

bored-island-21407

06/02/2022, 8:39 PM
Hmm, well, that is a scenario I didn't test. I did all my testing with Traefik rather than Caddy, but I am a bit baffled as to the root cause of the issue.

average-helicopter-96869

06/02/2022, 8:40 PM
Yeah, I don't think it's a Traefik/Caddy-related issue.
It seems to be related to the change in the netclient config file model between 14.1 and 14.2.
And in my case it's Traefik -> Traefik, 14.1 -> 14.2, without even changing the MQ port.
And I can confirm: if I wipe out `/etc/netclient/config/*` for my node and manually delete it from netmaker-ui, I can cleanly join with a 14.2 netclient.
That's why I think it really seems to be a problem in the client config upgrade.

bored-island-21407

06/02/2022, 8:45 PM
Your compose file has SERVER_API_CONN_STRING?

average-helicopter-96869

06/02/2022, 8:45 PM
Yep:
```yaml
environment:
      SERVER_NAME: "broker.${NM_BASE_DOMAIN}"
      SERVER_HOST: "${NM_PUBLIC_IP}"
      SERVER_API_CONN_STRING: "api.${NM_BASE_DOMAIN}:443"
      COREDNS_ADDR: "${NM_PUBLIC_IP}"
      DNS_MODE: "on"
      SERVER_HTTP_HOST: "api.${NM_BASE_DOMAIN}"
      API_PORT: "8081"
      CLIENT_MODE: "on"
      MASTER_KEY: "${NM_MASTER_KEY}"
      CORS_ALLOWED_ORIGIN: "*"
      DISPLAY_KEYS: "on"
      DATABASE: "sqlite"
      NODE_ID: "netmaker-server-1"
      MQ_HOST: "mq"
      #MQ_PORT: "443"
      HOST_NETWORK: "off"
      VERBOSITY: "1"
      MANAGE_IPTABLES: "on"
      PORT_FORWARD_SERVICES: "dns"
```

bored-island-21407

06/02/2022, 8:47 PM
I am trying to determine where the api is getting set to blank

average-helicopter-96869

06/02/2022, 8:48 PM
This is my netclient config on the clean 14.2 install:
```yaml
server:
    corednsaddr: MYIP
    apihost: api.netmaker.MYDOMAIN:443
    apiport: "8081"
    clientmode: ""
    dnsmode: "on"
    version: v0.14.2
    mqport: "8883"
    server: broker.netmaker.MYDOMAIN
```
OK, I have a workaround at least.

bored-island-21407

06/02/2022, 8:52 PM
OK, I have a 14.1 client connected to a server running the test build (aka 14.2).
I am going to update the client.

average-helicopter-96869

06/02/2022, 8:52 PM
Manually add
```yaml
server:
    apihost: api.netmaker.MYDOMAIN:443
    apiport: "8081"
```
to the `server` section of the `netconfig-<netname>` config file in `/etc/netclient/config/`, then run `netclient pull -v`, and the node started working for me again.
(I'm not sure if apiport was needed.)
I'll confirm.
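The manual fix above can also be applied as a plain text edit. A minimal standard-library sketch, assuming the simple two-level, four-space-indented layout of the configs in this thread (it is not a general YAML editor); `add_apihost` is a hypothetical helper. It fills in a blank `apihost`, or inserts the key directly under `server:` if it is missing, and leaves an existing non-empty value alone. Keep a backup of the original file, and run `netclient pull -v` afterwards as described above.

```python
from typing import List, Optional

BLANK_VALUES = ("", '""', "''")


def add_apihost(config_text: str, apihost: str, indent: str = "    ") -> str:
    """Ensure the top-level `server:` section has a non-empty `apihost:` entry."""
    lines: List[str] = config_text.splitlines()

    # Find the top-level "server:" line.
    start: Optional[int] = next(
        (i for i, line in enumerate(lines) if line.rstrip() == "server:"), None)
    if start is None:
        raise ValueError("no top-level 'server:' section found")

    # The section runs until the next non-indented, non-empty line.
    end = start + 1
    while end < len(lines) and (lines[end][:1] in (" ", "\t") or not lines[end].strip()):
        end += 1

    for i in range(start + 1, end):
        key, _, value = lines[i].strip().partition(":")
        if key == "apihost":
            if value.strip() in BLANK_VALUES:
                # Keep the existing indentation, replace only the value.
                prefix = lines[i][: len(lines[i]) - len(lines[i].lstrip())]
                lines[i] = f"{prefix}apihost: {apihost}"
            return "\n".join(lines) + "\n"

    # Key absent (e.g. a 0.14.1-style section): add it right under "server:".
    lines.insert(start + 1, f"{indent}apihost: {apihost}")
    return "\n".join(lines) + "\n"
```

For example, running `add_apihost(text, "api.netmaker.MYDOMAIN:443")` on the broken `netconfig-main` contents shown earlier replaces the `apihost: ""` line and leaves every other line as it was.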

bored-island-21407

06/02/2022, 8:54 PM
it should not be

average-helicopter-96869

06/02/2022, 8:54 PM
Yeah, I didn't think so.
Yeah, 0.14.1 expected `server: api: HOSTNAME_OF_NETMAKER:443`, and 0.14.2 expects `server: apihost: HOSTNAME_OF_NETMAKER:443`.
So that's why that bit is broken... maybe. I'm guessing.
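If that guess is right, the recovery that was expected boils down to carrying the old key's value over to the new key. A minimal sketch of the idea on an already-parsed `server` section; `migrate_server_section` is hypothetical and `RENAMED_KEYS` contains only the single rename discussed here, so this illustrates the idea rather than reproducing netclient's upgrade logic.

```python
from typing import Dict, Mapping

# The only rename discussed in this thread: 0.14.1 "api" -> 0.14.2 "apihost".
RENAMED_KEYS = {"api": "apihost"}


def migrate_server_section(old: Mapping[str, str]) -> Dict[str, str]:
    """Copy values from old key names to new ones without overwriting.

    The old key is dropped; its value is used only if the new key is
    missing or empty, so an already-migrated section is left as it is.
    """
    new = dict(old)
    for old_key, new_key in RENAMED_KEYS.items():
        value = new.pop(old_key, "")
        if value and not new.get(new_key):
            new[new_key] = value
    return new


if __name__ == "__main__":
    # The 0.14.1 server section shown earlier in this thread.
    v0141 = {
        "corednsaddr": "",
        "accesskey": "SOMESTUFF",
        "server": "broker.netmaker.MYDOMAIN",
        "api": "api.netmaker.MYDOMAIN:443",
    }
    print(migrate_server_section(v0141)["apihost"])  # api.netmaker.MYDOMAIN:443
```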

bored-island-21407

06/02/2022, 9:00 PM
Yes, I remember that being changed, but I thought we had a recovery step in place.

average-helicopter-96869

06/02/2022, 9:01 PM
ok, well, good luck... i gotta get back to my actual job 😉 if i can help test something specific, let me know

bored-island-21407

06/02/2022, 9:04 PM
Will do. Thanks a bunch for helping with this (and for the Traefik stuff).

average-helicopter-96869

06/02/2022, 9:08 PM
my pleasure

few-airline-95046

06/02/2022, 10:08 PM
I tried adding `apihost: api.netmaker.MYDOMAIN:443` to the `netconfig-` file, and the pull on the client worked, but the status still shows an error. I saw "sslv3 alert bad certificate" in the Docker logs for mq, so I tried wiping the certificates in `/root/certs/` on the server and restarted. But now I get no new certs at all in there? 🤔

bored-island-21407

06/02/2022, 10:11 PM
Restart the netmaker container

few-airline-95046

06/02/2022, 10:11 PM
I did

bored-island-21407

06/02/2022, 10:12 PM
Netmaker will generate certs on startup if they are missing.

few-airline-95046

06/02/2022, 10:12 PM
Ok, but they're not in the /root/certs folder :/
Hmm, I saved my old docker-compose.yml before changing to the docker-compose.traefik.yml. In the old one, the mq volumes look like:
```yaml
volumes:
      - /root/mosquitto.conf:/mosquitto/config/mosquitto.conf
      - /root/certs/:/mosquitto/certs/
      - mosquitto_data:/mosquitto/data
      - mosquitto_logs:/mosquitto/log
```
In the new Traefik-based one:
```yaml
volumes:
      - /root/mosquitto.conf:/mosquitto/config/mosquitto.conf
      - mosquitto_data:/mosquitto/data
      - mosquitto_logs:/mosquitto/log
      - shared_certs:/mosquitto/certs
```
Is that relevant?
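The difference between the two lists is where `/mosquitto/certs` comes from: a host directory (`/root/certs/`) in the old file, a named volume (`shared_certs`) in the new one. A minimal sketch that answers that question for any list of compose volume entries, assuming the short `SOURCE:TARGET[:MODE]` syntax and Linux host paths; `mount_source` is a hypothetical helper.

```python
from typing import List, Optional, Tuple


def mount_source(volumes: List[str], target: str) -> Optional[Tuple[str, str]]:
    """Return (kind, source) for the compose volume entry mounted at `target`.

    `kind` is "bind" for host paths and "named" for Docker named volumes;
    None means nothing in the list is mounted at `target`.
    """
    for entry in volumes:
        source, _, rest = entry.partition(":")
        container_path = rest.split(":")[0]  # drop an optional ":ro"/":rw" mode
        if container_path.rstrip("/") == target.rstrip("/"):
            kind = "bind" if source.startswith(("/", ".", "~")) else "named"
            return kind, source
    return None


if __name__ == "__main__":
    old = ["/root/mosquitto.conf:/mosquitto/config/mosquitto.conf",
           "/root/certs/:/mosquitto/certs/",
           "mosquitto_data:/mosquitto/data",
           "mosquitto_logs:/mosquitto/log"]
    new = ["/root/mosquitto.conf:/mosquitto/config/mosquitto.conf",
           "mosquitto_data:/mosquitto/data",
           "mosquitto_logs:/mosquitto/log",
           "shared_certs:/mosquitto/certs"]
    print(mount_source(old, "/mosquitto/certs"))  # ('bind', '/root/certs/')
    print(mount_source(new, "/mosquitto/certs"))  # ('named', 'shared_certs')
```

With the named volume, files deleted from `/root/certs` on the host are simply not the ones mosquitto reads.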

bored-island-21407

06/02/2022, 10:17 PM
In that case you need to delete them from the `shared_certs` Docker volume.

few-airline-95046

06/02/2022, 10:23 PM
Thanks, looks like that solved it.. But why was the volume for certs changed? 😄

bored-island-21407

06/02/2022, 10:29 PM
It was a community submitted PR

few-airline-95046

06/02/2022, 10:30 PM
Ah, the certs are maybe less prone to being accidentally deleted that way.
Time to sleep, will continue to fiddle with this tomorrow 😄

average-helicopter-96869

06/03/2022, 3:42 AM
The docker-compose YAML files provided are really intended to be a guide, not a production solution, specifically with respect to the volume definitions:
```yaml
volumes:
  traefik_certs: {}
  shared_certs: {}
  sqldata: {}
  dnsconfig: {}
  mosquitto_data: {}
  mosquitto_logs: {}
```
This isn't a recommended way to actually do volumes in Docker. It works, but it's really more like a placeholder.
At least, that's my opinion 😉
With that default config, the volumes are stored in a location determined by the Docker daemon configuration, which, if your system is Linux with stock configs, usually means somewhere under `/var/lib/docker`, but it's not exactly obvious where your data was stored.
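For the default `local` volume driver the location is fairly predictable: Compose prefixes each named volume with the project name (by default the name of the directory holding the compose file, normalized by Compose), and Docker keeps the contents in a `_data` directory under its data root. A minimal sketch that builds that path, assuming the stock `/var/lib/docker` data root, no custom `name:` on the volume, and the `local` driver; `named_volume_data_dir` is hypothetical. `docker volume inspect <volume>` reports the authoritative path in its `Mountpoint` field.

```python
from pathlib import PurePosixPath


def named_volume_data_dir(project: str, volume: str,
                          docker_root: str = "/var/lib/docker") -> PurePosixPath:
    """Best guess at where the local driver stores a Compose named volume.

    Assumes Compose's default "<project>_<volume>" naming and the given data
    root; check `docker volume inspect` for the real Mountpoint.
    """
    return PurePosixPath(docker_root) / "volumes" / f"{project}_{volume}" / "_data"


if __name__ == "__main__":
    # e.g. a compose file kept in a directory named "netmaker"
    print(named_volume_data_dir("netmaker", "shared_certs"))
    # /var/lib/docker/volumes/netmaker_shared_certs/_data
```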
I really like this method for standalone servers, like the small virtual machine where I run netmaker: https://docs.docker.com/storage/bind-mounts/#use-a-bind-mount-with-compose
You can see how I've used that in my personal repo (which has not yet been updated to 14.2): https://github.com/bsherman/netmaker-traefik/blob/main/docker-compose.yml#L140
Anyway, I also should apologize. I was responsible for the change from `/root/certs` to `shared_certs:` in my contribution of `docker-compose.traefik.yml`.
I kept the simple default volumes to avoid complications for folks upgrading from Caddy to Traefik, but didn't think about the complication caused by the one I changed.

jolly-london-20127

06/03/2022, 1:26 PM
@few-airline-95046 @average-helicopter-96869 we think we've narrowed down the issue. Did you upgrade the clients before upgrading the server?

few-airline-95046

06/03/2022, 1:28 PM
One node might have been updated before, but not the second one, which I tried late last night.

jolly-london-20127

06/03/2022, 1:39 PM
can you share your docker-compose (before and after)? would help with recreating the issue

few-airline-95046

06/03/2022, 1:51 PM
I can do it in a couple of hours probably
@jolly-london-20127, I see that you've released new binaries, so you don't need the docker-compose files any more? 😄
Hmm, I updated to the latest netclient binary, but I'm still seeing:
```
root@Cradle:/boot/config/netclient# netclient pull --vvv --daemon off
[netclient] 2022-06-03 23:08:54 No network selected. Running Pull for all networks.
[netclient] 2022-06-03 23:08:54 Error pulling network config for network:  xxx
 Post "https:///api/nodes/adm/xxx/authenticate": http: no Host in request URL
[netclient] 2022-06-03 23:08:54 register at https:///api/server/register
[netclient] 2022-06-03 23:08:55 restarting netclient.service
[netclient] 2022-06-03 23:08:56 error running command: systemctl restart netclient.service
[netclient] 2022-06-03 23:08:56
[netclient] 2022-06-03 23:08:56 reset network and peer configs
```
And then when trying to run netclient in daemon mode (there's no systemd on that machine), I get:
```
root@Cradle:/boot/config/netclient# netclient daemon
[netclient] 2022-06-03 23:11:55 initializing network xxx
[netclient] 2022-06-03 23:11:55 started daemon for server  broker.netmaker.xxx.se
[netclient] 2022-06-03 23:11:55 netclient daemon started for server:  broker.netmaker.xxx.se
2022/06/03 23:11:55 could not read client cert/key tls: private key does not match public key
```

jolly-london-20127

06/03/2022, 9:17 PM
If you already had that issue, updating will not solve it. Once the API address is missing, you need to add it manually.

few-airline-95046

06/03/2022, 9:28 PM
Alright, but I have already modified /etc/netclient/config/netconfig-mydomain to have apihost set to my proper api url

average-helicopter-96869

06/04/2022, 4:45 PM
In my case, I'd updated the server to 0.14.2 first and let things settle, and all my clients were working on 0.14.1. Then I experienced the problem when updating a client to 0.14.2.
My docker-compose was literally https://github.com/bsherman/netmaker-traefik/blob/main/docker-compose.yml, and I upgraded by changing 0.14.1 to 0.14.2.
Also, apologies for my delay; I was in airports all day yesterday and I'm travelling, so not very accessible for a few days.

jolly-london-20127

06/07/2022, 8:06 PM
no worries, we put a hotfix in the release which should solve this issue