Repairing a Corrupted HashiCorp Vault Policy

An interesting thing happened this week: Vault crashed and failed over to a standby node, and the new master ended up with a forked policy. We didn’t know this until one of our data teams reached out saying “we can’t decrypt anything older than Feb 13”, a problem that affected around 250M rows of data. The affected table needed around 1,700 keys, but only ~270 were stored in the current policy.

After researching online for solutions and coming up short, we reached out to HashiCorp for support and a sanity check. Unfortunately, they only support Vault HA setups that leverage Consul, and we were using ZooKeeper, so they said the only thing they could do was refer us to a private consultant. We pushed on and solved the problem, but this was a little disappointing to hear in our time of need.

The first thing we did was export the most recent policy as a plaintext JSON file. To do this we had to quickly build a custom Vault binary. The sdk/helper/keysutil package has a LoadPolicy function that is called for each policy when a node becomes the new master. We wrote a few lines to dump all policies to the /tmp directory. Note: Our systemd service config for Vault had PrivateTmp=yes enabled, which is fine, but it might make you scratch your head like we did until you look in the service’s private tmp directory.
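If you hit the same PrivateTmp confusion, the dumped files are not in the host's /tmp but in a per-service directory that systemd names systemd-private-<boot id>-<unit name>-<random suffix>/tmp. Here is a minimal sketch for locating it. The unit name vault.service and the helper name find_private_tmp_dirs are assumptions for illustration, not something from our actual setup notes.

```python
import glob
import os


def find_private_tmp_dirs(service: str, tmp_root: str = "/tmp") -> list[str]:
    """Return the private tmp directories systemd created for `service`.

    With PrivateTmp=yes, systemd gives the unit its own /tmp, backed by a
    directory on the host that matches the pattern built below.
    """
    pattern = os.path.join(tmp_root, f"systemd-private-*-{service}-*", "tmp")
    return sorted(p for p in glob.glob(pattern) if os.path.isdir(p))


# "vault.service" is an assumed unit name; substitute your own.
for path in find_private_tmp_dirs("vault.service"):
    print(path)
```

The private directories are normally readable only by root, so run this as root or it will most likely print nothing.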

After uploading the binary, unsealing Vault, and running step-down, the new binary became master and dumped our policy to disk. There we found all ~270 encryption keys of our live policy. Still no sign of the missing keys.

Next we restored a backup of our PostgreSQL database taken just before Vault failed over and forked the policy. We made a forked config and manually started Vault in a way that wouldn’t take live traffic. Once Vault was unsealed and the policy was written to the tmp directory, we were elated to find another ~1,440 keys. We checked for collisions on key ids and found none, which was also terrific news (although a little strange). All ~1,700 keys were accounted for, and now we just needed to merge them.
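The collision check and merge can be done offline on the two exported JSON files. The sketch below is illustrative rather than the exact script we ran. It assumes each exported policy is a JSON object with a "keys" map from key id to key material, and the function names and file paths are hypothetical. Check the layout of your own dump before relying on it.

```python
import json


def merge_policy_keys(live: dict, backup: dict) -> dict:
    """Merge the key maps of two exported policies, refusing on collisions.

    Every field other than "keys" is copied from the live policy unchanged.
    """
    live_keys = live.get("keys", {})
    backup_keys = backup.get("keys", {})

    # A key id present in both dumps could mean two different keys share an
    # id, so stop and inspect by hand instead of silently picking one.
    collisions = sorted(set(live_keys) & set(backup_keys))
    if collisions:
        raise ValueError(f"key ids present in both policies: {collisions}")

    merged = dict(live)
    merged["keys"] = {**backup_keys, **live_keys}
    return merged


def merge_policy_files(live_path: str, backup_path: str, out_path: str) -> int:
    """Read two policy dumps, write the merged policy, return the key count."""
    with open(live_path) as f:
        live = json.load(f)
    with open(backup_path) as f:
        backup = json.load(f)
    merged = merge_policy_keys(live, backup)
    with open(out_path, "w") as f:
        json.dump(merged, f)
    return len(merged["keys"])
```

Failing loudly on a collision is deliberate: with encryption keys, a wrong merge is worse than no merge. The output file contains plaintext key material, so treat it accordingly.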

Once again we modified the LoadPolicy function. We added behavior to look for the specific path of the corrupted policy and, instead of querying the physical storage, load the policy from disk, tell it to persist to physical storage, and then delete the file from disk.
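The control flow of that one-shot override looks roughly like the sketch below. It is written in Python for brevity, whereas the real change was a few lines of Go inside Vault. The storage dict stands in for Vault's physical storage, and every name here (load_policy, override_path, override_file) is hypothetical.

```python
import json
import os
from typing import Optional


def load_policy(storage: dict, path: str,
                override_path: str, override_file: str) -> Optional[dict]:
    """Load a policy, preferring an on-disk override for one specific path."""
    if path == override_path and os.path.exists(override_file):
        with open(override_file) as f:
            policy = json.load(f)
        # Persist the merged policy so that storage becomes the source of truth.
        storage[path] = json.dumps(policy)
        # Delete the file so the override applies exactly once.
        os.remove(override_file)
        return policy

    # Every other policy, and every later load, takes the normal path.
    raw = storage.get(path)
    return json.loads(raw) if raw is not None else None
```

Because the file is deleted after the first successful load, restarting the patched binary does not reapply the override, and every other policy still loads from storage as usual.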

Before we started our custom build of Vault pointed at the primary database, we backed up the primary once more just to be safe. After starting and unsealing Vault, we ran step-down to promote the new custom binary as master and it loaded our merged policy just fine. We started testing different points of our system that were previously broken and they began returning successful responses.

Lastly, we made sure to remove all plaintext policies and custom binaries, and to scrub sensitive data from log files. After cleaning up and restarting once again, everything continued to run smoothly.

I still have no idea how this split-brained forked policy was created in the first place. If you have thoughts, please leave a comment. We’ll be reaching out to HashiCorp again, and we’ll update if we learn anything. But at least everything is stable for now, and we have a playbook should it happen again.
