Test Kitchen Ansible Serverspec Debug

I recently set up test kitchen in our Ansible repo, and saw that is passed the “swarm is running” spec even though swarm wasn’t running. Pretty disturbing behavior from a testing tool. This is the story of how I got to the bottom of it.

We use serverspec, an extension to rspec, which provides matchers for system configuration stuff. The spec in question is just:

describe service('swarm') do
 it { should be_running }
 end

I read through the serverspec source and found the commands it uses to verify services, but running those on the VM gave expected outputs, so it was time for some debugging.

First, I wanted to eliminate the Test Kitchen infrastructure and just run rspec directly on the VM. After converging the VM, I ran verify with debug output to see how it is calling rspec:

% kitchen verify --log-level debug
 ---clipped---
 cd /tmp/kitchen
 RSPEC_CMD=$(which rspec)
 echo $RSPEC_CMD
 sudo -E -H $RSPEC_CMD -c -f documentation --default-path /tmp/kitchen -P ./roles/swarm/spec/swarm_spec.rb
 ---clipped---

Running this while logged in to the VM duplicates the problem, so that’s good. Next, I added byebug to the Gemfile in /tmp/kitchen and ran bundle install.

Then I ran rspec pre-pending “byebug” so it will drop me into the byebug debugger:

$ sudo -E -H byebug $RSPEC_CMD -c -f documentation --default-path /tmp/kitchen -P ./roles/swarm/spec/swarm_spec.rb

From examining the serverspec code, I know that I’m looking for a call to the Service.running? method, so add a breakpoint for it:

(byebug) break Serverspec::Type::Service#running?

From here I can easily step into the code that’s checking the service:

[1, 10] in /Users/build/.rvm/gems/ruby-2.1.3/gems/specinfra-2.61.1/lib/specinfra/processor.rb
 1: module Specinfra
 2: class Processor
 3: def self.check_service_is_running(service)
 4: cmd = Specinfra.command.get(:check_service_is_running, service)
 5: ret = Specinfra.backend.run_command(cmd)
 => 6:

(byebug) cmd
 "launchctl list | grep swarm | grep -E '^[0-9]+'"

(byebug) ret
 #

Exit status 1 looks right: the grep doesn’t match anything. Ah, but there’s more: if the exit status is 1, it does a second check:

[13, 22] in /Users/build/.rvm/gems/ruby-2.1.3/gems/specinfra-2.61.1/lib/specinfra/processor.rb
 13: if ret.exit_status == 1
 14: cmd = Specinfra.command.get(:check_process_is_running, service)
 15: ret = Specinfra.backend.run_command(cmd)
 16: end
 17:
 => 18: ret.success?
 19: end
 20:

(byebug) cmd
 "ps aux | grep -w -- swarm | grep -qv grep"

(byebug) ret
 #

Exit status zero means the grep matched something — that the service is running. That’s not right. Removing the “q” from the last grep shows:

$ ps aux | grep -w -- swarm | grep -v grep
 root 72088 0.0 0.6 2557624 12944 s000 S+ 5:07PM 0:00.06 sudo -E -H byebug /usr/local/bin/rspec -c -f documentation --default-path /tmp/kitchen -P ./roles/swarm/spec/swarm_spec.rb

“swarm” matches the path and name of the spec file being executed. LOL.

So what’s the right fix here? It seems like pgrep might be a better choice than ps | grep in this case because we only want to search on the process name.

To confirm this works, I cloned the specinfra repo in the VM, made the change, and added this to the Gemfile in the VM:

path '/tmp/specinfra' do
 gem 'specinfra'
 end

And then built the gem with gem build specinfra.gemspec and installed it with gem install –local specinfra-2.61.1.gem (I could also use bundle exec in the rspec command-line instead of relying on globally installed gems, but this is a VM I’m going to throw away, so it doesn’t much matter). I then saw rspec fail the test locally (in the VM) as expected:

1) Service "swarm" should be running
 Failure/Error: it { should be_running }
 expected Service "swarm" to be running
 /bin/sh -c pgrep\ swarm

I then ran kitchen verify just for fun, expecting to see the spec fail, but was briefly confused when it passed. I finally realized that the ruby version in the terminal and via ssh were not the same. Ansible installed rvm when I converged the VM, modifying the PATH in ~/.bash_profile, so terminal logins got rvm, whereas ssh-based commands were still getting the default MacOS ruby 1.9 (and gems associated with that version). Once I realized this, I was able to install the patch for 1.9 and see the spec fail with kitchen verify as well.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s