The nightmare that is DBD::Oracle on RHEL 5

Today I ran into a Nagios machine that needed to run database checks internally (connected users, corrupt blocks, etc.). My Perl scripts kept throwing errors about the missing Oracle driver. I have dealt with this in the past and remember it being a nightmare. I know some Perl junkie out there is reading this saying, "Why didn't you use CPAN?" In this case I would have preferred to, but CPAN was having issues with other Perl modules and wasn't worth my time. Anyway, here is how you can install DBD::Oracle somewhat easily.

First, make sure you grab your dependencies:


# wget



# wget

Now you need to grab the Oracle Instant Client basic and SQL*Plus packages. I can't post a snippet because you need your own user credentials to grab them, but here are the links with the version and everything you will need:



Now that you have everything downloaded, install the packages in the order you downloaded them using the "rpm -Uvh [filename].rpm" command. Once this is done, set the LD_LIBRARY_PATH and ORACLE_HOME variables in the environment (afterwards, run "env" to make sure they took):

# export ORACLE_HOME=/usr/lib/oracle/

# export LD_LIBRARY_PATH=/usr/lib/oracle/
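
Both exports plus the "env" check mentioned above can be done in one go (same paths as above; the grep just confirms the variables actually took):

```shell
export ORACLE_HOME=/usr/lib/oracle/
export LD_LIBRARY_PATH=/usr/lib/oracle/

# confirm both variables are actually in the environment
env | grep -E '^(ORACLE_HOME|LD_LIBRARY_PATH)='
```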


And last but not least, download the perl-DBD-Oracle RPM and install it, again using the "rpm -Uvh [filename].rpm" command:

# wget
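
Once that RPM is in, a quick smoke test proves the driver actually loads before you wire it into any Nagios scripts (a sketch; it only checks that the module can be required, no DSN needed):

```shell
# prints "DBD::Oracle loaded" or "DBD::Oracle missing";
# falls back to a note if perl itself is absent
perl -e 'print eval { require DBD::Oracle } ? "DBD::Oracle loaded\n" : "DBD::Oracle missing\n"' 2>/dev/null \
  || echo "perl not found"
```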


And now you should be all set….





Getting the Windows 8 OEM product key from the BIOS using Linux

With all the Windows 10 craze going on at the moment, I felt like I should take my unregistered spare Windows 8 machine and activate it so I can pull Windows 10 down. The Dell machine I am typing this on is running Linux, but it has a Windows 8 Pro key in the BIOS. Since I'm not using the key, nor do I plan to on this machine, I wanted to pull it out and use it for the activation on the other machine. So here it is, fairly simple. NOTE: I believe this could also be done from a Linux live CD if you are currently on Windows:

[root@localhost ~] # ls /sys/firmware/acpi/tables/

Look for a table named "MSDM".

cat the file and it will display the product key that came with the machine. The output may look a little funky, but I was able to read it well enough, and I have verified that the key works on my second Windows 8 machine. NOTE: I have randomly changed letters and numbers in the product key below so that it cannot be used or redistributed.

[root@localhost ~]# cat /sys/firmware/acpi/tables/MSDM
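
If you want something cleaner than eyeballing the raw dump, the key is plain ASCII at the tail of the table. This sketch assumes the common layout where the 29-character key is the last field (verify against your own dump; msdm_key is just a helper name I made up):

```shell
# Helper (hypothetical name): print the last 29 bytes of a file,
# which on the firmwares I have seen is exactly the product key.
msdm_key() { tail -c 29 "$1"; }

# usage (as root): msdm_key /sys/firmware/acpi/tables/MSDM
```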



CentOS 7 guest customization on VMware doesn't work

Ran into this this morning. I was building a CentOS 7 VMware template for an upcoming project that depends on this OS. I got everything up and running beautifully and converted the VM to a template. When I entered a new IP and hostname through the guest customization wizard, it appeared to take the changes; however, when the VM powered on, the hostname was still the template's and the IP was unchanged (much to my dismay). After about an hour of tinkering and googling, I compiled the following as the solution until VMware decides to fix this:

First, install all the dependencies:

– perl

– net-tools

– open-vm-tools

– open-vm-tools-deploypkg

The last dependency is not in the repositories by default. No worries though: VMware provides a repository that carries the open-vm-tools-deploypkg package. Do the following to get it installed:


vi /etc/yum.repos.d/vmware-tools.repo 

Now paste this into it:


[vmware-tools]
name = VMware Tools
baseurl =
enabled = 1
gpgcheck = 0

And lastly run your install:

yum install open-vm-tools-deploypkg

The other thing I noticed was that VMware didn't recognize the OS properly, so the customization still didn't work. For some reason they want you to be full-fledged RHEL. While I wasn't willing to pay for that, I was willing to entertain the requirement by editing the redhat-release file like so:

[root@centos7x64 network-scripts]# cat /etc/redhat-release
CentOS Linux release 7.1.1503 (Core)

And now we change it:

[root@centos7x64 network-scripts]# rm -f /etc/redhat-release && touch /etc/redhat-release && echo "Red Hat Enterprise Linux Server release 7.0 (Maipo)" > /etc/redhat-release
[root@centos7x64 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.0 (Maipo)
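
As an aside, the rm/touch/echo chain can be collapsed: shell redirection truncates the target before writing, so one command does the same job. (Demo path here so the example is harmless; on the template the target is /etc/redhat-release.)

```shell
echo "Red Hat Enterprise Linux Server release 7.0 (Maipo)" > /tmp/redhat-release.demo
cat /tmp/redhat-release.demo
```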


Now my guest customization using vmware is working perfectly. Enjoy!



ManageIQ – Setup / Issues encountered

A colleague of mine recently introduced me to ManageIQ, the open source version of Red Hat's CloudForms. While I wasn't sure at first what exactly I could use it for (the idea just sounded cool), I got it set up and within two hours was pitching the management team on a lot of its functionality: it works completely cross-platform thanks to its HTML5 console (friggen VMware and their love of Flash), and it let us consolidate our many vSphere environments into a single pane of glass, which was AWESOME for me as I am all about streamlining and automation. I could go on and on about the functionality and the things I plan to add to it, but for now I will keep it simple and write on those subjects later.

So anyway, here is a quick overview of the general setup and the things that should be done once you get this appliance deployed.

  • First things first, hop on over to and grab the appliance. It is roughly 1.2 GB, so give yourself some time to download it (coffee breakkkk!)
  • Deploy the .ova template to your vSphere environment (File -> Deploy OVF Template); again, give yourself about 15 minutes for the deployment to finish
  • Power on the appliance and wait for a full boot up

Now this is where things started to fall apart for me in the documentation provided by ManageIQ. I noticed during the deployment of the .ova template that there was only one VLAN available to select. That in itself was not a big issue, as we can change it after the fact, but since the next step covers setting up the network, I figured I would mention it. If you are not familiar with setting a new VLAN on the NIC: right-click the appliance, click "Edit Settings", in the left pane click "Network adapter 1", then in the bottom right select the correct VLAN under "Network Label" and click "OK". Alright, off my soapbox.

  • Open the console in vmware (Select the VM from the pane on the left -> Console tab)
  • Enter the user credentials, which by default are admin/smartvm
  • This will bring you to the configuration screen. Set up your network, hostname, TZ, etc. Here is what it looks like (it pretty much walks you through everything)


  • After this is all set up to your liking, log right into the web console at https://yourIPhere/ using the same "admin" and "smartvm" credentials.

This will bring you to the dashboard, with no info on it yet by default (don't worry, it will populate shortly). Let's not worry just yet about the pretty little graphs your management is going to want to drool over (it will look the part in due time).


  • Click on “Configure” in the top right hand corner of your screen
  • In the left pane select your “Zone” and fill in its details (company name, host name etc etc)
  • On the same page, scroll down to "Server Control" and assign the server roles. You will notice some defaults are already selected. Save yourself the trouble: select them all ahead of time, assess what functionality you need, and scale back later instead of pondering why a feature is greyed out down the line.



  • While you are on this screen, fill out the information on the right for your local SMTP settings and your NTP servers, then click the blue "Save" button at the bottom of the page.
  • From the top menu, select "Infrastructure". You will see a configuration button with a sprocket that drops down a few more menu options. Click "Add a new Infrastructure Provider".
  • This screen is where you add your vCenter instances. Give it a name, select the provider type (VMware in my case), enter the IP of your vCenter, add your root credentials, and click the "Validate" button to make sure a connection is possible with the provided information. If all is good, click "Add". Repeat this step for every separate vCenter instance you run into.

Take a little break at this point and let ManageIQ discover all of your hosts, VMs, datastores, etc.

  • After you have given ManageIQ some time to do what it does, head back to the "Configure" menu and select "Configuration" from the submenu. In the left pane, select the "CFME Region" menu with the little globe next to it.
  • On the right, click the "C & U Collection" menu. This is where you select which items to collect performance and utilization information on. I personally checked "Collect for all Clusters" and "Collect for all Datastores", then clicked "Save" in the bottom right.
  • Now it's time to change the password to something more secure. In the left pane there are a few other options; click the one that says "Access Control".
  • Under "Users", select "Administrator", click "Configuration" and "Edit this User", enter a password in the appropriate field, and click "Save".
  • Now that everything is set up, head over to the "Infrastructure" menu and click "Hosts". In the left filter choose "ALL", and in the bottom right adjust the results per page to accommodate the number of hosts you have. In the top left click "Check all", then click the "Configuration" option and select "Edit these hosts" (little pencil icon). Add your root credentials in these fields, choose a host to validate against to make sure they work, and click "Save" in the bottom right.
  • The last item I will have you do: go to the "Infrastructure" menu, click the "Virtual Machines" menu, and browse through the folders to locate your appliance (the VM you deployed earlier). Once you find it, click on it, then click the "Configuration" button and "Edit Management Engine Relationship". Select the appliance name in the drop-down and click "Save" at the bottom.

Now you have a basic functioning ManageIQ instance. However, I was interested in checking out SmartState Analysis, and it turns out that version botvinnik-1.20150629103140_eb92001 ships with a bugged version of the VixDiskLib library, which will give you errors in your "Tasks" (located in the "Configuration" menu under the "Tasks" submenu) that look like this:



So there are two things I needed to do to resolve these issues…

1. Install a new version of the VDDK (this seems to be fixed in 5.5.4). You can get the installer from here

a. Place the .gz in a working directory

b. untar the .gz


[root@MIQSRV ~]# tar xzvf VMware-vix-disklib-5.5.4-2454786.x86_64.tar.gz


c. Move into the newly extracted folder and run the Perl install script that's in there (this is quick and painless)

[root@MIQSRV ~]# cd vmware-vix-disklib-distrib/
[root@MIQSRV vmware-vix-disklib-distrib]# ls
bin64  doc  etc  FILES  include  installer  lib64
[root@MIQSRV vmware-vix-disklib-distrib]# ./

2. Set the proper port for the SmartProxy

a. Back on the web console head over to the “Configure” -> “Configuration” Menu

b. On the left pane underneath “Zone” select “Server: (Zone name here) ”

c. On the right (in the main pane, not the top menu), select the tab that says "SmartProxy", change the default port of "1139" to "902", and click save.



If you made it this far, you are a trooper. I want to take some time to give credit where credit is due to the sources where I got my information:



vSAN: 12 (Cannot allocate memory) / Failed to create swap file

Good Morning All,

I ran into this again, for about the third time, and figured I would put it here for my notes; maybe someone will find it helpful. Every now and again my little metro cluster (vSAN 5.5.0) gets a bit squirrely on me and won't let me power on VMs, take snapshots, etc. The symptoms are as follows:

– Altering any VM in any way through the "Edit Settings" window returns an error similar to this:

An error occurred while taking a snapshot: 12 (Cannot allocate memory).
An unknown error has occurred.


– Attempting to power on any vm you attempted to edit returns


The fix is actually fairly simple. I went through the clomd.log (/var/log/clomd.log) on the ESXi hosts and saw that something had happened to the clomd service (backtraces and whatnot). I'm not sure whether it became unstable from my habit of constantly playing with it or just went stale. At any rate,

The solution:

Log into each host inside the cluster and issue the following command (ONE AT A TIME! AND GIVE IT SOME TIME BEFORE STARTING THE NEXT ONE!):

[john.deconti.somelaptop] ➤ ssh
X11 forwarding request failed on channel 0
The time and date of this login have been sent to the system logs.

VMware offers supported, powerful system administration tools.  Please
see for details.

The ESXi Shell can be disabled by an administrative user. See the
vSphere Security documentation for more information.
~ # /etc/init.d/clomd restart
watchdog-clomd: Terminating watchdog process with PID 34656391
Waiting for process to terminate...
clomd stopped
clomd started
~ #
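
If you have more than a couple of hosts, the serial restart can be scripted. This is a dry-run sketch with made-up hostnames: it only echoes the commands so you can review the order first; drop the echo (and use a real wait) to actually run it.

```shell
for h in esx01 esx02 esx03; do          # hypothetical hostnames
  echo ssh "root@$h" "/etc/init.d/clomd restart"
  sleep 1                               # in practice: wait minutes and check cluster health first
done
```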


Now go attempt to edit whatever you were editing and you should be good.


Nagios email end-to-end check (MS Exchange to Gmail and back)

I know I haven't written in a while, having been bogged down by various projects, but this is one I ran into recently that became an immediate requirement. This check is not mine, nor am I taking credit for it, but I wanted to show how to use it, since the developer's website showed no clear way to use it in the manner I wanted. You can obtain a copy of the check from the author's site here. It is a package of various checks and can be combined with other checks if the included ones do not fit your immediate needs. So I will start from the top.

  • SSH to your Nagios or Centreon poller where this check will run.
  • If you don’t have access to the firewalls ask your network team to open up ports 25 and 993 to (or server of your choosing)
  • Change directories to your plugins directory in my case /usr/lib/nagios/plugins
[root@somehost /]# cd /usr/lib/nagios/plugins
[root@somehost plugins]#
  • Use wget to pull the file down from the website using the link above
[root@somehost plugins]# wget
--2015-06-24 08:09:06--
Connecting to||:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 85004 (83K) [application/x-tar]
Saving to: `check_email_delivery-0.7.1b.tar.gz'

100%[=======================================================================================================================================================>] 85,004       331K/s   in 0.3s

2015-06-24 08:09:06 (331 KB/s) - `check_email_delivery-0.7.1b.tar.gz' saved [85004/85004]
  •  Untar the archive we just pulled down


[root@somehost plugins]# tar xzvf check_email_delivery-0.7.1b.tar.gz
check_email_delivery-0.7.1b/docs/How to connect to IMAP server manually.txt
check_email_delivery-0.7.1b/docs/How to test plugin.txt
  • Now, if you haven't already, head to and get yourself a Gmail address, or alternatively replace anything I mention about Gmail with an SMTP server of your choosing. It's also worth noting that you will need to turn on IMAP in your Gmail account; information on that can be found here.

Now that the account is set up, it's time to play with the check. In this particular set of plugins there is one check that ties everything together, and that check is "check_email_delivery". It lets you chain the send and receive plugins together using the "-p" switch and then determines, from the headers, how long mail takes to get from point A to point B. You then give it warning/critical thresholds based on the time it takes to make the loop from outside back in. I had two different thoughts on how to do this efficiently:

A. Use the plugin in three separate checks, to more easily identify where a break specifically occurs (outgoing, incoming, complete loop).

B. Just check the complete loop and leave it to the Exchange guys to figure out where the break is when things go wrong. I personally preferred option A because it indicates more or less exactly where the problem is and allows for a quicker recovery of a production environment, but I was told the loop is all we need. I will show examples of both and let you decide how to piece it together.

  • Option A examples
###Send from Gmail TO your exchange environment###

./check_email_delivery -p './check_smtp_send -H --tls -U -P password --mailfrom --mailto --body "hello,world" --header Subject:" Nagios" ' -p './check_imap_receive -H -U -P password -s SUBJECT -s "Nagios" -w 30 -c 120'

###Send from Exchange out to your Gmail account###

./check_email_delivery -p './check_smtp_send -H  --mailfrom --mailto --body "hello,world" --header Subject:" Nagios" ' -p './check_imap_receive -H --tls -U -P password -s SUBJECT -s "Nagios" -w 30 -c 120'

##Just to show these actually work
[root@somehost plugins]# ./check_email_delivery -p './check_smtp_send -H --tls -U -P password --mailfrom --mailto --body "hello,world" --header Subject:" Nagios" ' -p './check_imap_receive -H -U -P password -s SUBJECT -s "Nagios" -w 30 -c 120'
EMAIL DELIVERY OK - 10 seconds, 0 delay | delay=0s;95;300;0 elapsed=10s

## How it looks when you set it up as a command in Centreon (where $USER1$ is the location of my plugins folder). You could add more arguments to the check for user accounts and whatnot; this is simply a basic functional configuration within my Centreon (Nagios) installation. ##

$USER1$/check_email_delivery -p '$USER1$/check_smtp_send -H --tls -U -P password --mailfrom --mailto --body "hello,world" --header Subject:" Nagios" ' -p '$USER1$/check_imap_receive -H -U -P password -s SUBJECT -s "Nagios" -w $ARG1$ -c $ARG2$'


  • Option B example. For this one I run both checks from the inside to avoid extra firewall rules. I went to and set up a forward back to the internal account, showing that the complete loop functions properly (sent from inside, received on the outside, and returned to the inside). The forward from the outside back in is not possible if our provider is having issues accepting incoming mail, and furthermore will fail if Exchange has a long delay or a bounce-back occurs. Information on how to turn forwarding on can be found here
##Checking loop completely inside making use of a forward setup in gmail

./check_email_delivery -p './check_smtp_send -H --mailfrom --mailto --body "hello,world" --header Subject:" Nagios" ' -p './check_imap_receive -H -U -P password -s SUBJECT -s "Nagios" -w 30 -c 120'

##How it looks in centreon as a command (Again where $USER1$ = location of my plugins.)

$USER1$/check_email_delivery -p '$USER1$/check_smtp_send -H --mailfrom --mailto --body "hello,world" --header Subject:" Nagios" ' -p '$USER1$/check_imap_receive -H -U -P password -s SUBJECT -s "Nagios" -w $ARG1$ -c $ARG2$'
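
For what it's worth, the -w/-c thresholds behave like any Nagios plugin's: the measured round-trip time is compared against warning and critical in seconds. A toy sketch of that decision (my own simplification, not the plugin's actual code):

```shell
elapsed=42   # seconds the loop took (example value)
warn=30
crit=120
if   [ "$elapsed" -ge "$crit" ]; then echo "EMAIL DELIVERY CRITICAL - ${elapsed} seconds"
elif [ "$elapsed" -ge "$warn" ]; then echo "EMAIL DELIVERY WARNING - ${elapsed} seconds"
else echo "EMAIL DELIVERY OK - ${elapsed} seconds"
fi
```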










CUPS: Seemingly random printers showing up

Yesterday I was handed an issue where a server displayed seemingly random printers when you issued the lpstat -t command. Not that this issue is hard to figure out, but I like to think of CUPS as a set-it-and-forget-it type of service, only touched when you run into a broken pipe or need to add another printer. Anyway, the issue was that about 30 printers were showing up in the application, but only 3 printers were actually configured by us:

# Printer configuration file for CUPS v1.3.7
# Written by cupsd on 2012-10-25 13:37
<Printer Printer1>
Info Printer1
DeviceURI socket://
State Idle
StateTime 1349988404
Accepting Yes
Shared Yes
JobSheets none none
QuotaPeriod 0
PageLimit 0
KLimit 0
OpPolicy default
ErrorPolicy stop-printer
</Printer>
<Printer Printer2>
Info Zebra
DeviceURI socket://
State Idle
StateTime 1351005356
Accepting Yes
Shared Yes
JobSheets none none
QuotaPeriod 0
PageLimit 0
KLimit 0
OpPolicy default
ErrorPolicy stop-printer
</Printer>
<Printer Printer3>
Info Zebra
DeviceURI socket://
State Idle
StateTime 1351177640
Accepting Yes
Shared Yes
JobSheets none none
QuotaPeriod 0
PageLimit 0
KLimit 0
OpPolicy default
ErrorPolicy stop-printer
</Printer>


This led me to a lot of confusion, so I took a trip over to my /etc/printcap file, where I was surprised to see all the printers I didn't want listed. To further my confusion, this is at the top of the printcap file (especially confusing knowing what was actually in my printers.conf):

# This file was automatically generated by cupsd(8) from the
# /etc/cups/printers.conf file.  All changes to this file
# will be lost.


Anyway, the problem was as simple as the solution (just very confusing at first glance). Someone inadvertently turned on browsing inside the cupsd.conf file, which made printers from one of our print servers on the same subnet visible to the machine. Below is the section of the conf file that needs to be edited, from:


# Show shared printers on the local network.
Browsing On
BrowseOrder allow,deny
# (Change '@LOCAL' to 'ALL' if using directed broadcasts from another subnet.)
BrowseAllow @LOCAL

to:

# Show shared printers on the local network.
Browsing Off
BrowseOrder allow,deny
# (Change '@LOCAL' to 'ALL' if using directed broadcasts from another subnet.)
BrowseAllow @LOCAL



Simply change "Browsing On" to "Browsing Off", then restart CUPS and the unwanted printers will disappear:

[root@somehost~]# service cups restart
Stopping cups:                                             [  OK  ]
Starting cups:                                             [  OK  ]
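
A one-liner can flip the directive without opening an editor. Shown here against a throwaway copy; point it at /etc/cups/cupsd.conf for real (and back the file up first):

```shell
# scratch copy standing in for /etc/cups/cupsd.conf
printf 'Browsing On\nBrowseOrder allow,deny\n' > /tmp/cupsd.conf.demo

# flip the directive in place
sed -i 's/^Browsing On$/Browsing Off/' /tmp/cupsd.conf.demo

# confirm the change
grep '^Browsing ' /tmp/cupsd.conf.demo
```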











Eg. $releasever is not a valid and current release or hasnt been released yet/

Yesterday I ran into a group of machines hitting a funky error I had never seen before. Apparently it's fairly common, and there were horror stories about people having to rebuild their rpmdb and breaking things even more. I wasn't about to get into that nonsense with a GROUP of machines, so I did some research and picked the error message apart piece by piece. Here is what the error shows:

[root@somehost ~]# yum search ftp
Loaded plugins: refresh-packagekit, security
YumRepo Error: All mirror URLs are not using ftp, http[s] or file.
 Eg. $releasever is not a valid and current release or hasnt been released yet/ [Errno 14] PYCURL ERROR 22 - "The requested URL returned error: 404 Not
Trying other mirror.
Error: Cannot retrieve repository metadata (repomd.xml) for repository: base. Please verify its path and try again


Right away I disregarded the first line, because I saw this puzzling part (below):

Eg. $releasever is not a valid and current release or hasnt been released yet/

I checked the OS release and noticed it was Oracle Linux Server 6.6:

[root@somehost ~]# cat /etc/oracle-release
Oracle Linux Server release 6.6


Which I know is getting older now. That led me over to my /etc/yum.conf file, because I know there is a parameter in there that tells yum what OS it is:

[root@somehost ~]# cat /etc/yum.conf
#  This is the default, if you make this bigger yum won't see if the metadata
# is newer on the remote and so you'll "gain" the bandwidth of not having to
# download the new metadata and "pay" for it by yum not having correct
# information.
#  It is esp. important, to have correct metadata, for distributions like
# Fedora which don't keep old packages around. If you don't like this checking
# interupting your command line usage, it's much better to have something
# manually check the metadata once an hour (yum-updatesd will do this).
# metadata_expire=90m

# PUT YOUR REPOS HERE OR IN separate files named file.repo
# in /etc/yum.repos.d

Immediately I could see that someone had either overwritten this file with a yum.conf from a CentOS box or inadvertently altered the "distroverpkg=" parameter and put centos-release in there. I changed this to the appropriate value of "oracle-release", so it now looks like:



Then I did a yum clean all:

[root@somehost ~]# yum clean all
Loaded plugins: refresh-packagekit, security
Cleaning repos: base epel extras ol6_UEK_latest ol6_latest
Cleaning up Everything

I had a feeling there were probably CentOS repos on there as well, so I went and checked (the error message was referencing a CentOS mirror). My feelings were correct. I removed ALL repos that were not related to OL6 (i.e., CentOS-Base, epel):

[root@somehost yum.repos.d]# rm CentOS-*
rm: remove regular file `CentOS-Base.repo'? y
rm: remove regular file `CentOS-Debuginfo.repo'? y
rm: remove regular file `CentOS-Media.repo'? y
rm: remove regular file `CentOS-Vault.repo'? y
[root@somehost yum.repos.d]# rm epel*
rm: remove regular file `epel.repo'? y
rm: remove regular file `epel-testing.repo'? y
[root@somehost yum.repos.d]#


So now the only repo left in there is the public-yum-ol6:

[root@somehost yum.repos.d]# ls
[root@somehost yum.repos.d]#

Now time to test

[root@somehost yum.repos.d]# yum search ftp
Loaded plugins: refresh-packagekit, security
================================================================= N/S Matched: ftp =================================================================
vsftpd.x86_64 : Very Secure Ftp Daemon
curl.x86_64 : A utility for getting files from remote servers (FTP, HTTP, and others)
wget.x86_64 : A utility for retrieving files using the HTTP or FTP protocols


As you can see, someone must have run a script, or did not fully understand what OS this machine was running, and added parameters to yum.conf saying it was a CentOS server. On top of that, they added repos that did not belong to this distribution, which caused the error messages that seemed odd to me. This article seems long, but really it's just:

1. Check your yum.conf, specifically the "distroverpkg=" parameter.

2. Clean yum using "yum clean all".

3. Remove any repos that don't belong from your /etc/yum.repos.d/ folder.

4. Test.
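
The root cause in one line: yum expands $releasever from the version of whatever package "distroverpkg=" names, so pointing it at centos-release on an Oracle Linux box leaves the variable unresolvable. A quick audit sketch (demo file standing in for /etc/yum.conf):

```shell
# reproduce the bad config in a scratch file
printf '[main]\ndistroverpkg=centos-release\n' > /tmp/yum.conf.demo

# on Oracle Linux this line should read distroverpkg=oracle-release;
# anything else is the smoking gun
grep '^distroverpkg' /tmp/yum.conf.demo
```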











Hot adding a disk to CentOS 5 ("mkfs.ext3: invalid blocks count")


I ran into this issue yesterday while adding a disk to a CentOS 5 VM while it was still running. I kept getting errors that looked like this (yes, yes, I know, spare me the "THAT OS IS 15 YRS OLD!" speech; I'm aware, and this OS has been running fine for that duration. App teams will migrate when they need to):

$ mke2fs -t ext3 /dev/sde
mke2fs 1.39 (29-May-2006)
mke2fs: invalid blocks count - /dev/sde


The process is fairly simple. We rescan the SCSI bus to tell the OS that the new disk is there, like this:

[root@somehost /]# echo "- - -" > /sys/class/scsi_host/host0/scan


Then we run "fdisk -l" to list all disks currently on the system. In my case I was looking for the new disk of 60 gigs that I had just added:

[root@somehost /]# fdisk -l
Disk /dev/sdb: 16.1 GB, 16106127360 bytes
255 heads, 63 sectors/track, 1958 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdb doesn't contain a valid partition table

Disk /dev/sdc: 85.8 GB, 85899345920 bytes
255 heads, 63 sectors/track, 10443 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdc doesn't contain a valid partition table

Disk /dev/sdd: 53.6 GB, 53687091200 bytes
255 heads, 63 sectors/track, 6527 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1        6527    52428096   8e  Linux LVM

Disk /dev/sde: 64.4 GB, 64424509440 bytes
255 heads, 63 sectors/track, 7832 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/sde doesn't contain a valid partition table


Now that I can see from fdisk which disk I just added, I can create the filesystem on the device. I know some nit-picker out there is going to say "YOU DIDN'T ADD A PARTITION!!!!" That is intentional: using the whole device means I can resize the disk online later should it be needed (scalability is key), as I describe here: on the fly resizes

[root@somehost /]# mkfs.ext3 /dev/sde
mke2fs 1.39 (29-May-2006)
/dev/sde is entire device, not just one partition!
Proceed anyway? (y,n) y


And that's pretty much it. After that you just need to mount it and add it to your /etc/fstab, which, if you are reading this, I assume you are already familiar with.
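
For completeness, the last mile looks something like this. The mount point /data is my own example, and the snippet writes to a demo fstab path so it is harmless to run; on the real box the target is /etc/fstab:

```shell
# on the real box first: mkdir -p /data && mount /dev/sde /data
# then persist the mount across reboots:
echo '/dev/sde  /data  ext3  defaults  0 0' >> /tmp/fstab.demo
tail -n 1 /tmp/fstab.demo
```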











Ansible basic usage and common issues encountered

I figured, since the Ansible roles post got so many views, that I would write one for those who are just getting into this product. I want to go over some basic commands and some error messages I encountered while learning how this software works, and hopefully help you overcome any issues. Again, if you have problems, shoot me an email (it's on the About page) and I would be happy to help when I can.

First, let's start with some real basic stuff. If you have been over to their website and watched the little training modules, you know there are a few basic ways to use this product:

A. Raw module (using the ansible command)

B. Playbooks

C. Roles


I will start by explaining the raw module and, although it's cumbersome, why you may want or need to use it in the beginning. Its basic function is to pass shell commands straight to the nodes (hosts) over SSH (or with shared keys, if you have those set up). Now, some of you may be asking, "Wait, where does it get the user and password from?" The answer: it uses the user you are logged in as (root in my case) and the password you have in your Ansible hosts file (the one in your working directory). What I did in my hosts file for a password is parent all my Linux servers under one group and define a password for them all.
For instance:
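
A minimal inventory along those lines might look like this (group and host names are hypothetical; ansible_ssh_pass is the standard inventory variable for a login password):

```ini
# every group under [linux:children] inherits the shared password
[linux:children]
webservers

[linux:vars]
ansible_ssh_pass=foobar

[webservers]
websrvr01
```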

This allowed Ansible to use my password foobar while running any Ansible function, whether it was a playbook, a role, or just the ansible command. Now, one of the reasons I use this module often is error messages like this:

failed: [somehost] => {"failed": true, "parsed": false}
invalid output was: Error: ansible requires a json module, none found!

In a massive environment like the one I work in, you may run into this due to old, outdated operating systems (i.e., RHEL 4/5) that don't have the required Python modules installed. To fix an error like that, you would use the raw module like this:

ansible broken-hosts -m raw -a "yum -y install python-simplejson"

This assumes I added the failing hosts to my hosts file in my working directory (/etc/ansible/ for me) and labeled the group [broken-hosts].

Another error message that confused me to no end at first was:

fatal: [websrvr01] => {'msg': 'FAILED: [Errno -2] Name or service not known', 'failed': True}


If you encounter this, go ahead and check the DNS name of the host you put in your hosts file. Typically, in my experience, either I made a typo in the host file or someone shut the machine off.


So again, why would you use the raw module?

1. Resolve error messages related to dependencies

2. Running simple commands you DON'T want a playbook for, e.g., "shutdown -h now" (mistakes were made xD)

The command syntax (and there are more modules than just raw) looks like this:

 ansible [host subset] -m raw -a "[shell command to run]"



Playbooks are a great way to automate simple tasks. Using the YAML wrapper, they make life easy when you have a simple job to take care of. I personally use roles, which are executed with the same command and are just broken down a little more modularly, but for the sake of explanation I will show you how to set up playbooks and use them properly.

The first thing you want to do is make sure you have a nice and tidy place to organize any files your playbook may need. In my example we will use /ds1/, my datastore directory; inside it I would put two more directories, /ds1/scripts/ and /ds1/software/.

In my example playbook we will copy a script to the Oracle production servers and add it to chkconfig (autostart, for everyone who doesn't know):

- hosts: oracle-prod
  remote_user: root
  tasks:
    - name: Copying dbora script
      copy: src=/ds1/scripts/dbora dest=/etc/init.d/dbora owner=root group=root
    - name: Updating chkconfig
      shell: chkconfig --add dbora
    - name: Updating chkconfig dbora
      shell: chkconfig --level 345 dbora on


You can probably see now why I called it a nice YAML wrapper earlier. Everything is laid out cleanly and it's all labeled. One thing I will tell you: pay very close attention to whitespace, because otherwise your playbook won't run and you will get error messages referencing spacing issues, like this:

ERROR: Syntax Error while loading YAML script, dbora.yml
Note: The error may actually appear before this position: line 6, column 5

   - name: Copying dbora script
    copy: src=/ds1/scripts/dbora dest=/etc/init.d/dbora owner=root group=root

In the error message above, you can see I made a mistake by accidentally indenting the "- name:" portion of my playbook, and Ansible points that out. Alignment is key here. Let's go back and look at how it's laid out. At the top you see a "- hosts:" section; this is where you define a group of hosts from your Ansible hosts file.

The remote_user section is obvious to most, but there were some instances where we could not SSH as root (PermitRootLogin=no in sshd_config). AGAIN, Ansible has an answer. There are two ways to go about it: you can do it on the command line in combination with your ansible-playbook command, OR simply add it to your playbook and have it ask you for credentials, like so:

- hosts:
  gather_facts: False
  su: yes
  su_user: root
  tasks:
    - shell: whoami


ansible-playbook --su --su-user=root --ask-su-pass playbook.yml


It will prompt you for your password, which gets you around that issue.

Now, the tasks section. There is a lot you can do here, but the basic structure of a playbook is to tell it what servers to run on, who to run as, and a list of tasks, each consisting of a name (echoed during execution) and an action. In my example above I'm telling it to copy:, giving it a source and a destination; in the other tasks I'm simply executing shell commands to complete my task list. All good stuff, right? But how do you execute the playbook? Here is the basic command structure:

ansible-playbook /etc/ansible/playbooks/[playbook.yml]


Note that there ARE extra switches you can use. The most common one I find myself using is --forks, because I want lots of processes running in parallel; by default this is set to 5, and the most I have ever attempted was 50 due to resource constraints, but it made for a fairly quick run of a script on 1000 servers. Another is --retry @/root/playbook.retry (Ansible keeps track of the hosts the playbook failed on), so I can rerun my playbook on ONLY the hosts that failed, after I address their issues.



Please refer to my previous post here Ansible Roles Explained in Practice