Using Vim as a word processor

I was recently asked to share some of my content on a site called Ops School, a very cool site that bills itself as “a comprehensive program that will help you learn to be an operations engineer”.  It is essentially an online guide covering topics geared towards a successful career in IT.  If you haven’t checked the site out already, I highly suggest you go take a look!  Like right now.  Even better if you have something to contribute!  Either join the mailing list or get going by joining the community over on GitHub.  Contributing to this project is a fantastic way to get your name on an Open Source project, and it would also be a great learning experience if that type of thing interests you.  At least it has been for me so far.

Anyway, the project has a set of guidelines and styles posted on their site for authors to adhere to.  Thus far I have found Vim to be the best word processor for following these styles and the best way to submit writing to this project.  Plus, it is a good way to force myself to use Vim, since I don’t get much practice with it otherwise.

I have taken bits and pieces from various other vimrcs I’ve found and fit them into my own unique scenario, which I suggest you do as well.  The following section is a great starting point for adding word processor functionality to your vimrc.

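" Word processor settings, enabled per buffer with :WP
func! WordProcessorMode()
  " t: auto-wrap text at textwidth as you type
  " 1: don't break a line after a one-letter word
  setlocal formatoptions=t1
  setlocal textwidth=80
  " Move by screen line rather than file line; <buffer> keeps these
  " mappings from leaking into other buffers
  noremap <buffer> j gj
  noremap <buffer> k gk
  setlocal smartindent
  setlocal spell spelllang=en_us
  setlocal noexpandtab
endfu
com! WP call WordProcessorMode()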

One gotcha I ran into with this setup was that lines didn’t automatically rebalance when I went back to an earlier paragraph and made a change that pushed a line past the 80-character wrap limit.  To realign a paragraph, select the text that has fallen out of line and type “gq” (or, in normal mode, type “gqap” to reformat the whole paragraph under the cursor).

If you have questions, let me know.  And if you have any other tips or tricks you use to enhance your Vim word processing experience, feel free to share them!


More tips for improvement

The previous post I wrote about becoming a better sysadmin, covering general points and tips for self improvement, turned out to be more popular than I thought it would be (okay, not really, but for me it was at least!).  So I decided to write a short follow-up post on general improvement, and I have decided to spend more of my time and effort on posts of that type.  I find the topic of self improvement interesting and would love to write more about my findings, so I think I will be experimenting a bit more with these less technical pieces.  I would also like to hear what others have to say about these posts and the topic of self improvement, so let me know.

While this post is primarily about self improvement, the advice can easily be adapted by anybody in the IT industry who is looking for ways to improve and get better.  So while the hard skills (certifications, books, blogs, anything that relates to a specific area) are incredibly important, why not throw in these general, well-known strategies to help with your improvement?  I think the benefits will heavily outweigh any drawbacks.

General Tips for improvement

These simple tips can go a really long way.  I read threads all the time about how to get better, how to improve mental capacity, blah blah blah, and the following suggestions always seem to pop up.  What I have found to be true is that there is no magical, instant way to improve yourself; I am learning that the hard way.  To me, the best way to see results and really work on yourself starts with changing your habits, working hard and being consistent.  That might not be what you are looking for, but trust me, these small changes can make a big difference in how good you become at what you do.

Exercise – Time and again I hear and read about the massive benefits of proper exercise.  I did not take this advice seriously until recently, and I can say that it has made a huge difference in the way I think and feel.  I used to always feel beat down and terrible after work; since I started exercising, those sluggish parts of the day have become much less common.  I wouldn’t recommend starting out by completely changing the way you live your life.  Start with something simple.  This summer I started running again: I made a routine of going for a run after work and just kept at it until I started seeing changes.  I gradually increased the time and distance of my runs, then gradually added weight lifting and other types of exercise.  By no means am I a hardcore athlete now, but I do believe in the importance of exercising and working your body regularly to improve your mind.

Sleep – This is probably the most important thing to remember when you are trying to hack your mind and improve yourself.  Eight hours of sleep seems to be the general rule of thumb, and it should not be overlooked when you are evaluating yourself and your goals.  If you want to wake up early, you need to go to sleep early; it really is as simple as that.  It is also important to keep a consistent sleep schedule (even on weekends!) so your body gets used to when it should slow down and when it should speed up.  For example, get into a routine of winding down at a set time, say 9 pm every night, by reading a book for an hour to train your body that it is time to sleep.  If you want to wake up at 6 am, reading until about 10 pm every night gives you the eight hours of consistent sleep your body needs, along with enough time to repair and heal itself before you get up and going.

Diet – Also important.  I realize that everybody is different, and I don’t want to speak as an authority on the subject, so please take this advice with a grain of salt.  The point I want to make, though, is that diet is very important for improvement.  Again, I do not want to encourage anybody to go all out and change every eating habit they have at once.  You will probably crash and burn like many others; it may work for some, but generally you will be safer and more likely to make an impact if you take things slowly.  Work on one thing at a time and gradually make changes to improve your diet and health.  For example, start by cutting out something small, like a particular type of food that isn’t exactly healthy.  For me it was soda, and once I had that under control I was able to cut out fast food (for the most part).  I haven’t cut it out entirely, and I wouldn’t advocate doing so, but cutting back is a good first step.  Basically, doing something is better than doing nothing.


Document Storage: Part 6

Document Storage Project

This is Part 6: Tying it all together.

All that’s left to do now is write a script that will:

  • Detect when a new file’s been uploaded.
  • Turn it into a searchable PDF with OCR.
  • Put the finished PDF in a suitable directory so we can easily browse for it later.

This is actually pretty easy. inotifywait(1) will tell us whenever a file that was opened for writing has been closed, and we can use that as our trigger to OCR the document.

Our script is therefore in two parts:

Part 1 watches the /home/incoming directory for any files that have been written and closed.
Part 2 is called by the script in part 1 for each of those files.

Part 1

This script lives in /home/scripts and is called watch-dir.

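#!/bin/bash
# Watch the incoming directory and hand each newly written file to process-image.
INCOMING="/home/incoming"
# Directory containing this script (process-image lives alongside it).
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"

# -m: keep monitoring forever; CLOSE_WRITE fires when a file opened for
# writing is closed. inotifywait's status messages go to stderr, so discard them.
inotifywait -m --format '%:e %f' -e CLOSE_WRITE "${INCOMING}" 2>/dev/null |
while read -r EVENTS NAME
do
        # EVENTS holds the event list; NAME is the rest of the line (the file name),
        # with any spaces inside it preserved.
        FILE="${INCOMING}/${NAME}"
        # Process in the background so the watcher keeps reading events.
        "${DIR}/process-image" "${FILE}" &
done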
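With --format '%:e %f', each line inotifywait prints is the colon-separated event list, a space, then the file name, something like CLOSE_WRITE:CLOSE Correspondence-00001.pdf. Below is a minimal sketch, in plain bash, of pulling such a line apart with read -r rather than echo and cut; the two function names are hypothetical helpers made up for illustration, not part of inotify-tools.

```bash
#!/bin/bash
# Hypothetical helpers: split one line of `inotifywait --format '%:e %f'` output.
# read -r puts the first whitespace-separated word (the event list) in EVENTS
# and the rest of the line, internal spaces intact, in NAME. The -r stops
# backslashes in file names from being treated as escape characters.
event_names() {
    local EVENTS NAME
    read -r EVENTS NAME <<< "$1"
    printf '%s\n' "${EVENTS}"
}

event_filename() {
    local EVENTS NAME
    read -r EVENTS NAME <<< "$1"
    printf '%s\n' "${NAME}"
}
```

For example, event_filename 'CLOSE_WRITE:CLOSE My  Scan.pdf' keeps both spaces in My  Scan.pdf, whereas echo ${LINE} | cut -d" " -f2- word-splits the unquoted line and gives My Scan.pdf. Leading or trailing spaces in a name are still lost, and a name containing a newline can't be represented in this line-based output at all.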

Part 2

This script lives in /home/scripts and is called process-image.

#!/bin/bash

# Dead easy - at least in theory!
# Take a single argument - filename of the file to process.
# Do all the necessary processing to make it a
# searchable PDF.

OUTFILE="$(basename "${1}")"

if [ -s "${1}" ]
then
	# We use the first part of the filename (up to the first "-") as a classification.
	CLASSIFICATION="${OUTFILE%%-*}"
	# Call date once so the year/month/day parts can't straddle midnight.
	OUTDIR="/home/http/documents/${CLASSIFICATION}/$(date +%Y/%Y-%m/%Y-%m-%d)"

	if [ ! -d "${OUTDIR}" ]
	then
		mkdir -p "${OUTDIR}" || exit 1
	fi

	# We have to move our file to a temporary location right away because
	# otherwise pdfsandwich uses the file's own location for
	# temporary storage. Well and good - but the file's location is
	# subject to an inotify that will call this script!
	TEMPFILE="$(mktemp)" || exit 1
	mv "${1}" "${TEMPFILE}" || exit 1

	# Have we a colour or a mono image? Probably quicker to find out
	# and process accordingly rather than treat everything as RGB.
	# We assume the first page is representative of everything.
	COLOURDEPTH=$(convert "${TEMPFILE}[0]" -verbose -identify /dev/null 2>/dev/null | grep "Depth:" | awk -F'[/-]' '{print $2}')
	SANDWICHOPTS=""
	# Only compare if convert actually gave us a number.
	if [[ "${COLOURDEPTH}" =~ ^[0-9]+$ ]] && [ "${COLOURDEPTH}" -gt 1 ]
	then
		SANDWICHOPTS="-rgb"
	fi
	# Only delete the temporary copy if pdfsandwich succeeded, so a failed
	# run leaves the scan in the temporary file instead of losing it.
	pdfsandwich ${SANDWICHOPTS} -o "${OUTDIR}/${OUTFILE}" "${TEMPFILE}" > /dev/null 2>&1 && rm -f "${TEMPFILE}"
fi
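To check in advance where a given scan will end up, here is the directory logic from process-image pulled out into a small bash function. It is a sketch: outdir_for is a hypothetical name, and it takes the date as an argument (YYYY-MM-DD) so the result is predictable, whereas process-image always uses today's date.

```bash
#!/bin/bash
# Hypothetical helper mirroring process-image's choice of output directory.
# $1 = uploaded file name (a path is fine; only the basename is used)
# $2 = date as YYYY-MM-DD
outdir_for() {
    local outfile classification day
    outfile="$(basename "$1")"
    # Everything before the first "-" is the classification; a name with
    # no "-" is used whole, just as `cut -f1 -d"-"` would do.
    classification="${outfile%%-*}"
    day="$2"
    # Layout: <classification>/<YYYY>/<YYYY-MM>/<YYYY-MM-DD>
    printf '/home/http/documents/%s/%s/%s/%s\n' \
        "${classification}" "${day:0:4}" "${day:0:7}" "${day}"
}
```

So outdir_for /home/incoming/Correspondence-00001.pdf 2014-03-05 prints /home/http/documents/Correspondence/2014/2014-03/2014-03-05, which is why naming the MFD profile after the kind of document is all it takes to keep scans sorted.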

There’s just one thing missing: pdfsandwich. This is actually something I found elsewhere on the web. It hasn’t made it into any of the major distro repositories as far as I can tell, but it’s easy enough to compile and install yourself. Find it here.

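Make both scripts executable (chmod +x /home/scripts/watch-dir /home/scripts/process-image), then arrange for /home/scripts/watch-dir to run every time we boot. The easiest way to do this is to include a line in /etc/rc.local that calls it (if your rc.local ends with exit 0, put the line above that):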

/home/scripts/watch-dir &

Get it started now (unless you were planning on rebooting):

nohup /home/scripts/watch-dir &

Now you should be able to scan in documents; they’ll be automatically OCR’d and made available on the internal website you set up in part 3.

Further enhancements are left to the reader; suggestions include:

  • Automatically notifying sphider-plus to reindex when a document is added. (You’ll need a newer version of sphider-plus to do this. Unfortunately there is a cost associated with this, but it’s pretty cheap. Get it from here).
  • There is a bug in pdfsandwich (actually, I think the bug is probably in tesseract or hocr2pdf, both of which are called by pdfsandwich): under certain circumstances which I haven’t been able to nail down, sometimes you’ll find that in the finished PDF one page of a multi-page document will only show the OCR’d layer, not the original document. Track down this bug, fix it and notify the maintainer of the appropriate package so that the upstream package can also be fixed.
  • This isn’t terribly good for bulk scanning – if you want to scan in 50 one-page documents, you have to scan them individually, otherwise they’ll be treated as a single 50-page document. Edit the script so we can somehow tell it that certain documents should be split into their constituent pages, and store the resulting PDFs that way.
  • Like all OCR-based solutions, this won’t give you a perfect representation of the source text in the finished PDF. But I’m quite sure the accuracy can be improved, very likely without having to make significant changes to how this operates. Carry out some experiments to figure out optimum settings for accuracy and edit the scripts accordingly.


Document Storage: Part 5

Document Storage Project

This is Part 5: Uploading Scanned Images.

There are two components to this part: configuring somewhere for the files to be uploaded to, and setting up your MFD to upload to it. Most modern MFDs will upload to a CIFS share, which is what we’re going to use here. First things first, we need to install Samba:

apt-get install samba

Now we need to set up Samba. We’ll have user-level security (it’ll be much easier to lock things down if we want to increase security at a later date, and besides share-level security went out with the Ark) and a single share called incoming. We also need a user for the MFD to log into Samba with; we’ll call this user “scanner”. We’ll also have a group called “scanner” so we can be a little more flexible over who can access this share should we wish.

Edit /etc/samba/smb.conf as follows:

......

# "security = user" is always a good idea. This will require a Unix account
# in this server for every user accessing the server. See
# /usr/share/doc/samba-doc/htmldocs/Samba3-HOWTO/ServerType.html
# in the samba-doc package for details.
   security = user

......

[incoming]
        path = /home/incoming
        guest ok = no
        browseable = no
        read only = no
        valid users = @scanner

Now we need a new user for the MFD. Samba requires that users also have corresponding Unix accounts, so first we create a Unix account, then add that user to Samba and set their Samba password. We also need to make sure /home/incoming exists and that its permissions are correct – the following commands deal with this:

  mkdir -p /home/incoming
  useradd scanner
  smbpasswd -a scanner
  chgrp scanner /home/incoming
  chmod g+rwx /home/incoming
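Before pointing the MFD at the share, it may be worth confirming that the group and permissions actually took effect. A minimal sketch, assuming GNU coreutils stat (its -c option; BSD stat uses different flags); check_incoming_perms is a hypothetical helper name:

```bash
#!/bin/bash
# Hypothetical helper: succeed only if directory $1 belongs to group $2 and
# that group has read, write and execute permission on it.
# Assumes GNU coreutils stat (-c format strings).
check_incoming_perms() {
    local dir="$1" want_group="$2" group perms
    group="$(stat -c '%G' "${dir}")" || return 2
    perms="$(stat -c '%A' "${dir}")" || return 2
    # %A prints e.g. "drwxrwx---"; characters 4-6 are the group bits.
    [ "${group}" = "${want_group}" ] && [ "${perms:4:3}" = "rwx" ]
}
```

Run it as check_incoming_perms /home/incoming scanner && echo OK. The check is deliberately strict: a setgid bit on the directory shows up as s in the group execute position and makes it fail.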

Make sure you choose a password that is not only secure but also possible to type in on your MFD! Check that this works by connecting to the following folder in Windows:

\\(hostname)\incoming

You’ll need to use the username/password for the scanner user you set up.

For the final part of this, you need to set up your MFD to scan to this directory.

I’ve chosen an Oki MB451 multifunction unit for a number of reasons:

  • It’s cheap.
  • It has a double-sided document feeder for scanning. More and more documents are being sent double-sided; it seems like a step back to have a document feeder that can’t deal with this.
  • It supports scanning directly to email and CIFS share without requiring extra software on the PC. (This is important; certainly a few years ago a lot of manufacturers claimed their products could do this but it wasn’t apparent until after you’d taken it out of the box that their product didn’t do any of it without additional software on your PC. Certain large photocopier-type units still have this restriction, though sometimes you can buy an optional bolt-on to overcome it. I prefer avoiding the need for extra bolt-ons because they’re usually extortionately priced and often difficult to source).
  • It has a nice big display. These units can be a pig to set up at the best of times; a large display often goes some way to alleviate this problem.
  • You can set up lots of profiles – preconfigured shortcuts that say “everything scanned under this profile should be stored under this name in this share accessed with this username and password; files should have this format”. Unfortunately you can’t nail a profile to say “everything scanned under this profile is double-sided” but you can’t have everything!
  • The printer supports PostScript, which means it’ll be pretty much guaranteed to work under any OS I can throw at it for a long time to come.

I won’t go into detail regarding MFD configuration – there are simply too many on the market and they all vary. It’s enough to explain that I’ve set up a profile called “Correspondence” and I’ve pointed it at \\(hostname)\incoming.

With the profile I’ve set up, scanned documents will be stored under \\(hostname)\incoming\Correspondence-#####.pdf.

Test that it all works by scanning a document and making sure it appears in the /home/incoming directory on your Linux box.

There’s only one thing left to do – tie all this together so incoming documents are automatically OCR’d (making them indexable in Sphider) and made available via Apache…


Becoming a better sysadmin

I typically don’t focus on philosophical or abstract topics, but recently I have been reading up on self improvement and wanted to take some time today to lay out some of the key concepts and ideas I have found helpful so far.  Hopefully some of these ideas can help you improve as well, in system administration and in future career endeavors.

So this post is going to be more of a work in progress than anything else, since I really just wanted to get some of this stuff written down to clear it out of my head.  Entire books have been written on self improvement and learning strategies, so my goal isn’t to cover every single detail; I just want to hit the high points and their application to system administration.  Here’s what I have so far; feel free to let me know what I’m missing or throw in anything else that might be particularly useful on this subject.

Explicit vs Tacit knowledge

Explicit knowledge can be defined as that gained from books or listening to a lecture.  Basically some form of reading or auditory resource.

Tacit knowledge can be defined as that gained from experience, action and practice.

I’d like to start off by making a distinction between these two types of knowledge.  I believe that the practice of system administration relies heavily on both, and that one type alone is not enough to be great in this field.  They work hand in hand.  For example, reading a ton of books, while useful in its own right, will not be nearly as effective as reading books and then applying that knowledge through hands-on experience.  Likewise, someone who never bothers to pick up a book and relies entirely on hands-on experience will not be as knowledgeable as someone who combines both types of knowledge.  That said, I do feel that much more can be learned from hands-on experience in system administration than from books alone.

Types of learning

There has been a good deal of research on this subject, but for the purposes of this post I would like to boil it all down to what are considered the three main styles of learning.  I want to focus on these because they line up well with explicit and tacit knowledge and can be described fairly easily.  Each style represents a different approach to the learning experience.  So here they are:

  • Visual – Learning by watching or reading.
  • Auditory – Learning by listening.
  • Kinesthetic – Learning from experience, hands on.

I would argue that employing a good variety of learning and study methods is the most appropriate way to develop your skills as a sysadmin.  Even so, in my own experience I have realized that I tend to favor a kinesthetic learning approach, and I’m sure others have their own preferences as well.  Instead of saying that one is better than another, I would suggest employing all of them.  Take a look at yourself, figure out how you learn best, decide which methods are the most and least helpful, and then work out how to make these styles work to your advantage.  For example, I feel that I am a weak reader.  While I know that reading is important, I tend to spend as little time as possible on reading alone.  Having a piece of reading material as a reference or as an introduction is great.  If I don’t quite understand something from reading, my next step is to internalize it by listening or watching.  Finally, once I have a good enough idea about a topic, I like to quickly put it into practice myself.  There is some quote about how experience sticks, but I am too lazy to look it up.  Suffice it to say, I tend to remember things much more concretely when I am able to experience them for myself.

Again, this is just in my own experience and everybody is different.  I just wanted to give a specific example of one way to utilize different styles of learning.  There are many other possibilities and this just happens to be the way I prefer to learn things.

Learning strategies

Now that we have that out of the way, I want to highlight some of the major tactics I use when attempting to learn a new subject.  I definitely use some of these more than others, but the point is that you should try to use as many as you can to your own benefit.  Here are some strategies that help me greatly when I encounter new and difficult information.  Many of these work in tandem, so some may be described more than once.

The Feynman technique – This is as close to the be-all and end-all of learning as there is.  Everybody is probably familiar with the idea, even if not with the name.  The technique is to explain or walk through a topic as if you were teaching it to somebody who was learning about it for the first time.  This forces you to know what you’re talking about.  If you get stuck trying to explain a particular concept or idea, make a note of what you are struggling with, then research and relearn the material until you can explain it confidently.  You should be able to explain the subject simply; if your explanations are wordy or convoluted, you probably don’t understand it as well as you think.

Reading – I usually like to get an introduction to a topic by reading up on (and bookmarking) the sources I consider most authoritative, whether official documentation, RFCs, books, magazines, or respected blogs and authors.  As I mentioned before, I consider myself a weak reader (something I definitely need to improve on!), so I also like to take very brief notes when something I read seems useful, so I can try it out for myself.

Watching/Listening to others – After getting a good idea of a subject from reading, I always like to reinforce it by watching demonstrations and videos, or listening to podcasts, lectures or anything else that shows me how something is done.  A long drive, for example, is a great time to put on a podcast.  It kills time and improves your knowledge at no cost.  Very efficient!  The same goes for videos and demonstrations; the only thing holding you back is motivation.

Try things for yourself – This can sometimes be the most difficult approach, but it can also be the most rewarding; there is nothing better than learning things the hard way.  Try things out in a lab, or anywhere you can practice the concepts you are trying to learn and understand.

Take notes – This is important for understanding how things work in a way you can internalize.  I take notes on simple things like commands I won’t remember, related topics and concepts, or even just keywords to Google later on.  This goes hand in hand with the reading technique described above; very simple, brief notes can be really useful.

Communicate with others – There are plenty of resources out there for getting help and for discussing what you learn with others.  I would suggest looking at /r/sysadmin as a starting point.  IRC channels are another great place to ask questions and get help; there are channels for pretty much any subject you can think of.  There are good sysadmin-related channels on irc.freenode.net, and if you don’t already use IRC I highly suggest you take a look.

Come back later – Give your brain some time to start digesting some of the information and to take a step back and put the pieces together to begin creating a bigger picture.  I can’t count how many times I have been working on learning a new concept or subject and felt overwhelmed and stuck until I took a break, did something completely different or thought about something else entirely and came back to the subject later on with a fresh perspective.   Sometimes these difficult subjects just take time to fully understand so taking breaks and clearing your head can be very useful.

Sleep on it – Have you ever heard the phrase?  This may sound crazy, but if there is a particular problem I can’t solve, I will often think about it before I go to sleep.  I find that by blocking out all outside interference and noise I can think about it much more easily, come up with fresh perspectives and ideas, and often wake up with an answer the next morning.  I think meditation is comparable to this, but I know nothing about meditation (I hope to learn at some point!), so I use this method for the time being.

Break stuff – One of the best ways to incorporate a number of these techniques is to intentionally break stuff in your own setups.  Triple check to be sure that you aren’t breaking anything important first and then go ahead and give it a try.  By forcing yourself to fix things that are broken you develop a much deeper and more intimate relationship with the way things work, why they work the way that they do and how things get broken to begin with.  The great thing about using this method is that it is almost always useful for something in the future, whether it be the troubleshooting skills, the Googling skills or the specific knowledge in the particular area that needed to be fixed.

Practice, practice, practice – The more I read about becoming better at something the more I am convinced that you have to practice like an absolute maniac.  I think for system administration this can partially come from practical job experience but it also comes from dedicated study and lab time.  The hands on component is where most of your practice will come from and becoming better doesn’t just happen, it takes cultivation and time, just like with any other skill.  Stick with it and never stop learning and improving on your skills through practice and experience.
