Project: Document Storage Made Simple

I’ve recently identified a need to setup a means of indexing and browsing documents. This is a great little project to demonstrate how we can tie lots of tools together with Linux, so I’m going to write up what I’m doing as I go along.

The needs are pretty simple, but I haven’t found anything out of the box that suits. In particular, I’ve got the following needs:

  • This is only for a couple of people, it doesn’t need the complexity associated with a full-blown document management system.
  • It does need to index PDFs – including PDFs that have been generated from scanned in pages rather than from an office suite.
  • Scanned PDFs may not have had any sort of OCR process applied to them – but that doesn’t mean I don’t want to be able to search for them!
  • It needs to be able to do this with minimal interaction – as a rule of thumb, if it’s even conceivably possible to automate part of the process, that part of the process must be automated. I can think of better things to do with my time than click “Next…”
  • Must be able to interact with the system via a web browser.
  • Anything running on the public Internet is out – most of the information I’m scanning in has no business being anywhere near the public Internet.
  • Must be dead easy to backup. Anything that involves databases, Tomcat etc. is probably far too complicated.
  • Budget: About £250+VAT for a MFD that has a duplexing scanner unit. Other than that: £0. Most multifunction devices come with software that will OCR scanned files and index them, but further investigation suggests it usually fails the web-based and the “minimal interaction” requirement. I have a spare computer I can use sitting around, but I can’t justify a fortune on software. That may change in the future, but it’s what we’ve got now.

So, here’s the question: Can I do it? Read on….
(more…)

Read More

Protip November: Quickly check installed packages in Debian

If you ever need to know if a specific package or a specific piece of software has been installed on a Debian system there are a variety of ways to tell, as you are probably aware of. One way that recently caught my eye, which is fast and easy to check from the command line is the search feature of aptitude.

Now, without speaking on the merits of using aptitude for package management (this is an entirely different topic) I would just like to mention here that it (aptitude) needs to be installed for this method to work. If you don’t have it installed already, it’s super simple.

sudo apt-get install aptitude

Okay, with that piece in place we just need to check that the package we are looking for is installed.  So to do that, enter the following command.

aptitude search '~i' | less

Basically, this command will output a list of installed packages.  I just added the | less to easily be able to scroll through the packages.  In my opinion, the flexibility and usefulness of the search feature on its own is  probably enough to start using aptitude, although this can be done similarly with apt-get. You may notice an “A” next to a number of these installed packages.  This indicates that a package was automatically installed.

If you take a look at the output from this command it will look similar to the example posted below.

As I said, this is a quick and dirty way to view installed packages, but there are definitely other ways to go about this.  I was unaware of the search functionality and well, I like aptitude so I thought that I would share the knowledge.  Let me know if you know of any other cool and/or easy ways to check for installed packages.

Read More

The power of “Why?”

I’m going go steer away from the very technical “how-to” type things I’ve written in the past and instead give a little bit of job advice to anyone who finds themselves in a technical role for the first time.

Sooner or later, we all have to deal with technical support-type questions.

It’s very tempting in these cases to take everything you’re told at face value and ask simple yes/no questions for more detail. On the face of it, this makes some sense – they can be easy to understand, quick to answer and get you to the root cause very quickly.

I would argue that they’re terrible questions. Yes, sometimes you get useful answers, but as often as not you get:

  • Answers that are downright wrong. Maybe the customer misunderstood the question, maybe they didn’t understand it at all but were afraid to admit ignorance. 
  • Answers that aren’t wrong, but aren’t terribly helpful.  Example: “No, I haven’t seen any error messages” (but considering my computer hasn’t actually got as far as logging me in that shouldn’t be terribly surprising).
  • Drawn into an argument. Example: “I’ve already told you what the problem is, now are you going to fix it?!”
Instead, try “Why?”. “Why do you think you’ve got a virus?” “Why are you having trouble with the website?”. It forces your customer to elaborate and drastically reduces the risk of confrontation.

 

Read More

Make MacVim’s mvim script use tabs and play nice with the command line

tl;dr: Replace the mvim script with this modified version: https://gist.github.com/3780676

MacVim comes with a really sweet script called mvim, which lets you launch MacVim and edit files from the command line. Unfortunately, this script is a little weak in a few ways:

  • It doesn’t let you edit multiple files.
  • It doesn’t let you pass in command line options.
  • It doesn’t let you use new tabs for opening new files into an existing window.
  • It doesn’t let you pipe stdin into vim for viewing (great with diffs).

All those things are awesome, so let’s make the mvim script better! How do we do that?

Well, first we add some extra command line options parsing to detect if we’re in diff mode, if we’re using stdin, and to preserve options for passing back into MacVim later. At Line 60 we add the following:

# Add new flags for different modes
stdin=false
diffmode=false
# Preserve command line options lazily
while [ -n "$1" ]; do
  case $1 in 
  -d) diffmode=true; shift;;
  -?*) opts="$opts $1"; shift;;
  -) stdin=true; break;;
  *) echo "*"; break;; 
  esac
done

This is a pretty normal bash argument getting loop. We look for -d (diff mode) and – (stdin) separate from other arguments. We also need to modify the command that starts MacVim to handle our different modes, etc. So we replace that command (originally on line 69):

# Last step:  fire up vim.
# The program should fork by default when started in GUI mode, but it does
# not; we work around this when this script is invoked as "gvim" or "rgview"
# etc., but not when it is invoked as "vim -g".
if [ "$gui" ]; then
	# Note: this isn't perfect, because any error output goes to the
	# terminal instead of the console log.
	# But if you use open instead, you will need to fully qualify the
	# path names for any filenames you specify, which is hard.
	exec "$binary" -g $opts ${1:+"$@"}
else
	exec "$binary" $opts ${1:+"$@"}
fi

With this better command:

# Last step:  fire up vim.
# The program should fork by default when started in GUI mode, but it does
# not; we work around this when this script is invoked as "gvim" or "rgview"
# etc., but not when it is invoked as "vim -g".
if [ "$gui" ]; then
  # Note: this isn't perfect, because any error output goes to the
  # terminal instead of the console log.
  # But if you use open instead, you will need to fully qualify the
  # path names for any filenames you specify, which is hard.

  # Handle stdin
  if $stdin; then
    exec "$binary" -g $opts -
  elif $diffmode; then
    exec "$binary" -f -g -d $opts $*
  elif $tabs && [[ `$binary --serverlist` = "VIM" ]]; then
    #make macvim open stuff in the same window instead of new ones
    exec "$binary" -g $opts --remote-tab-silent ${1:+"$@"} & 
    wait
  else
    exec "$binary" -g $opts ${1:+"$@"}
  fi
else
  exec "$binary" $opts ${1:+"$@"}
fi

The first two branches are pretty clear – they just invoke the MacVim binary in the correct way, for our different modes. The third one uses the very awesome –remote-tab-silent option, which gives us the ability to reuse the same window with new tabs when we edit multiple files. Neato!

Finally, if you don’t want to do the modifications yourself, it’s available as a gist, so you can download it and use it as a drop-in replacement for the vanilla mvim script: https://gist.github.com/3780676

Read More

Restore an Exchange Mailbox Database using Data Protector

Forgive the boring title for this post but I do think that this is a really important topic and one that I had to deal with recently at work.  Somehow one of our Exchange mailbox databases became corrupted and one of our users lost a ton of email, which, I’m almost 100% sure was related to the outage catastrophe we experienced 1.5 weeks ago.  This event made me thank the Flying Spaghetti Monster that I was getting good backups from our (sometimes shaky) backup solution, Data Protector.  Anyway, for this topic I will just assume that you are getting backups from whatever backup solution but it isn’t all that important because the majority of this post will cover specific instructions for the procedure within Exchange, so you can take bits and pieces and apply them where you need to.

Before I go any further, it is always worth mentioning;  make sure you are getting good backups! 

Ok, now that we have that out of the way I will show you the basic restore procedure within the Data Protector environment.  Select the Restore option from the drop down list -> MS Exchange 2010 Server.

Then select the source to backup up (Whichever database that needs to be restored).  With in Data Protector specify the restore options that you would like.

These are the options I used most recently.

  • Restore method: Restores files to temporary location
  • Backup version: Whichever data you decide you need to roll back to
  • Restore chain: Restore only this backup
  • Target client: Select the mailbox server that you want to restore to
  • Restore into location: This can be any location, just make sure there is enough disk space.
  • Select Restore databse file only

Once you have chosen your restore options, click the restore button to begin the restore procedure.

Once the database has been restored with Data Protector

Now for the fun stuff.  This is the part that I’m guessing most will probably be concerned with, but I didn’t want to leave out my Data Protector peeps.  Open up an Exchange Management Shell on the mailbox server that you restored your database to.  Technically it can be from any server as long as you connect to the correct mailbox server I guess.  Anyway, rename your restored database to something like “recoverydb.edb”.  Change directories into the restore folder, then check the status of the newly restored database with the following command:

eseutil /mh recoverydb.edb

You should see something similar to the following:

If it shows Clean Shutdown you can skip ahead.  Since we didn’t bring any log files down with us in this restore we will need to run the database hard repair on this database using the following command:

eseutil /p recoverydb.edb

After running the repair you should get a clean shutdown state if you check again (eseutil /mh).

Now we need to create a recovery database for Exchange to use in order to recover this data from.

New-MailboxDatabase -Recovery -Name “recoverydb” -Server Mailbox1 -EdbFilePath “M:\recovery\recoverydb.edb” -LogFolderPath “M:\recovery” -Verbose

It is important that when you create your recovery database it matches the renamed .edb file.  So since I renamed my recovery database to recoverydb.edb, I used recoverydb in the Powershell command.  If you want to check to make sure this step was done properly, use the following command to verify that the database is roughly the size you are expecting it to be:

Get-MailboxDatabase -status | select Servername,Name,DatabaseSize

After everything looks good we mount our database.

Mount-Database recoverydb

Just to verify that the database has stuff in it and we can find the person we’re looking for, we will take a quick look at the database contents, as shown below.

Get-MailboxStatistics -Database recoverydb

It looks like there are users there so all we need to do now is dump their emails into a temporary/recovery account in Exchange with the following command:

Restore-Mailbox -RecoveryMailbox “user_to_recover” -Identity “temporary_account” -RecoveryDatabase recoverydb -TargetFolder “RecoveredItems”

  • -RecoveryMailbox is the user mailbox that we are pulling data from, the source mailbox
  • -Identity is the user mailbox that we are putting data into, the destination mailbox
  • -RecoveryDatabase is our newly created recoverydb
  • -TargetFolder is the a folder that we will create on the target user to house the recovered items
  • -Verbose optional debugging information if there is a problem anywhere in the process

The wording and syntax of this command is a little bit tricky.  Just remember that the -RecoveryMailbox signifies the backup location and the -Identity signifies the restore location.  After this process completes (could take awhile depending on the mailbox size) you should be able to log in to the temporary account and take a look at the newly created “RecoveredItems” folder in which the mailbox contents of the user mailbox we are restoring have been copied in to.

Once this is done, just right click the target mailbox (temporary_account), click Manage Full Access Permission and give the restore mailbox (user_to_recover) full permissions through the Exchange Console so you can copy over messages, etc. in Outlook.

This step can be done any number of different ways but I chose this method because I was more concerned about the safest way to do this.  You could, for example, copy the contents directly into the user mailbox if you wanted.  Another option would be to export the contents of the temporary user out into a .pst file, with something like the following:

New-MailboxExportRequest –Mailbox mailboxserver –FilePath \recovery.pst

That should be it, after you are done and the emails have all been recovered , be sure to dismount the recovery database and delete the files to free space back up on your mailbox server.

Resources:

http://blogs.perficient.com/microsoft/2011/02/working-with-exchange-2010-recovery-databases-2/
http://www.mikepfeiffer.net/2011/07/restoring-mailbox-data-from-a-recovery-database
http://pmirmand.files.wordpress.com/2011/08/restore-database-on-exchange-2010.docx

Read More