We’re only looking for a fairly simple interface to browse through documents. Apache already gives us that – you just need to enable a feature called “Indexes”. But the default indexing is pretty ugly; it’d be nice to make it look a little prettier and maybe add scope for expanding on functionality.
Initially I was going to design my own style, but it turns out someone’s already done that and he’s done a better job than I could ever hope to. So I took the style setup from Recursive Design and tweaked it slightly to fit in with what we’re doing here.
cd /home/http/assets
svn co http://recursive-design.com/svn/misc/apache/index-style
mv index-style/* .
rmdir index-style
The stylesheet really ought to be referenced with a specific path so it can always be found. Edit /home/http/assets/header.html and change the stylesheet reference thus:
Next up, we need to configure Apache. Ignore the instructions on the Recursive Design blog; things are slightly different here. Edit /etc/apache2/sites-available/default as follows:
ServerAdmin (YOUR EMAIL ADDRESS HERE)
DocumentRoot /home/http/documents
Options FollowSymLinks
AllowOverride None
AllowOverride None
Options Indexes
DirectoryIndex index.html index.php
IndexOptions FancyIndexing
IndexOptions VersionSort
IndexOptions HTMLTable
IndexOptions FoldersFirst
IndexOptions IconsAreLinks
IndexOptions IgnoreCase
IndexOptions SuppressDescription
IndexOptions SuppressHTMLPreamble
IndexOptions XHTML
IndexOptions IconWidth=16
IndexOptions IconHeight=16
IndexOptions NameWidth=*
IndexOrderDefault Descending Name
HeaderName /assets/header.html
ReadmeName /assets/footer.html
Order allow,deny
Allow from all
Alias /assets /home/http/assets
Alias /search /home/http/search
AllowOverride None
Order allow,deny
Allow from all
AllowOverride None
Order allow,deny
Allow from all
ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/
..... LEAVE THE REST OF THE FILE ALONE ......
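The listing above appears to have lost its container tags somewhere along the way (AllowOverride is only valid inside a <Directory> block, so they must have been there originally). As a rough sketch, assuming the blocks wrap /, the DocumentRoot and the two aliased directories in that order, the layout probably looks something like the following. The <Directory> paths are an assumption inferred from the DocumentRoot and Alias lines, not something the listing confirms. The snippet only writes the sketch to a scratch file for comparison; it does not touch /etc/apache2.

```shell
# Sketch only: the <Directory> wrappers and their paths are assumptions
# inferred from the DocumentRoot and Alias lines in the listing.
SKETCH="$(mktemp)"
cat > "$SKETCH" <<'EOF'
DocumentRoot /home/http/documents
<Directory />
    Options FollowSymLinks
    AllowOverride None
</Directory>
<Directory /home/http/documents>
    AllowOverride None
    Options Indexes
    DirectoryIndex index.html index.php
    # ... IndexOptions, IndexOrderDefault, HeaderName and ReadmeName lines go here ...
    Order allow,deny
    Allow from all
</Directory>
Alias /assets /home/http/assets
Alias /search /home/http/search
<Directory /home/http/assets>
    AllowOverride None
    Order allow,deny
    Allow from all
</Directory>
<Directory /home/http/search>
    AllowOverride None
    Order allow,deny
    Allow from all
</Directory>
EOF
# Quick sanity check: count the opening <Directory> tags in the sketch.
grep -c '^<Directory ' "$SKETCH"
```

Compare the scratch file against your own /etc/apache2/sites-available/default by eye before restarting; if your original file nests things differently, trust the file rather than this sketch.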
Restart Apache.
Put something – ideally a PDF that was NOT generated from a scan but instead contains searchable text – into /home/http/documents and browse to http://(hostname).local from a separate PC on the network. If all goes according to plan, you should see something a bit like this:
There’s a lot more to do: we still need something that can index this little lot (so we can just punch in search terms) and we need some easy way to get documents onto the server. But they’re a topic for a future post…
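Since the search side of things will depend on the test PDF actually containing text, here is one rough way to check a file before relying on it. This is a minimal sketch, assuming pdftotext (from the poppler-utils package) is installed; the function name has_text_layer is made up for illustration.

```shell
# has_text_layer FILE
# Returns 0 if pdftotext can extract at least one letter or digit from FILE.
# Assumption: pdftotext from poppler-utils is on the PATH.
has_text_layer() {
    [ -f "$1" ] || { echo "no such file: $1" >&2; return 1; }
    command -v pdftotext >/dev/null 2>&1 || { echo "pdftotext not found" >&2; return 2; }
    # "-" sends the extracted text to stdout instead of a .txt file.
    pdftotext "$1" - 2>/dev/null | grep -q '[[:alnum:]]'
}

# Example use (the path is a placeholder):
#   has_text_layer /home/http/documents/example.pdf && echo "searchable"
```

A scanned PDF with no OCR applied will normally produce no extractable text, so the function returns non-zero for it.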
I’m assuming that readers are already reasonably familiar with Linux and can generally find their way around OK. If I didn’t assume that, this set of instructions would probably wind up becoming a book!
I’m keeping it simple here by installing this on a spare PC I have hanging around. Things would be a little more complicated if this was on a shared host or a virtual server in a datacentre, but that’s beyond the scope of this project.
Install a base Debian Wheezy system. At the time of writing this is the “Testing” branch, which I wouldn’t ever deploy to a client. But this project is for me personally so I’m rather less bothered. You don’t need any extra software, so untick as much as you can.
Give the bulk of the disk space over to /home; keep 15-20 GB left over for /var.
A Linux box running Apache – and we shouldn’t even need DNS if we’re on the same subnet. Check it works by typing http://(hostname).local into your web browser.
Directories for our scripts, our static HTML, the document repository, incoming files for OCR’ing and scripts to carry out the OCR work.
Most of the software we’re going to need. There’s one or two things missing, but they’re so trivial that it’s hardly worth losing any sleep over them.
We still need:
To configure Apache to act as our file browser.
To integrate search functionality.
To sort out the scripts that are going to OCR incoming files.
If you have been a follower of this blog, you will know that I wrote a post a while back describing my preferred settings in tmux, and just recently wrote a post about getting set up with bitlbee. Today we will be adding another piece to what I will be calling my ultimate command line theme by introducing another useful command line tool for communications called Irssi. Here are the back posts to these if you missed them.
Now that we have that out of the way, let’s talk about Irssi. Irssi is a console-based IRC client that has been around for quite a while now. There is something of a holy war over which console-based IRC client is the best. A number of hardcore Irssi users tout it as the best, and the same is true of Weechat fans. Before going any further I will say that there is definitely a certain amount of legwork involved in getting Irssi up and running with the full set of customizations and features. That said, I believe the extra work is worth every minute of time and effort if you are looking for a fully featured, rich IRC experience.
I want to present both of these clients (Irssi and Weechat) to readers and let each person decide for themselves which is the best, because saying one is better than the other wouldn’t be a fair comparison; it is really like comparing apples and oranges. With that said, in a future post I will be going over the basics of using Weechat, the other touted console-based IRC client.
Here we will assume that you have created and set up a user with irc.freenode.net. Once that step has been completed you should be able to follow these instructions without any issues.
You may have to shut down and restart Irssi at this point for it to recognize the network name “freenode” in the next step.
/CHANNEL ADD -auto ##/r/sysadmin freenode
Adding advanced_windowlist.pl
First we need to download the script and put it into the appropriate place. If you haven’t created your Irssi scripts directory and your autorun directory go ahead and make them quickly.
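If the directories are not there yet, something like the following should create both in one go. This is a small sketch, assuming Irssi keeps its configuration in the default ~/.irssi location; the IRSSI_DIR variable is only there for illustration.

```shell
# Assumption: Irssi's config lives in ~/.irssi (the default location).
IRSSI_DIR="${IRSSI_DIR:-$HOME/.irssi}"

# -p creates the parent scripts directory as well and is harmless
# if either directory already exists.
mkdir -p "$IRSSI_DIR/scripts/autorun"

# Confirm both directories are present.
ls -ld "$IRSSI_DIR/scripts" "$IRSSI_DIR/scripts/autorun"
```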
Change directories to your scripts directory and download the script.
cd ~/.irssi/scripts
wget http://anti.teamidiot.de/static/nei/*/Code/Irssi/adv_windowlist.pl
Let’s quickly set it to be executable.
chmod +x adv_windowlist.pl
Now we need to symlink this script and then run it in Irssi. To symlink it, run the following:
cd ~/.irssi/scripts/autorun
ln -s ../adv_windowlist.pl
And finally, to load it into Irssi:
/run adv_windowlist.pl
That should be it. This can come in handy when you have any more than a handful of windows and can’t keep your conversations straight. If we take a look at our Irssi session we can see that there is a name associated with each window number now.
This looks pretty good, but the script has some cool features that we can leverage to make it look even better. In your Irssi session, run the following commands to customize your display even further:
OK, this is looking better. We now have our current conversation underlined, our windows named and numbered with decent formatting and have set windows with activity to update and change colors. There are more options if you look at the script itself but this is a pretty good start.
Setting up hilight.pl
This script adds the ability to check messages that contain your nick. It is a good way to catch up on messages that arrived while you were away or that you didn’t get a chance to respond to. First we need to make sure that the new window will split correctly.
/SET autostick_split_windows ON
/hilight <nickname>
Now we add in and configure our new notification window.
/window new split
/window name hilight
/window size 4
Setting up nm.pl
The description from its creator is “right aligned nicks depending on longest nick”. This script will help with the readability and organization of your different chats. I’m not sure if it requires nickcolor.pl but I have it in my scripts folder and symlinked to my autorun folder just in case. Just load nm.pl as you do for all other scripts and it will start doing its thing.
/run nm.pl
Setting up themes
Themes are definitely not a necessity, but they can help to make things cleaner and easier to read. So far I have played around with the xchat and fear2 themes but will come back and update this post if I happen to find a better theme. The good thing is that themes are really easy to set up and use. To load a specific theme, just copy it into your ~/.irssi directory and turn it on in Irssi.
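As a rough sketch of those two steps, assuming you have already downloaded a theme file somewhere: the helper name install_irssi_theme and the file name fear2.theme are placeholders for illustration, and the Irssi side uses the standard theme setting.

```shell
# install_irssi_theme FILE
# Copies a downloaded .theme file into the Irssi config directory.
# Assumption: Irssi's config lives in ~/.irssi unless IRSSI_DIR says otherwise.
install_irssi_theme() {
    theme_file="$1"
    irssi_dir="${IRSSI_DIR:-$HOME/.irssi}"
    [ -f "$theme_file" ] || { echo "theme file not found: $theme_file" >&2; return 1; }
    mkdir -p "$irssi_dir" && cp "$theme_file" "$irssi_dir/"
}

# Example use (the download path is a placeholder):
#   install_irssi_theme ~/Downloads/fear2.theme
#
# Then, inside Irssi, switch to the theme and keep it across restarts:
#   /set theme fear2
#   /save
```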
That’s all I have on Irssi for now. If there is one complaint that I have about Irssi it is that the nicklist.pl script doesn’t play nicely in tmux (however it should be fine in screen). It is a manual process and is a pain in the ass to get set up so I have chosen not to cover it in this post. It is possible I know, but for me, it just wasn’t worth the trouble. If you know of an easy way to get this working inside of tmux let me know.
Bitlbee is a way to bridge IM and IRC together essentially allowing you to connect to your IM network through an IRC interface. One great feature of Bitlbee is that it supports a large number of different protocols (including Gtalk, Yahoo!, Facebook and Twitter), which happen to be nearly all the major platforms I’m concerned with, excluding Microsoft Lync. The main reason I want to discuss Bitlbee now, ahead of time, is because I will be doing a series of posts that specifically tie Bitlbee in with a few of the more popular IRC clients.
As you will see, there are slight differences in how Bitlbee behaves inside each of the IRC clients I have been trying out; I will leave these details out for now to make things easier to follow. Today’s post is geared more towards general use of Bitlbee, so I will be going over things like how to get around and its basic usage.
As usual, I will be running in Ubuntu so these instructions are specific to Debian-based distros. Outside of installation, I imagine the usage will be very similar in other distributions because most of the commands and configuration happen inside Bitlbee.
Getting used to Bitlbee
Let’s start off by getting Bitlbee installed.
sudo aptitude install bitlbee
Now let’s go ahead and add in our gtalk (jabber) account.
Optional – turn on oauth (Still having some issues with this one).
account gtalk set oauth true
oauth = 'true'
Log in to Gtalk.
account jabber on
Start a chat in a new window.
/msg NickName Hello!
Getting a listing of various IM accounts.
account list
account list online
account list all
Managing various IM contacts is pretty self-explanatory. Here 0 is the gtalk account we added earlier, [email protected] is the person we are adding to our account and nickname is how they will show up in our contact list.
I’ve recently identified a need to set up a means of indexing and browsing documents. This is a great little project to demonstrate how we can tie lots of tools together with Linux, so I’m going to write up what I’m doing as I go along.
The needs are pretty simple, but I haven’t found anything out of the box that suits. In particular, I’ve got the following needs:
This is only for a couple of people, so it doesn’t need the complexity associated with a full-blown document management system.
It does need to index PDFs – including PDFs that have been generated from scanned in pages rather than from an office suite.
Scanned PDFs may not have had any sort of OCR process applied to them – but that doesn’t mean I don’t want to be able to search for them!
It needs to be able to do this with minimal interaction – as a rule of thumb, if it’s even conceivably possible to automate part of the process, that part of the process must be automated. I can think of better things to do with my time than click “Next…”
Must be able to interact with the system via a web browser.
Anything running on the public Internet is out – most of the information I’m scanning in has no business being anywhere near the public Internet.
Must be dead easy to back up. Anything that involves databases, Tomcat etc. is probably far too complicated.
Budget: About £250+VAT for an MFD that has a duplexing scanner unit. Other than that: £0. Most multifunction devices come with software that will OCR scanned files and index them, but further investigation suggests it usually fails the web-based and the “minimal interaction” requirements. I have a spare computer sitting around that I can use, but I can’t justify spending a fortune on software. That may change in the future, but it’s what we’ve got now.
So, here’s the question: Can I do it? Read on…