my systems

Audio: Weighted Playlists, exploitation vs. exploration

This is the fourth part of a series of articles that describe the way I run my computers. These articles are not intended as tutorials and do not contain the details to mirror my setup. They are written in one go, and most certainly contain errors. Feel free to ask for details if you are interested. I will try to update whenever it's appropriate. This article is about how I manage my audio files.

The Problem

Before we get to the technical part of this article I will have to describe my musical preferences.

  1. I like to explore new music; I constantly download new and unknown music.
  2. I like variation; listening to the same album three times in a row is not for me.
  3. I like albums; some songs need to be heard in their context to fully appreciate them.
  4. 90% of everything is crap; most albums contain only one or two decent songs.

As a consequence of these four points I have a lot of music that I hardly know, and a lot of music that is rather bad. If I only listened to the music that I know to be good I would never hear anything new. If I played all songs randomly I would be listening to bad music most of the time. In Artificial Intelligence we would call it a problem of exploration versus exploitation.

Gathering Information

So I want a playlist that balances the quality of a song with when it was played last.
To do so, this information needs to be recorded. The last play date is easy; my audio player (Amarok) keeps track of it automatically. The quality has to be determined manually. I rate my songs on a 1-10 scale (with 0 for unrated files). Rating takes a lot of time: it has taken me three years so far, and I'm still not completely done.

The solution

My solution to the playlist problem is to associate each rating level with a time period, which indicates how often I want to hear a song of that rating.
For example:

Rating  Delay
10      2 months
9       3 months
8       5 months
7       8 months
6       12 months
5       22 months
2-4     26 months
1       never

This means that my favorite songs are added to the playlist two months after they have been played for the last time. Most songs are only played once every 22 months. Almost two years between plays seems like an awfully long time, but remember that these are the songs that I do not consider very good (but not bad either), or that I want to hear again before assigning a final rating. I believe that you should hear something at least twice before you can decide on it. There is a lot of music that I couldn't appreciate the first time I heard it, so when in doubt I don't delete.
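
The rule from the table above can be sketched in a few lines of Python. This is only an illustration of the logic, not the Amarok Smart Playlist definition I actually use. The names `DELAY_MONTHS`, `DAYS_PER_MONTH` and `is_due` are made up for this example, and treating a month as 30 days is an assumption.

```python
from datetime import datetime, timedelta

# Delay in months per rating, copied from the table above.
# Rating 1 ("never") has no entry; unrated files (0) live in a separate playlist.
DELAY_MONTHS = {10: 2, 9: 3, 8: 5, 7: 8, 6: 12, 5: 22, 4: 26, 3: 26, 2: 26}

DAYS_PER_MONTH = 30  # assumption: a month is approximated as 30 days


def is_due(rating, last_played, now):
    """Return True if a song with this rating may be queued again."""
    months = DELAY_MONTHS.get(rating)
    if months is None:
        return False  # rating 1 or unrated: never queued by this rule
    if last_played is None:
        return True  # rated but never played
    return now - last_played >= timedelta(days=months * DAYS_PER_MONTH)
```

For example, a song rated 10 that was last played 61 days ago is due again, while a song rated 5 that was played a year ago is not.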

Low Ratings

Rating 1 songs are never automatically queued. It's stuff that I don't want to hear, but don't want to delete either.
Rating 2-4 is lumped together in one group. Anything below rating 5 I consider bad music. I want to hear those songs once in a while in case my taste or opinion changes, but they shouldn't dominate my playlist. This playlist is limited to 150 songs.

Implementation

Amarok 1.4 has everything that is needed. I use the Smart Playlist feature to automatically generate playlists that match the above specification.
Besides the categories above I also have a special playlist for new and unrated music that's also added to the mix.

Example

Right now my playlist looks like this:

Rating  # Songs
10      39
9       50
8       68
7       88
6       265
5       489
2-4     150
new     321

Notes

Albums

I mentioned earlier that I like albums. Amarok has a nice "random album" feature, that randomly selects an album from the current playlists, and plays all songs from that album that are on the playlist. When used with the above set of playlists it ensures that songs from the same album are always played in order, but the good songs will be on the list more often than the bad tracks.
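
To make the behaviour concrete, here is a rough Python sketch of what "random album" does as I described it. It is not Amarok's implementation. The tuple layout `(album, track_number, title)` and the function name `pick_random_album` are assumptions for this example.

```python
import random
from collections import defaultdict


def pick_random_album(playlist, rng=random):
    """Pick one album at random and return its queued songs in track order.

    playlist is a list of (album, track_number, title) tuples and must not
    be empty. Only songs that are on the playlist are returned, so a good
    song can show up without the rest of its album.
    """
    by_album = defaultdict(list)
    for album, track, title in playlist:
        by_album[album].append((track, title))
    album = rng.choice(sorted(by_album))
    titles = [title for _, title in sorted(by_album[album])]
    return album, titles
```

Note that every album is equally likely to be picked, regardless of how many of its songs are on the list.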

Size

I try to keep my playlist between 1000 and 1500 songs. A short playlist loads much faster, but a long playlist increases the probability that more than one song from the same album is on the list.

Storage: Filesystems & Backups

This is the third part of a series of articles that describe the way I run my computers. These articles are not intended as tutorials and do not contain the details to mirror my setup. Feel free to ask for details if you are interested. This article is about how I do backups and synchronization of my home directory.

E-mail: Centralized server

This is the second part of a series of articles that describe the way I run my computers. This one is about e-mail. These articles are not intended as tutorials and do not contain the details to mirror my setup. Feel free to ask for details if you are interested.

The goals

The goal of my mail system is to centralize all my e-mail in one location. I receive a lot of mail, on many different e-mail addresses. Centralizing is the only way to keep it manageable and pay proper attention to all mail.

I have a number of requirements:

  • Mail must be accessible with a normal mail client; webmail is too slow for daily use
  • Mail must be accessible over the web
  • Mail must be searchable
  • Mail must be filtered into folders
  • Spam must be removed

One important requirement I have for all my software is that I do not want to be locked into old, unmaintained or bad software. Therefore I try to use open standards and file formats as much as possible. Every bit of software mentioned on this page has at least one decent alternative. If needed I could replace every program without any loss of data or functionality.

Receiving mail

E-mail arrives at my system in two different ways. The first is by direct SMTP, and the second is through fetchmail.
Fetchmail is a program that fetches mail from POP3 and IMAP servers. Each retrieved e-mail is handed over to my SMTP server. There is one e-mail account that does not work (directly) with fetchmail, and that is my Hotmail address. For that I use a program called Hotway, which is an http2pop3 proxy. It acts like a POP3 server, but in the background it logs into the Hotmail website and retrieves the e-mails over HTTP. It's butt-ugly, but it works.

Filtering spam

From now on all mail is handled as if it arrived directly over SMTP.
Let's follow a mail through the system.
The first thing to do is spam-checking. The longer you wait with that, the more time you waste on mail that will be deleted anyway. Spamassassin with the Bayesian classifier is my favorite anti-spam solution.

When I say that it's the first thing, I really mean it. My mail server does not acknowledge the reception of an e-mail until it has been scanned. If the system thinks it's spam it will not acknowledge the reception, but refuse it. The sending server will then have to bounce my error message back to the sender.

However, most spam has a falsified sender address. Sending it back will only increase the problem. Therefore, if the server is truly convinced that it is spam, it will _not_ refuse it. Instead it will accept it and delete it immediately. No need to burden the sending server any more with a mail that's spam anyway.

The other side of the story is mail that the system cannot decide on. These mails are put into a special mail folder that I manually check about once a week. It's been months since my last false positive, but I prefer to err on the side of caution.
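
The four outcomes described above (deliver, hold for manual review, refuse, silently delete) can be summarized in a small Python sketch. The threshold values and names below are hypothetical. I do not list my real settings here, and placing the "cannot decide" band between ham and refused mail is an assumption of this example.

```python
# Hypothetical score thresholds, for illustration only.
UNSURE_SCORE = 3.0    # below this, mail is treated as ham
REJECT_SCORE = 5.0    # from here on, mail is refused during the SMTP dialogue
DISCARD_SCORE = 15.0  # from here on, the server is "truly convinced"


def spam_action(score):
    """Map a spam score to what the mail server does with the message."""
    if score >= DISCARD_SCORE:
        return "discard"     # accept, then delete without bouncing
    if score >= REJECT_SCORE:
        return "reject"      # refuse; the sending server notifies the sender
    if score >= UNSURE_SCORE:
        return "quarantine"  # special folder, checked by hand once a week
    return "deliver"
```

The order of the checks matters: the most certain verdict is tested first, so a very high score never ends up as a plain refusal.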

Bayesian classifier

Spamassassin includes a Bayesian classifier that is trained on your own e-mail. It learns the differences between e-mails that you have designated 'spam' or 'ham' (not-spam). For this purpose I have two special mail folders named 'spam' and 'ham'. Any mail that I put into these folders is automatically added to the training set of the classifier. If I find a mail that is misclassified, I put it into the appropriate folder.
(This way of training, called Train-On-Error, is not optimal. It will not learn changes in your behaviour until you find out that it has been making mistakes. Training on my entire mail archive is not feasible because it would take too much time. I intend to write something that uses a sample of recent mail.)

Sorting into mailboxes

Now that the mail has been accepted, it needs to be delivered into a local mail folder.
I use the Sieve mail filtering language. Sieve is a standardized language to define mail filters. There is a special protocol for (remotely) managing Sieve scripts. The advantage of Sieve is that all filtering is done on the server, but I can still write filters from within my mail client.

The filter itself is not very interesting, but one nice detail is that I use special-purpose e-mail addresses to simplify filtering. For example, if I were to send you an e-mail, I would use the e-mail address "casper-you@gielen.name". That makes it very easy to filter your replies into a mailbox. A nice little extra is that it helps me to manage spam. If I were to receive spam at casper-you@gielen.name, I would know that you are somehow responsible, as I did not use that address with anybody else. I would then know not to trust you, and I could add that e-mail address to my blacklist without the risk of dropping any real mail. Now that I've posted that address on this page I should probably add it to the blacklist straight away.
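
The real filtering is done in Sieve, but the idea behind the tagged addresses is easy to show in Python. This is only a sketch: the functions `tag_of` and `folder_for`, the `contacts/` folder layout and the `"DROP"` marker are invented for this example.

```python
BLACKLIST = {"you"}  # tags that are known to attract spam


def tag_of(address, user="casper", domain="gielen.name"):
    """Return the tag in 'user-tag@domain', or None if there is none."""
    local, _, host = address.lower().partition("@")
    if host != domain:
        return None
    prefix = user + "-"
    if not local.startswith(prefix) or len(local) == len(prefix):
        return None
    return local[len(prefix):]


def folder_for(address):
    """Decide where mail sent to this recipient address should go."""
    tag = tag_of(address)
    if tag is None:
        return "INBOX"
    if tag in BLACKLIST:
        return "DROP"  # hypothetical marker: discard the message
    return "contacts/" + tag  # hypothetical folder layout
```

With this, `folder_for("casper-you@gielen.name")` is dropped because the tag "you" is blacklisted, while any other tag gets its own folder.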

Storing mail

All mail is stored in the Maildir format. This format stores each message in a separate file. Also useful is that the filename gives some information about the status of the mail, such as whether it has been read or whether it is flagged as important.
I use this to automatically clean up some mailboxes. I follow a bunch of mailing lists that are also archived on the internet, so there is no point in keeping my own copy of those mails for long. I wrote a script that deletes mail from those folders that has been read, is not flagged as important, and has not been accessed for thirty days.
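
My own script is not reproduced here, but a minimal sketch of the same idea in Python could look like this. It assumes the one-file-per-message layout is Maildir, where a filename ends in ":2," followed by flag letters (S for seen, F for flagged), and that the filesystem records access times. The function names are invented for this example, and deletion is off by default.

```python
import os
import time

MAX_AGE_SECONDS = 30 * 24 * 3600  # thirty days, as described in the text


def maildir_flags(filename):
    """Return the flag letters after ':2,' in a Maildir filename."""
    _, sep, flags = filename.rpartition(":2,")
    return flags if sep else ""


def is_expired(path, now=None):
    """True if the message is read, not flagged and not accessed for 30 days."""
    now = time.time() if now is None else now
    flags = maildir_flags(os.path.basename(path))
    if "S" not in flags or "F" in flags:
        return False
    return now - os.stat(path).st_atime >= MAX_AGE_SECONDS


def clean(cur_dir, now=None, dry_run=True):
    """Find expired messages in a Maildir 'cur' directory; return their names.

    Files are only deleted when dry_run is False.
    """
    expired = []
    for name in sorted(os.listdir(cur_dir)):
        path = os.path.join(cur_dir, name)
        if os.path.isfile(path) and is_expired(path, now):
            expired.append(name)
            if not dry_run:
                os.remove(path)
    return expired
```

Running `clean()` with the default `dry_run=True` first shows what would be removed. On filesystems mounted with `noatime` the access-time test is meaningless, so the modification time would have to be used instead.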

Another advantage is that all the standard Unix tools can be used to manage those files. It also avoids most problems associated with simultaneous connections to the same mailbox, as it is no longer necessary to lock entire mailboxes.

The biggest disadvantage is that you'll soon have many thousands of files and most filesystems have terrible performance on many small files. Fortunately it's no problem for Reiserfs.

Reading mail

Mail is served over IMAP through Dovecot, a small but very powerful IMAP server. IMAP is really the only choice for this. POP3 is not suitable for leaving mail on the server, and whatever Exchange uses is only compatible with Outlook.
I read most of my mail through KMail, closely followed by Icedove (aka Mozilla Thunderbird). When logged in remotely I tend to fall back to mutt. mutt is still the most powerful mail client that I know of. If I need to do anything really fancy, then mutt is usually the best choice.
I also run the Ilohamail webmail client. It's a very simple webmail client, but it is extremely fast. I use the webmail when I'm not allowed, or don't have the time, to log into my own system with SSH.

Miscellaneous

That's about it. I could tell a lot more, for example that everything is secured with SSL, but I think this article is long enough as it is. Good night.

Audio: 2.1 + headphones for the lazy

I've decided that I should document as much about my setup as possible. I'll start off easily with my audio setup.

My computer has a pair of normal speakers and a subwoofer. During the night I use headphones. In the past, switching meant plugging the headphones into the amplifier.
However, the Creative SoundBlaster Audigy allows for a much niftier setup. I suppose that other 5.1-capable sound cards can do the same.

DNSSEC made easy with zonesigner

I've just tested zonesigner from dnssec-tools.org. It was surprisingly easy. If you think that DNSSEC is a complex mess you should try zonesigner. It's pretty much as close to a turn-key solution for DNSSEC as possible. You don't really need to understand what's happening. Just follow the instructions and you'll be fine.

Adding DNSSEC to your domain is still not for everybody, but if you feel confident about administering BIND, then DNSSEC should be within your reach as well.

Bluetooth headset

I was recently given a Jabra 5020 Bluetooth headset (thanks David!).
It took quite a bit of effort to get it to work, although in the end the solution is rather simple.

1. Install the appropriate BlueZ and ALSA packages (sudo aptitude install bluez-audio bluetooth-alsa on Debian).

2. Add

pcm.bluetooth { type bluetooth }

to /etc/asound.conf or ~/.asoundrc

3. Put the headset in connect mode (by holding the "on" button for a few seconds until the light becomes blue).

4. Direct audio to the ALSA bluetooth device, for example: mplayer -ao alsa:device=bluetooth *mpg

5. Enter 0000 in the popup that asks for a pin-code.

The tricky thing is that the headset only works once, and then the driver has to be restarted ("modprobe -r sco; modprobe sco"). That's a bit annoying, but not unacceptable, as I don't see myself using it all that often. As long as it works on the first try I'm fairly happy with it.

Unfortunately the sound quality is rather poor. I have no idea how it compares to other Bluetooth headsets, but it is far worse than any wired headphones.

There is an alternative protocol around called A2DP that is supposed to deliver superior quality, but I was unable to get it to work. I'm not even sure if the Jabra 5020 supports this protocol at all.
