Quickly installing Sun/Oracle Java in Linux Mint 15

Almost the same technique as yesterday, but a much bigger timesaver this time. Most Linux distributions come with the open-source OpenJDK installed. This is fine for most things, but I’ve noticed that graphically complex apps (PyCharm for one) have some rendering issues and high CPU usage.

You can install the Sun/Oracle Java instead, but this seems to be a pain to do from the download. There is another PPA for this:

sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java7-installer

It still downloads all however-many-megabytes of installer, but it’s fire and forget. No need to un-install OpenJDK, they can coexist.

Quickly installing Sublime Text 2 in Linux Mint 15

Sublime Text is my new favourite text editor. It has some really awesome functionality and a clean, modern interface.

Sometimes you just can’t be bothered using tar, putting the binary in the right place, working out how to get it onto the menu. I downloaded the Sublime Text 2 tar.bz2, and realised I had to do this manually.

Ubuntu and derivatives like Mint allow you to use PPAs – software repositories that aren’t included by default but can easily be added so that you can install binaries easily. The most widely known one is for installing Java, but there is also one for installing Sublime Text 2.

You do this as follows:

sudo add-apt-repository ppa:webupd8team/sublime-text-2
sudo apt-get update
sudo apt-get install sublime-text

Job done – on the menu, with a short command line alias “subl” to use as well.

Don’t lose your data…

A bit of a different subject to normal posts. I’ve seen a lot of tweets recently from people who have lost irreplaceable data because they haven’t got a backup or their backups weren’t working properly.

Bruce Schneier recently said on his blog:

Remember the rule: no one ever wants backups, but everyone always wants restores.

This is the truth – it isn’t the backup that matters, it is the restore. You need to test it. If you are serious about your data, back it up!

I had a scare last year when my laptop’s SSD failed without warning, and then I found out my backups hadn’t been working properly. Luckily my elite data recovery skills meant I could get the data back.

I took this as a chance to implement a robust, dependable backup system that I knew I could rely on.

Goals

You need to decide what you are protecting:

  • Photos – these to me are genuinely irreplaceable.
  • Projects – code, notes, datasheets, data etc. I could redo these, but it would take time and effort.
  • Emails – again, I would have no way of recreating these.

And what you aren’t:

  • Media – TV, films, music. I’m not bothered about these – I can get them again
  • Programs and OS – I can download these again.

At this point I should say that I am not a fan of “bare metal restore” or full disk imaging. Why?

  • Individual files are not easily accessible – it is far harder to determine if things are working correctly.
  • The file formats are often proprietary and undocumented – if it isn’t working, I am going to have a hard time fixing that.
  • Bare metal restores are difficult onto different hardware – they don’t handle changes well, even a different sized partition complicates this.
  • I would hope I need to restore infrequently enough that re-installing my OS and programs is a welcome clean-out rather than inconvenience.

You need to decide what you are protecting against:

  • Disk failure – this seems to be the biggest threat to my data. One external HD and two mSATA SSDs have failed in the past two years. My view is now that no single storage device can be trusted, especially SSDs.
  • Theft – my laptop, iPad, server or backup drives could be stolen.
  • Idiocy and mistakes – I could delete something I didn’t mean to at any point in time. Or simply change something I didn’t mean to.

It would be fair to say, my solution is belt and braces and then some.

Central storage

Instead of trusting my data to my individual mobile devices and backing those up, the primary store of data is on a central server located in our house.

This is a HP N40L server (which are often available for £100 with a cashback offer), running with 2x3TB drives in a RAID1 configuration. RAID1 is otherwise known as “mirroring” and I have implemented it in software (which means that I can put the drives in any machine, unlike with hardware RAID where the chipset must be the same). All RAID1 does is protect against drive failure – nothing else. If the machine is stolen, I lose my data. If I delete my data, I lose my data. Don’t fall into the trap that many do and call RAID1 a backup. I have done it for convenience and because these large drives are currently unproven in terms of reliability.

Although this is the primary store of data, I need to be able to work with this data quickly and when away from the house. Therefore everything is synced between the server and mobile devices periodically.

For Windows machines, I use SyncBack Pro to do this in near-realtime. It’s very effective and bi-directional.

Central storage backup

I run two of my own backups on the central storage.

Firstly, on a daily basis, an incremental backup is performed between the 2x3TB RAID1 array and an external 4TB USB drive. The incremental backup means I have 90 days of history on all of my files available immediately. The external USB drive means that there is a degree of isolation between the server and drive, and I can quickly remove it from the house if need be.

Secondly, at the beginning of each month, I plug in a second external 4TB USB drive. Again, this is an incremental backup, but less frequent. I then remove the drive and store it in my substantial safe. This protects me against hardware failure – even if the server decides to send 240V into all connected devices, this drive is not connected to the machine all of the time. It also protects me from theft and fire to a degree – only a determined burglar could open the safe.

Both of these use SyncBack Pro as well.

Offsite central storage backup

The entire central server is then backed up to the cloud using Crashplan. The most important feature of Crashplan is that it is offsite. Whatever happens to the hardware in the house, Crashplan will have the data.

Crashplan also allows friends and family to back up to my server and take advantage of all the other backups I perform.

Once a year I back up photos to a portable USB hard drive and give this to a trusted third party (parents) to look after.

Offsite laptop backup

Not content with that, I run Backblaze on my personal laptop. Backblaze is a competitor to Crashplan. This backs up everything on the laptop to the cloud.

(I’m not actually quite this paranoid – I used to use Backblaze on our old “server” running Windows 7. When I upgraded to the HP N40L, I found Backblaze doesn’t run on Windows server OS, so had to switch to Crashplan. I have another 18 months of Backblaze subscription left to use).

Dropbox and Github

The final aspect of backup is for all of my project work. All of it is on Dropbox. This isn’t primarily for backup – it is for access from wherever I want. All of my code goes onto Github.

Encryption

A number of the devices mentioned above are encrypted using Truecrypt. A number of more sensitive documents are encrypted before being sent to the cloud.

Testing

I regularly check the above is all working. I recently had an SSD failure, and initially noticed that one of the above mechanisms wasn’t working. It was quickly fixed.
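One way to check a file-level backup is to compare checksums between the live data and the copy. The sketch below is only an illustration of that idea, not what SyncBack Pro or Crashplan do internally; the function names and the "missing"/"differs" report format are my own assumptions.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_backup(source: Path, backup: Path) -> list[str]:
    """List files under source that are missing from, or differ in, backup."""
    problems = []
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src_file.relative_to(source)
        dst_file = backup / rel
        if not dst_file.is_file():
            problems.append(f"missing: {rel.as_posix()}")
        elif file_digest(src_file) != file_digest(dst_file):
            problems.append(f"differs: {rel.as_posix()}")
    return problems
```

An empty list from verify_backup(Path("photos"), Path("/mnt/usb/photos")) (both paths hypothetical) means every source file exists in the backup with identical content. It says nothing about whether you can actually restore from a cloud service, which still needs a real test restore.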

Conclusion

This might be paranoid, but all this data is vital to me.

My photos are, at the moment, stored:

  1. On my laptop
  2. On the RAID array in the server
  3. On the permanently connected USB drive
  4. On the once-a-month USB drive
  5. On the offsite portable USB drive
  6. On Crashplan
  7. On Backblaze

The chance of all of this going wrong at the same time is virtually zero.

We need an antidote to the anti-code

In the last post, I briefly went over the process of reverse engineering the algorithm behind an anti-code generator for an alarm system.

It turned out that the algorithm was very simple indeed. For a given 5-digit numeric quote code, we can derive a 5-digit reset code using a “secret” 8-bit (256 possibilities) version number as a key. This has a lot in common with a keyed hash function or a message authentication code.

There are some pretty serious security implications with this mechanism.

5 digit numeric codes are never going to be strong

Even if I had to enter a pin at random, a 5-digit numeric code only has 100,000 options – I have a 1/100,000 chance of getting it right.

If we made this into a 5-digit hexadecimal code, we would now have a 1/1,048,576 chance – a factor of over 10 times less likely.

Up this to a 6-digit alphanumeric code, and it is now 1/2,176,782,336 – a factor of over 20,000 times less likely we could guess the code.

It doesn’t take many alterations to the limits on codes to make them much more secure.
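The numbers above are just alphabet size raised to the code length. A quick Python check, assuming "alphanumeric" means 36 case-insensitive characters (0-9 plus A-Z):

```python
def keyspace(alphabet_size: int, length: int) -> int:
    """Number of distinct codes of a given length over a given alphabet."""
    return alphabet_size ** length


numeric_5 = keyspace(10, 5)  # 5-digit numeric code
hex_5 = keyspace(16, 5)      # 5-digit hexadecimal code
alnum_6 = keyspace(36, 6)    # 6-character alphanumeric code (assumed 36 symbols)

print(f"numeric: {numeric_5:,}")
print(f"hex:     {hex_5:,} ({hex_5 / numeric_5:.1f}x harder to guess)")
print(f"alnum:   {alnum_6:,} ({alnum_6 / numeric_5:,.0f}x harder to guess)")
```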

For this reason it surprises me that alarms are still using 4-digit pins, but most internet forums insist on 8-character passwords with alphanumeric characters and punctuation.

The algorithm isn’t going to stay secret

There is no way to reliably protect a computer application from reverse engineering. If you can run it, at all, it is highly likely the operation can be observed and reversed. Relying on the secrecy of an algorithm or a key hidden within the software is not going to afford any level of security.

Once we know the algorithm, the odds massively improve for an attacker

The algorithm takes a version number from 0-255. For a given quote code, I can try each version number, giving me a list of up to 256 potentially valid reset codes (sometimes, two version numbers will generate the same reset code).

If I enter a code from this list, I now have a 1/256 chance of getting it right. That is not great for the system when compared to 1/100,000 for a purely random guess.

This is entirely due to the short version number used.
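To make the attack concrete, here is a sketch of building that candidate list. I am not releasing the real algorithm, so placeholder_reset_code below is a made-up stand-in with the same shape (5-digit quote and 8-bit version in, 5-digit code out); only the enumeration over 256 versions reflects what is described above.

```python
from typing import Callable


def placeholder_reset_code(quote: int, version: int) -> int:
    """Hypothetical keyed function. NOT the real anti-code algorithm."""
    return (quote * (2 * version + 1) + 7919 * version) % 100_000


def candidate_reset_codes(
    quote: int,
    reset_fn: Callable[[int, int], int] = placeholder_reset_code,
) -> set[int]:
    """Try every 8-bit version number and collect the distinct reset codes."""
    return {reset_fn(quote, version) for version in range(256)}


candidates = candidate_reset_codes(12345)
print(f"{len(candidates)} candidate codes out of 100,000 possible")
```

Whatever the real function is, the set can never hold more than 256 entries, which is the whole problem.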

Given a quote/reset code, most of the time we can infer the version

It quickly became apparent that for most quote/reset pairs, there was only a single version number that could produce this pair. I’m awful at probability and decision maths, so I thought running a simulation would be better.

I like running simulations – generally when the number of simulations becomes large enough, the results tend towards the correct value. So I tried the following:

1. Generate a genuine quote/reset pair using a random quote.

2. Use a brute force method to see which version numbers can produce this pair

3. Record if more than one version number can produce this quote/reset pair.

I started doing this exhaustively. This would take a long time though… someone on the Crypto stack exchange answered my question with a neater, random simulation.

[Image: Pairs]

I ran this test over 20 million times. From this it turns out that 99.75% of quote/reset code pairs will directly tell me the version number. Most of the remaining 0.25% yield two version numbers. A tiny number (<0.001%) yield more than four version numbers. You are almost certain to know the version number after two quote/reset pairs as a result.
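The three simulation steps can be sketched roughly as below. Again, toy_reset_code and its digit table are invented stand-ins (loosely shaped like a table lookup inside two nested loops), so the percentages this prints will not match the figures for the real algorithm; it only shows the method.

```python
import random
from collections import Counter

# Hypothetical 256-entry table of digits 0-9. The real table is not reproduced.
_table_rng = random.Random(0)
TABLE = [_table_rng.randrange(10) for _ in range(256)]


def toy_reset_code(quote, version):
    """Toy keyed function (an assumption, not the real algorithm)."""
    out = []
    for i in range(5):
        total = 0
        for j in range(5):
            total += TABLE[(version + quote[j] * (i + 1) + j) % 256]
        out.append(total % 10)
    return tuple(out)


def matching_versions(quote, reset):
    """Step 2: brute force every version that maps quote to reset."""
    return [v for v in range(256) if toy_reset_code(quote, v) == reset]


def simulate(trials, seed=1):
    """Count how many versions are consistent with each random genuine pair."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        # Step 1: a genuine pair from a random quote and a random version.
        quote = tuple(rng.randrange(10) for _ in range(5))
        version = rng.randrange(256)
        reset = toy_reset_code(quote, version)
        # Step 3: record how ambiguous the pair is.
        counts[len(matching_versions(quote, reset))] += 1
    return counts


results = simulate(500)
for n_versions in sorted(results):
    share = 100 * results[n_versions] / 500
    print(f"{n_versions} matching version(s): {share:.2f}% of pairs")
```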

What does this mean in the real world?

The version number is treated as the secret, and I am informed that this secret is often constant across an entire alarm company. All ADT alarms or all Modern Security Systems alarms may use the same version number to generate reset codes.

This means I could get hold of any quote/reset pair, infer the version number, and then use that later to generate my own anti-codes for any ADT alarm. I could get hold of these quote/reset pairs by going to an accomplice’s house with an ADT alarm system, or by eavesdropping on communications.

With that anti-code I could either reset a system presenting a quote code, or impersonate an alarm receiving centre (there are other speech based challenge-response requirements here to prove the caller is genuine, which are easily gamed I would imagine).

Conclusion

A 5-digit reset code using an 8-bit key is never going to be secure.

When computer passwords are 8 characters and 128-bit keys are the norm, this anti-code mechanism seems woefully inadequate.

Reversing an anti-code

A contact in the alarm industry recently asked if I could take a look at a quick reverse engineering job. I’m trying to gain some credibility with these guys, so I naturally accepted the challenge.

Many alarms have the concept of an “anti-code”. Your alarm will go off and you will find it saying something like this on the display:

CALL ARC

QUOTE 12345

The idea is then that you call the alarm receiving centre, quote 12345, they will input this into a PC application, get a reset code back, tell the customer, and then they can use this to reset the alarm. This means that you need to communicate with the alarm receiving centre to reset the alarm.

Alarm manufacturers provide their own applications to generate these codes. This particular manufacturer provides a 16-bit MS-DOS command line executable, which will refuse to run on modern PCs. This is a pain – it’s not easy to run (you need to use a DOS emulator like DOS-BOX) and it doesn’t allow for automation (it would be convenient to call a DLL from a web-based system, for example).

So I was asked if I could work out the algorithm for generating the unlock codes. x86 reverse engineering is not my forte, especially older stuff, but I thought I would have a quick go at it.

Turns out it was easier than expected! I find documenting reverse engineering incredibly difficult in a blog format, so I’ll just cover some of the key points.

Step 1: Observe the program

First things first, let’s get the program up and running. DOS-BOX is perfect for this kind of thing.

The program takes a 5 digit input and produces a 5 digit output. There is also a version number which can be input which varies from 0-255.

I spent a while playing around with the inputs. Sometimes things like this are so basic you can infer the operation (say, if it is XORing with a fixed key, flipping the order of some bits or similar). It didn’t look trivial, but it was plain to see that there were only two inputs – the input code and version. There was no concept of time or a sequence counter.

At this stage, I’m thinking it might be easiest to just create a lookup for every single code and version. It would only be 25,600,000 entries (100,000 * 256). That’s a bit boring though, and I don’t have any idea how to simulate user input with DOS-BOX.

Step 2: Disassemble the program

To disassemble a program is to take the machine code and transform it into assembly language, which is marginally more readable.

There are some very powerful disassemblers out there these days – the most famous being IDA. The free version is a bit dated and limited, but it allowed me to quickly locate a few things.

An area of code that listens out for Q (quit) and V (version), along with limiting input characters from 0-9. Hex values in the normal ASCII range along with getch() calls are a giveaway.

[Image: Keyboard input]

Another area of code appears to have two nested loops that go from 0-4. That would really strongly indicate that it is looping through the digits of the code.

Other areas of code add and subtract 0x30 from keyboard values – this is nearly always converting between ASCII digit characters and integers (0x30 is “0”, 0x31 is “1” etc., so 0x34 – 0x30 = 4).

[Image: Loops]
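The 0x30 trick is easy to show in Python. This is just the general ASCII conversion, not code lifted from the program, and the function names are mine:

```python
def ascii_digit_to_int(ch: str) -> int:
    """Convert one ASCII digit character to its value by subtracting 0x30."""
    code = ord(ch)
    if not 0x30 <= code <= 0x39:
        raise ValueError(f"not an ASCII digit: {ch!r}")
    return code - 0x30


def int_to_ascii_digit(n: int) -> str:
    """Convert a value 0-9 back to its ASCII character by adding 0x30."""
    if not 0 <= n <= 9:
        raise ValueError(f"not a single digit: {n}")
    return chr(n + 0x30)


print(hex(ord("4")), ascii_digit_to_int("4"), int_to_ascii_digit(4))
```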

A block of data, 256 items long from 0-9. Links in with the maximum value of the “version” above. Might just be an offset for indexing this data?

[Image: Data!]

IDA’s real power is displaying the structure of the code – this can be a lot more telling than what the code does, especially for initial investigations.

[Image: Code structure]

It’s still assembly language though, and I’m lazy…

Step 3: Decompile the program

Decompiling is converting machine code into a higher level language like C. It can’t recover things like variable names and data structures, but it does tend to give helpful results.

I used the free decompiler dcc to look at this program. I think because they are both quite old, and because dcc has signatures for the specific compiler used, it actually worked really well.

One procedure stood out – proc2, specifically this area of code:
[Image: DCC output]

It’s a bit meaningless at the moment, but it looks like it is two nested while loops, moving through some kind of data structure, summing the results and storing them. This is almost certainly the algorithm to generate the reset code.

Now, again, I could work through this all and find out what all the auto-named variables are (i.e. change loc4 to “i” and loc5 to “ptrVector”). Or I could step through the code in a debugger and not have to bother…

Step 4: Run the code in a debugger

A debugger allows you to interrupt execution of code and step through the instructions being carried out. It’s generally of more use when you have the source code, but it is still a helpful tool. DOS-BOX can be run in debug mode and a text file generated containing the sequence of assembly instructions along with the current registers and what is being written and read from them. It’s heavy going, but combined with IDA and the output from DCC, it’s actually quite easy to work out what is going on!

Step 5: Write code to emulate the behaviour

Shortly after, I had an idea how the algorithm worked. Rather than work it through by hand, I knocked up a quick Python program to emulate the behaviour. The first cut didn’t quite work, but a few debug statements and a couple of tweaks later, and I was mirroring the operation of the original program.

Overall, it was only a few hours’ work, and I’m not really up on x86 at all.

I’m not releasing the algorithm or the software, as it could be perceived as a threat. In the next post, I am going to discuss some of my security concerns around the idea of an anti-code and this specific implementation.

What’s inside a WebWayOne SPT?

I managed to find a reasonable resolution image of a WebWayOne SPT (supervised premises transceiver, the device that communicates with the ARC (alarm receiving centre)). Just some quick notes about what is on it.

[Image: Annotated PCB]

The Coldfire processors have a hardware encryption acceleration engine on them, which suggests that some fairly heavy duty encryption is happening.

Tomographic motion detection

Typical alarms use PIR (passive infrared), microwave or ultrasound detectors for motion detection. PIR are by far the most common type of detector – they work by detecting changes in infrared emitted by warm bodies. They are cheap, very reliable, and actually quite hard to beat.

Laser break beams are only really seen in films, though simple active infra-red break beams are often used on scaffolding alarms.

The problem with all of these is that they cannot see through objects. A common method of circumventing PIR detectors is to “mask” them – you either cover them using paint (or another infrared opaque coating) or simply put something like a box in front of them. Higher security systems have “anti-masking” detectors which use an active element to check that their view has not been masked.

It can mean that complex, cluttered, or continually changing spaces need a lot of PIRs to be adequately covered.

Step in a new type of motion detection – tomographic motion detection. This sounds really clever and innovative. You might have heard of tomography from the medical world – CT scan stands for computerised tomography. It means “imaging by cross section”. Xandem have come to the market with a new detector that uses 2.4GHz radio signals to detect motion in a space.

A group of wireless nodes form a mesh of connections, as shown in this image from the patent:

[Image: Mesh network]

Each one of those lines represents a radio path. The system uses 2.4GHz signals, the same as with WiFi or Bluetooth. These are heavily attenuated by anything containing water – such as the human body. A human body placed in the radio path of any two nodes will reduce the received signal strength (RSS).

By carefully measuring the RSS from each node to each other and doing some clever processing, you should be able to build up an image of what the area usually looks like. Any significant disturbance would signal an alarm. Hence, motion of a human body can be detected.
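As a rough illustration of the idea, here is a much simpler scheme than real tomographic imaging: learn the normal RSS for each link, then flag any link whose signal drops well below it. This is my own simplification and an assumption, not Xandem’s processing; the link IDs, dBm values and thresholds are invented.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Map each link (node_a, node_b) to the (mean, stdev) of its quiet RSS."""
    return {link: (mean(values), stdev(values)) for link, values in samples.items()}


def disturbed_links(baseline, reading, k=4.0, min_drop_db=3.0):
    """Return links whose RSS fell by more than max(k * stdev, min_drop_db)."""
    hits = []
    for link, (mu, sigma) in baseline.items():
        drop = mu - reading[link]  # positive when the signal got weaker
        if drop > max(k * sigma, min_drop_db):
            hits.append(link)
    return hits


# Invented RSS samples in dBm for a three-node mesh in an empty room.
quiet = {
    (0, 1): [-60.1, -59.8, -60.3, -60.0],
    (0, 2): [-65.2, -64.9, -65.1, -64.8],
    (1, 2): [-70.0, -70.4, -69.8, -70.2],
}
baseline = build_baseline(quiet)

# Someone stands between nodes 0 and 2, attenuating that one path.
now = {(0, 1): -60.2, (0, 2): -73.0, (1, 2): -70.0}
print(disturbed_links(baseline, now))
```

Knowing which links are disturbed, and where the nodes sit, is what would let a real system go further and estimate where in the room the body is.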

This would work through walls, shelves, furniture and so on – as long as the signal strength isn’t attenuated too much.

This is clever stuff. Very easy to fit (though you do need power to each node), and probably very hard to beat. It is expensive though.

For those interested, here is a link to the patent:

https://docs.google.com/viewer?url=patentimages.storage.googleapis.com/pdfs/US20120146788.pdf

And I have pulled a picture of the PCB from the FCC report on it:

[Image: PCB]

The markings on the main IC are not visible, but based on the frequency, size of the package, crystal frequency, crystal connections and antenna connections, this is a TI CC2540 RF SoC – a brother to the CC1110 RF SoC, using an 8051 core connected to a RF transceiver.

Interestingly there is a micro-USB and debugging connector on the board as well!