I released the fourth episode of the Hardware Hacking Tutorial series on the Make Me Hack YouTube channel. This episode is about “How To Get The Firmware”.
The Hardware Hacking Tutorial series shares information on how to do hardware hacking and reverse engineering. The series is useful for both beginners and experts.
If you are struggling to get the firmware out of your device, this is the video for you!
In this video I will explain the possible ways to get the firmware of our IoT device.
I will also give a practical example of one of these ways: I will connect the PC to the UART of our sample device, analyze the boot log, access the command line interface of the boot loader, and dump the firmware using the dump command available in the boot loader. I will use a couple of scripts: one to dump the entire EEPROM to a hexadecimal ASCII text file, and one to convert this file back to binary form to get the exact image of the EEPROM.
This is the fourth episode of the “Hardware Hacking Tutorial” series. In this complete series we will talk about the hacking process, based on:
- Information Gathering from our device.
- Building an emulation environment where to run interesting binaries.
- Discovering how the device works.
- Hacking the device and modifying its firmware.
This episode is about getting the firmware file, which is one of the last steps in the information gathering phase.
Intro on How To Get The Firmware
In this episode I will use the same sample Gemtek router as in the previous episodes.
One of the most important basic principles in hardware hacking is to follow the “easiest path first”. So, in this case too, the first and easiest thing to do is to search the Internet, or the device manual, to see whether the manufacturer has a website with a firmware image to download. If we find an image to download, we can move forward and analyze it.
Sometimes the downloaded image is encrypted and will be decrypted by the boot loader or by the operating system’s self-upgrade procedure. In this case, unless we find useful information on the Internet for decrypting it, we need another way to dump the firmware. Once we have dumped the firmware, extracted its file systems and analyzed the software, we may be able to discover how the firmware was encrypted.
Sometimes the firmware is not directly available for download from a PC with a standard browser, but the device itself is able to download the image. If this is the case, we can sniff the communication with Wireshark and, usually, obtain some information:
- We can get the URL of the firmware file, so we can download the firmware from our Linux PC. Sometimes the server will allow the firmware download only if it receives the “User Agent” string of our device; in this case we can use a command line tool like wget or curl to send the same User Agent string as our device (links to these tools below).
- Or we can let the device download the firmware, sniff the entire communication with Wireshark and then use Wireshark’s ability to reconstruct and save the file downloaded by our device.
- But this approach can be difficult if the device uses an encrypted protocol like HTTPS. In this case we can get the fully qualified server name with Wireshark, but not the complete URL or the file content. We could try a trick, like using the mitmproxy software (link below) to attempt a man in the middle attack, but if the IoT device correctly validates the security certificates this attack will not succeed.
- If our IoT device is a router, we have the added difficulty that the router will download the firmware update using the WAN interface, that is the ADSL or fiber interface, and for us it is almost impossible to sniff on that interface. We could try to connect the router to an existing LAN, change its routing table in its web management interface, and see if it will agree to download the firmware update using the Ethernet interface.
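As a rough illustration of the User Agent trick, here is a minimal Python sketch that does what wget or curl would do: it requests a file while presenting the device's User Agent string. The URL and the User Agent value are made-up placeholders, not values taken from the Gemtek router; the real ones would come from the Wireshark capture.

```python
import shutil
import urllib.request

# Hypothetical values: replace them with what the Wireshark capture shows.
FIRMWARE_URL = "http://updates.example.com/firmware.bin"
DEVICE_USER_AGENT = "ExampleRouter/1.0"


def build_request(url, user_agent):
    """Return a request that presents the device's User Agent string."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download(url, user_agent, out_path):
    """Download url to out_path while pretending to be the device."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=30) as response:
        with open(out_path, "wb") as out:
            shutil.copyfileobj(response, out)
```

Calling download(FIRMWARE_URL, DEVICE_USER_AGENT, "firmware.bin") would then save the image, provided the server only checks the User Agent header.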
Another possibility, to dump the firmware of our device, is to attach our Linux box serial interface to the UART of the device, and interact with the device boot loader. If the device has a boot loader that has a Command Line Interface, and if this Command Line Interface has a dump flash command, we can use this command to dump the entire EEPROM. This is the approach we will use for our sample Gemtek router that we will see later.
If everything above fails, we can try to use the JTAG interface with a Bus Pirate or Bus Blaster and OpenOCD, as explained in the previous episode of this series, but this is quite complicated and, often, the JTAG interface is disabled in software. Sometimes the JTAG interface is available for a few milliseconds after powering on the device, before it is disabled, but exploiting this possibility requires additional circuitry to control the power supply of the device.
Another possibility is to read the EEPROM memory chip directly. In a few cases, with serial EEPROMs in easy to access packages such as DIP8 or SOIC8, it is possible to read and write the EEPROM content without removing the chip from the board. Even in this case we have to power the chip without powering the entire board, otherwise the CPU would start and interfere with our readings; so, sometimes, we have to temporarily cut some pins from the board.
Anyway, the packages are usually much more complex.
For example, in our sample router, the package is really compact, with a pin pitch of 0.5 mm; in these cases there is no possibility of attaching a clip directly on the board. One possibility is to desolder the chip and remove it from the board using a hot air gun, and then use the appropriate adapter, if we have one, to read the EEPROM with an EEPROM programmer attached to our PC. However, this operation is not easy for a hobbyist: there is a risk of damaging the chip and nearby components if the temperature goes too high, and it is almost impossible to manually re-solder the EEPROM on the board later. So this approach is best used when we have more than one board and can afford to destroy one.
Analyzing the boot log file
For our sample Gemtek router the firmware is not available on the Internet for download, so we have to find another way to dump its firmware. We will connect our Linux box to the UART interface and analyze the boot log to see if it is possible to interact with the boot loader to dump the EEPROM and to get more information about our device.
So, first of all, I connect my PC to the router’s UART interface. All the details on how to find the UART interface and how to connect to it are available in the second episode of this series.
Then I start the PuTTY serial terminal emulator, enable logging to a file, power on the device and wait until the boot process has finished writing a lot of information to the serial console and the standard Linux login prompt has appeared.
Now I try to log in using the default username and password printed in the manual for the web management interface, but they do not work on the serial console.
Now we can start analyzing the boot log file written by PuTTY to see if there is something interesting. Usually we can get a lot of information from a boot log file:
- We can get the boot loader name and version.
- The System On a Chip part number, its architecture and instruction set.
- The amount of RAM installed.
- The amount of EEPROM installed.
- The operating system kernel and version.
- The file system types used.
- The EEPROM partition details.
- Information on the Init process, on Linux systems.
- Information if the boot loader has a Command Line Interface.
Now we close the PuTTY terminal emulator and start looking at the boot log file it has written to disk, using the “less” command.
One of the first pieces of information is about our EEPROM device. In the boot log line, MTD stands for Memory Technology Device, and it is the name of the device driver for interacting with flash memory. In our case we have a NAND flash memory, which has some peculiarities:
- It can be read or written a page at a time. A page belongs to a larger block that must be erased (every bit set to 1) before writing.
- It can be erased a block at a time; a block includes many pages.
- During operation some bits can spontaneously fail; for this reason each page has a certain number of extra bytes for error correction codes, called OOB or Out Of Band data.
- It has a finite number of program/erase cycles; this means that the file system must be aware of this limitation and spread the write/erase cycles evenly across the memory.
- The information in the boot log tells us that the page size is 2 KB.
- The OOB, Out Of Band data used for error correction, is 64 bytes for each page.
- The erase size is 128 KB.
- The memory width is 8 bits, which means it is accessed a byte at a time.
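To make these numbers more concrete, the short Python sketch below derives a few quantities from the geometry printed in the boot log. The total capacity of 128 MB is an assumption at this point of the log; it is the size of the flash chip that we will dump later, from page 0 to page FFFF.

```python
# Geometry reported in the boot log.
PAGE_SIZE = 2 * 1024             # bytes of data per page
OOB_SIZE = 64                    # bytes of Out Of Band data per page
ERASE_BLOCK_SIZE = 128 * 1024    # bytes per erase block

# Assumption: total capacity of the NAND flash chip (128 MB).
FLASH_SIZE = 128 * 1024 * 1024

pages_per_block = ERASE_BLOCK_SIZE // PAGE_SIZE
total_pages = FLASH_SIZE // PAGE_SIZE
total_blocks = FLASH_SIZE // ERASE_BLOCK_SIZE
last_page_hex = format(total_pages - 1, "X")
oob_overhead = OOB_SIZE / PAGE_SIZE

print("pages per erase block:", pages_per_block)
print("total pages:", total_pages, "(last page:", last_page_hex + ")")
print("total erase blocks:", total_blocks)
print("OOB overhead: {:.1%}".format(oob_overhead))
```

With these values every erase block holds 64 pages, and the page numbers run from 0 to FFFF.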
Then we learn that the boot loader is U-Boot version 1.1.3; U-Boot is a popular open source boot loader (link below).
It seems we have an additional Ralink WiFi board; Ralink is a WiFi chipset manufacturer that was acquired, a few years ago, by Mediatek. This board is probably the one below the metal sheet on our motherboard.
It seems that this additional board is also running the U-Boot boot loader, with a more recent version, but at the moment we are not interested in this additional board.
We can confirm that the System On a Chip is a dual core Mediatek MT7621A, running at 880 MHz. We had already identified it by visually inspecting the board in the first episode.
We have 128 MB of RAM and we can confirm that we have a NAND flash EEPROM.
Then we have a very interesting menu: it is a U-Boot boot loader menu that allows us, among other things, to enter a Command Line Interface prompt. It is exactly what we were looking for; later we will reboot the router and enter this menu. By default, however, U-Boot has booted the operating system from the flash memory.
U-Boot loads the boot image into memory; it has two parts, the Linux kernel and the root file system. The image is loaded at page 81.00.00.00 and it is a MIPS Linux image; this confirms that we have a MIPS architecture.
Then the Linux operating system starts and prints its kernel version, 2.6.36, and the CPU revision and type; this confirms that we have a 32-bit MIPS CPU. We also get another very useful piece of information: this system has been built using Buildroot version 2015.02. This will help a lot when we build an emulation environment in which to run interesting binaries. Buildroot is a simple, efficient and easy-to-use tool to generate embedded Linux systems through cross-compilation.
Another useful piece of information is that the root file system is a squashfs file system, a popular choice in embedded devices. It is never modified in EEPROM: it is loaded in RAM during boot, and every time the system is powered off and on again it reloads the same unmodified root file system. This is the second image loaded by the U-Boot boot loader.
Then we can spot the most useful information: how the EEPROM is partitioned. For each partition we have the starting address and the partition length in hexadecimal. We have 9 partitions:
- two partitions for the boot loader;
- one partition that probably will store the router configuration;
- two partitions for the environment, the boot loader environment;
- two partitions for the kernel and the squashfs root file system;
- two storage partitions for the read/write file system used by the router.
The reason why each partition is duplicated, except the router’s configuration partition, is that the router is upgraded by writing to the non-active partitions and then switching over to boot from the newly upgraded partitions. If something goes wrong, the router can automatically boot from the old partitions. The configuration partition is not duplicated because it stores the router configuration (WiFi password, web admin password etc.), which remains the same across upgrades.
Finally the Linux kernel starts the init process, the first process started on a Linux or Unix system. Here we find another very useful piece of information: the init process is Busybox version 1.23.1. This means that this Linux system is based on Busybox, a very popular choice in embedded devices because Busybox implements, in a single small binary and with minor limitations, many traditional Linux commands, such as the init process, the shell interpreter, the grep command, the ls command and many others.
Another useful piece of information is that the storage partition is a UBIFS file system that uses the lzo compressor. UBIFS is a popular file system for NAND flash devices because it is aware of the NAND flash peculiarities and it is good at so-called wear leveling, which means distributing the writes evenly across the entire NAND flash device to extend the life of the NAND EEPROM, which has a limited number of rewrites before it starts to fail.
Near the end of the boot cycle we can see that the router tries to connect to its master, acs.linkem.com, using the TR-069 protocol. This is a standard protocol that allows an Internet Service Provider to remotely access, reset, reconfigure and upgrade your router without needing your help or your consent. In this case the router is disconnected from the Internet, so it is not able to resolve its master’s hostname or to contact it.
Finally, at the end, we get our login prompt. The router calls itself “buildroot”, the default host name of embedded Linux systems that have been built using the Buildroot software. We try to log in with the “admin” username, because the manual gives “admin” and its password for accessing the web interface. Instead of a password prompt, however, we receive a challenge code that looks like a binary string encoded in Base64, because its characters belong to the Base64 character set, which includes the letters from a to z, both lowercase and uppercase, the digits from 0 to 9, the / character and the + character.
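The “looks like Base64” check can also be done programmatically. The Python sketch below is only an illustration: the function name is my own, and it tests a string against the Base64 alphabet (plus the = character, which standard Base64 uses as trailing padding) and then tries to decode it.

```python
import base64
import binascii
import string

# Letters, digits, "+" and "/": the standard Base64 alphabet.
BASE64_ALPHABET = set(string.ascii_letters + string.digits + "+/")


def looks_like_base64(text):
    """Return True if text uses only Base64 characters and decodes cleanly."""
    stripped = text.strip()
    body = stripped.rstrip("=")  # "=" only appears as trailing padding
    if not body or not set(body) <= BASE64_ALPHABET:
        return False
    try:
        base64.b64decode(stripped, validate=True)
    except binascii.Error:
        return False
    return True
```

A positive answer is only a hint: any short string made of letters and digits with the right length would pass, so the decoded bytes still need to be examined.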
In one of the next episodes we will reverse engineer this login binary and will understand how the authentication works. We will not be able to defeat this authentication algorithm, but we will easily work around it, replacing this login binary with a standard login executable.
Dump the EEPROM content, in hex, to a text file
We have seen that analyzing what the device prints on the serial console during boot, we got a lot of very useful and interesting information; but for now we are mainly interested in the U-Boot command line interface to see if we can dump the EEPROM content.
Analyzing the boot log file, we have seen that the U-Boot loader prints a menu to let the user choose the operation to perform. So we will power cycle our router, wait until the menu is displayed on our terminal emulator and then press “4” to enter the U-Boot command line interface.
We now have a U-Boot prompt. U-Boot is an open source boot loader that can be heavily customised; this means that usually only a small subset of all U-Boot commands is actually available. We type “help” to get a list of commands.
The most interesting command for us, at the moment, is the “nand” command; to get more information we type “help nand”.
We can see that we have the “nand read” command, which reads from EEPROM and writes to RAM.
The “nand write” command does the opposite: it reads from RAM and writes to EEPROM.
The “nand erase_write” command is similar, but erases the EEPROM before writing.
The “nand dump” command sounds like what we need to dump the content of the EEPROM, but it does not do what we want: it only dumps some information about the EEPROM.
The command that does what we need is “nand page”: if we pass it a page number, it dumps the content of the entire 2 KB page on our terminal, including the 64 bytes of OOB data, the Out Of Band data used for error correction.
If we type “nand page 0”, then “nand page 1” up to “nand page FFFF” we can dump the content of the entire EEPROM. But we have two issues:
- first, it is not feasible to manually type “nand page” followed by the page number more than 65,000 times;
- second, we get the EEPROM dumped in hexadecimal in a text file and not as a binary file.
For the first issue we can write a small script that issues the “nand page” commands for us. I am an old man, so I used an ancient tool that was popular in the nineties: “expect”, which is based on the Tcl language, a language with a quite unusual and strange syntax. You can write this script in Python using the Pexpect module if you prefer. I called this script “serial-flash-dump.expect”; you can find it in my GitHub repository, link below.
One important thing to note is that this expect program has to interact with a TTY device, the serial interface in this case, and not with standard input and standard output. This is why we need the expect tool, or the Pexpect module in Python: they are able to interact with a TTY device. In our case this is the serial device but, more generally, expect and Pexpect interact with a terminal device.
Anyway this is a very simple program:
- get the serial device name as parameter, in our case it is /dev/ttyUSB0;
- set serial parameters, like serial speed and so on;
- open the serial device;
- wait for the string “Load Boot Loader code etc” that is the last option in the U-Boot menu;
- then send the string “4” to select the U-Boot command line interface;
- then execute a long loop, from 0 to FFFF, each time waiting for the prompt and then immediately issuing the “nand page” command, passing as parameter the loop variable converted to hexadecimal.
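For readers who prefer Python, the steps above can be sketched as follows. This is not the script from the repository, only an outline of the same loop. It expects a “session” object offering expect(), send() and sendline(), which is the interface of Pexpect spawn objects; the prompt text is an assumption, because it depends on how U-Boot was built for the device.

```python
def page_commands(total_pages=0x10000):
    """Yield the "nand page" commands for pages 0 .. total_pages - 1, in hex."""
    for page in range(total_pages):
        yield "nand page {:X}".format(page)


def dump_flash(session, menu_text, prompt, total_pages=0x10000):
    """Drive the U-Boot menu and command line through an expect-like session.

    session must offer expect(pattern), send(text) and sendline(text).
    menu_text is the last line of the U-Boot menu; prompt is the U-Boot
    prompt (an assumption: check it on the serial console first).
    """
    session.expect(menu_text)       # wait for the end of the U-Boot menu
    session.send("4")               # option 4: command line interface
    for command in page_commands(total_pages):
        session.expect(prompt)      # wait for the U-Boot prompt
        session.sendline(command)   # ask U-Boot to dump one page
    session.expect(prompt)          # let the last page finish printing
```

With Pexpect, such a session could probably be created by opening the serial device and wrapping its file descriptor with pexpect.fdpexpect.fdspawn, and everything U-Boot prints could be saved by assigning an open file to the session's logfile_read attribute.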
We can see what this script does by executing it in our Linux box terminal.
To save what I am dumping to a file, we can use the same command with a pipe, passing its output as input to the “tee” command. The “tee” command writes everything it reads from its standard input to standard output, and also writes the same content to the file named as its parameter, in this case “eeprom.txt”. In this way the entire EEPROM will be dumped to the “eeprom.txt” file, and we can monitor that the script is running and has not frozen.
We know that the EEPROM has a capacity of 128 MB. It is dumped in hexadecimal, so each byte is converted into 3 characters (two hex digits plus a space); in addition we have the OOB data for error correction, 64 bytes every 2 KB, a 3% overhead. This means that the dumped file will be about 400 MB. The serial interface runs at 115200 bit/s, that is about 11.5 KB/s, so it will take about 10 hours to dump the entire EEPROM content!
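The estimate can be checked with a few lines of Python. The sketch assumes the usual 8N1 serial framing (10 bits on the wire for each character) and ignores the prompts, the echoed commands and the page header lines, so the real dump is slightly larger and slower.

```python
PAGE_SIZE = 2048          # data bytes per page
OOB_SIZE = 64             # Out Of Band bytes per page
TOTAL_PAGES = 0x10000     # pages 0 .. FFFF
CHARS_PER_BYTE = 3        # two hex digits plus a space (or the end of line)
BAUD_RATE = 115200        # bits per second
BITS_PER_CHAR = 10        # assumption: 8N1 framing (start + 8 data + stop)

dump_chars = TOTAL_PAGES * (PAGE_SIZE + OOB_SIZE) * CHARS_PER_BYTE
chars_per_second = BAUD_RATE / BITS_PER_CHAR
hours = dump_chars / chars_per_second / 3600

print("dump size: about {:.0f} million characters".format(dump_chars / 1e6))
print("transfer time: about {:.1f} hours".format(hours))
```

The result is roughly 415 million characters and about 10 hours, in line with the figures above.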
We can launch the expect script in the evening, have a long sleep, and in the late morning we have the entire content of the EEPROM dumped, in hexadecimal, in our text file.
If we look at the text file, we can see the strings that our expect script wrote; then, moving forward, we can see the menu written by the U-Boot boot loader, where our script selected option “4” for the command line interface. The script then waited for the command prompt and sent the “nand page 0” command, and U-Boot dumped the first 2 KB page of the EEPROM, including the OOB data used for error correction.
Then the script waited again for the command prompt and issued the “nand page 1” command, and so on until the last page of the 128 MB EEPROM, which is page FFFF.
If we look at this file we can see that after the “nand page” command we have a line with the string “page” and the page number in hexadecimal; then we have the 2 KB EEPROM page dumped in hexadecimal, 32 bytes per line, arranged in four groups of 16 lines, with the groups separated by a blank line.
Convert the hex dump to binary
To convert this text dump file back to binary we can write a script. Again, I am an old man: I learned the Perl language in the early nineties and have used it extensively ever since, so I wrote this script in Perl but, if you prefer, you can rewrite it in Python. It reads the text file from standard input, uses regular expressions to extract the hexadecimal strings, converts them to binary and writes the output to a binary file that will be, bit by bit, the EEPROM image. I have ignored the OOB data, the data used for error correction, and this does not seem to have caused any issue with the EEPROM image.
The script is simple, but it looks more complicated because it has an option to include the OOB data in the output and it does some error checking, to prevent writing the same page twice if, for example, the input file has been generated in multiple, overlapping sessions.
The script is called “hexdump2bin.pl” and you can find it in the same GitHub repository as the previous script. Links in the description below.
The core of the program is a regular expression that is expecting 2 hex digits followed by a space, repeated 31 times, followed by 2 hex digits; this time not followed by a space because at the end of the line we have an end of line char and not a space.
Then this line is split in 32 hex bytes, some error checking is done, and then each hex byte is converted to binary and written to the standard output.
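A much reduced Python version of the same idea is sketched below; it is not a port of “hexdump2bin.pl” and it leaves out the duplicate page checks. It relies on two assumptions about the dump format: any line containing the word “page” marks the start of a new page, and the OOB bytes are printed after the 2048 data bytes of the page, in lines with the same layout.

```python
import re

# 31 groups of "two hex digits + space", then two final hex digits.
HEX_LINE = re.compile(r"^(?:[0-9A-Fa-f]{2} ){31}[0-9A-Fa-f]{2}$")
PAGE_SIZE = 2048   # data bytes per page, OOB excluded


def hexdump_to_bytes(lines):
    """Convert the text dump to bytes, keeping only the data part of each page."""
    image = bytearray()
    bytes_in_page = 0
    for line in lines:
        line = line.strip()
        if "page" in line.lower():
            bytes_in_page = 0        # assumed page header or echoed command
            continue
        if not HEX_LINE.match(line):
            continue                 # menu text, prompts, blank lines
        if bytes_in_page >= PAGE_SIZE:
            continue                 # assumed to be OOB data: skipped
        image += bytes.fromhex(line)
        bytes_in_page += 32
    return bytes(image)
```

To use it as a filter, the lines can be read from sys.stdin and the result written to sys.stdout.buffer, which accepts binary data.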
We can convert the EEPROM text dump file by executing this script.
If we take a glimpse of the converted binary file with the “hexdump” command we can see that it seems OK.
If we take a look at the converted binary file with the “binwalk” command, we can see that it finds some interesting content inside, like a U-Boot image header, a U-Boot version string and a squashfs file system, so our binary file is probably OK.
Binwalk is a fantastic tool for analyzing firmware files. It can scan a binary file searching for many different types of signature, identifying many types of boot loaders, file system images, segments of compressed data, digital certificates and so on. It can also graphically display the entropy of a binary file, letting us easily understand whether it is a plain file or an encrypted or compressed file with an undetected compression algorithm. It is particularly useful when the firmware file has been downloaded, as a firmware update file, from our device supplier’s web site.
Anyway, we will see in the next episode how to extract the boot loader, the root file system and the other file systems from this EEPROM image and, more generally, how to use the “binwalk” tool.
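To give an idea of what an entropy plot measures, here is a small Python sketch that computes the Shannon entropy of a file block by block. It is not how Binwalk is implemented, only the same idea in a few lines; the function names and the 1024 byte block size are arbitrary choices.

```python
import math
from collections import Counter


def shannon_entropy(data):
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def block_entropies(data, block_size=1024):
    """Return one entropy value for each block_size chunk of data."""
    return [
        shannon_entropy(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]
```

Blocks with values close to 8 bits per byte are typical of encrypted or compressed data, while plain code, text and empty areas give clearly lower values.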
Links with additional information
- Channel’s Author
- Channel’s Web Site
- The sample router (Gemtek WVRTM-127ACN) on techinfodepot
- The sample router (Gemtek WVRTM-127ACN) reverse engineered on GitHub, includes scripts to dump the EEPROM to a text file and to convert it back to binary file
- TTL Serial Adapter (affiliate link)
- PuTTY, the terminal emulator
- Wireshark, Ethernet analyzer
- Curl, command line tool for transferring data with URLs
- Wget, retrieving files with URLs
- Mitmproxy, a free and open source interactive HTTPS proxy
- Bus Pirate
- OpenOCD, On Chip Debugger
- U-Boot, The Universal Boot Loader
- Buildroot, a simple, efficient and easy-to-use tool to generate embedded Linux systems through cross-compilation
- Binwalk, a fast, easy to use tool for analyzing, reverse engineering, and extracting firmware images
- SOIC8 SOP8 Flash Chip IC Test Clips Socket Adapter BIOS/24/25/93/95 Programmer (affiliate link)
- Second episode: How To Find The UART Interface