Just after my winter holidays I had a lot of work to do backporting patches to the enterprise kernels I care about at SUSE. This included some very low-level hacking in the kernel, and it can get cumbersome to debug what’s going on. In this blog entry I (finally) try to explain how to handle these kinds of problems.
When you work on generic kernel code (not related to a device driver), you can try your code in qemu. It boots faster and gives you some nice extra options for debugging. In my case, for example, I didn’t see any output on the serial console after booting my new kernel. What helped here was the following:
Run qemu with -d int to get debug output on every interrupt and exception. After doing so, we can see the following:
SetUefiImageMemoryAttributes - 0x00000000B8550000 - 0x0000000000040000 (0x0000000000000008)
SetUefiImageMemoryAttributes - 0x00000000B84B0000 - 0x0000000000040000 (0x0000000000000008)
Taking exception 11 [Hypervisor Call]
...from EL1 to EL2
...with ESR 0x16/0x5a000000
...handled as PSCI call
Taking exception 11 [Hypervisor Call]
...from EL1 to EL2
...with ESR 0x16/0x5a000000
...handled as PSCI call
Taking exception 4 [Data Abort]
...from EL1 to EL1
...with ESR 0x25/0x96000006
...with FAR 0xffff000001d373c0
...with ELR 0xffff8000001c89bc
...to EL1 PC 0xffff800000084a00 PSTATE 0x3c5
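The ESR value of the Data Abort entry can be decoded by hand as well. The following is a minimal sketch, assuming the usual AArch64 syndrome layout (exception class in bits [31:26], data fault status code in bits [5:0]); the helper names are made up for illustration and are not kernel or qemu functions.

```c
#include <stdint.h>

/* Exception class (EC): bits [31:26] of the syndrome value. */
static inline uint32_t esr_ec(uint32_t esr)
{
    return (esr >> 26) & 0x3f;
}

/* Data fault status code (DFSC): bits [5:0] of a data abort syndrome. */
static inline uint32_t esr_dfsc(uint32_t esr)
{
    return esr & 0x3f;
}

/* Translation faults are encoded as 0b0001LL, LL being the table level. */
static inline int esr_is_translation_fault(uint32_t esr)
{
    return (esr_dfsc(esr) & 0x3c) == 0x04;
}

/* Page table level at which the fault happened (valid for translation faults). */
static inline uint32_t esr_fault_level(uint32_t esr)
{
    return esr_dfsc(esr) & 0x3;
}
```

For 0x96000006 this gives exception class 0x25 (a data abort taken without a change of exception level, matching "from EL1 to EL1") and a translation fault at level 2, which suggests that the address in FAR is simply not mapped.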
Now the program counter is in the exception handling code, which doesn’t help us much. But aarch64 saves the address of the instruction that triggered the exception in the ELR register, so we know where the faulting code is. Let’s look it up:
aarch64-linux-gnu-objdump -d ~/kernel/vmlinux
We search for the address from ELR (0xffff8000001c89bc) in the assembly output and find this function:
ffff8000001c8980 :
ffff8000001c8980: a9bd7bfd stp x29, x30, [sp,#-48]!
ffff8000001c8984: 910003fd mov x29, sp
ffff8000001c8988: a90153f3 stp x19, x20, [sp,#16]
ffff8000001c898c: f90013f5 str x21, [sp,#32]
ffff8000001c8990: aa0103f3 mov x19, x1
ffff8000001c8994: aa0203f4 mov x20, x2
ffff8000001c8998: aa0003f5 mov x21, x0
ffff8000001c899c: aa1e03e0 mov x0, x30
ffff8000001c89a0: 97fb2420 bl ffff800000091a20
ffff8000001c89a4: eb14027f cmp x19, x20
ffff8000001c89a8: 540000a3 b.cc ffff8000001c89bc
ffff8000001c89ac: 14000018 b ffff8000001c8a0c
ffff8000001c89b0: 91006273 add x19, x19, #0x18
ffff8000001c89b4: eb13029f cmp x20, x19
ffff8000001c89b8: 540002a9 b.ls ffff8000001c8a0c
ffff8000001c89bc: f9400a60 ldr x0, [x19,#16]
ffff8000001c89c0: 927ff800 and x0, x0, #0xfffffffffffffffe
ffff8000001c89c4: eb0002bf cmp x21, x0
ffff8000001c89c8: 54000221 b.ne ffff8000001c8a0c
ffff8000001c89cc: f9400260 ldr x0, [x19]
ffff8000001c89d0: b4ffff00 cbz x0, ffff8000001c89b0
ffff8000001c89d4: 97fc5d0d bl ffff8000000dfe08
ffff8000001c89d8: 34fffec0 cbz w0, ffff8000001c89b0
ffff8000001c89dc: f9400a61 ldr x1, [x19,#16]
ffff8000001c89e0: aa1303e0 mov x0, x19
ffff8000001c89e4: 91006273 add x19, x19, #0x18
ffff8000001c89e8: 927ff822 and x2, x1, #0xfffffffffffffffe
ffff8000001c89ec: 12000021 and w1, w1, #0x1
ffff8000001c89f0: b9400042 ldr w2, [x2]
ffff8000001c89f4: 7100005f cmp w2, #0x0
ffff8000001c89f8: 1a9f07e2 cset w2, ne
ffff8000001c89fc: 4a010041 eor w1, w2, w1
ffff8000001c8a00: 97fb3100 bl ffff800000094e00
ffff8000001c8a04: eb13029f cmp x20, x19
ffff8000001c8a08: 54fffda8 b.hi ffff8000001c89bc
ffff8000001c8a0c: a94153f3 ldp x19, x20, [sp,#16]
ffff8000001c8a10: f94013f5 ldr x21, [sp,#32]
ffff8000001c8a14: a8c37bfd ldp x29, x30, [sp],#48
ffff8000001c8a18: d65f03c0 ret
ffff8000001c8a1c: d503201f nop
So the exception is triggered on ldr x0, [x19,#16] but that’s a bit cryptic and might be difficult to find in the C code. We can use the following command to find the line (might not always be the correct line though):
aarch64-linux-gnu-addr2line -f -e ~/src/kernel/vmlinux ffff8000001c89bc
jump_entry_key
~/src/kernel/kernel/jump_label.c:205
Looking at the code around line 205 in jump_label.c, we find:
static inline struct static_key *jump_entry_key(struct jump_entry *entry)
{
return (struct static_key *)((unsigned long)entry->key & ~1UL);
}
Looking at struct jump_entry we can see that key sits at an offset of 16 bytes from the start, which matches the faulting load from [x19,#16]. So the problem is that the entry, and with it the key, is read from an invalid memory location. After some digging I realized that the jump entries were already initialized, and initializing them a second time is a bad idea. From there it was easy to find the missing patch I had to backport.
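The offset can be double-checked with a small sketch. The struct below is a hypothetical mirror of the layout, assuming three 64-bit fields (code, target, key) as arm64 used at the time; it is not copied from the kernel headers, and the names ending in _sketch are made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: three 64-bit fields, so key starts at byte 16. */
struct jump_entry_sketch {
    uint64_t code;
    uint64_t target;
    uint64_t key;
};

/* Same masking as jump_entry_key(): clear bit 0 of the stored key. */
static inline uint64_t entry_key_sketch(const struct jump_entry_sketch *entry)
{
    return entry->key & ~UINT64_C(1);
}
```

With this layout, offsetof(struct jump_entry_sketch, key) is 16 and the struct is 24 bytes big, which fits both the ldr x0, [x19,#16] and the add x19, x19, #0x18 in the disassembly.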
Too much time has passed without an entry here. So today I will just write down the few steps you need to be able to debug your mobile. I’ll base this post on openSUSE because that’s what I use.
You might know that Google developed some tools to communicate with your device. Basically that’s adb and fastboot. I’ll talk about adb today, as setting it up is a somewhat handcrafted approach.
So let’s go. First of all you will need the adb tool, so install android-tools on your favorite openSUSE distribution [1].
sudo zypper in android-tools
That was easy. Now if you run adb on the command line it will start the server, but if you type
adb devices
your mobile will not show up. That’s because you need to set up udev rules and add your vendor ID to some magic file. To do so, you first need to find out the idVendor and idProduct of your mobile. So watch your kernel log while plugging in your device:
sudo journalctl -kf
You will see something like:
[23468.979604] usb 2-3: New USB device found, idVendor=2a47, idProduct=7f10
For the curious, yes it’s a bq [2].
So we need to create a udev rule to be able to connect to the device. For that we add, as superuser, the following line to ‘/etc/udev/rules.d/51-android.rules’:
SUBSYSTEM=="usb", ATTR{idVendor}=="2a47", ATTR{idProduct}=="7f10", MODE="0666", GROUP="dialout"
Now we need to add the idVendor to ‘~/.android/adb_usb.ini’ as well, prefixed with 0x. Your adb_usb.ini should have one idVendor per line; in our case it should look like this:
> cat ~/.android/adb_usb.ini
0x2a47
Now enable USB debugging in the developer options of your phone and you should be good to go. If it’s not working, try to kill the adb server
adb kill-server
and check whether your phone is set to connect to your computer using MTP.
Have fun!
[1] https://www.opensuse.org/
[2] https://www.bq.com/
Lately I have been trying to get some patches for the device tree of the ARM architecture upstream into the Linux kernel. Device tree is a nice feature which will get rid of the board-specific boot-up code (or most of it) that heavily populates the ARM architecture folder of the kernel.
Device tree bindings still lack a good development process, like a device-tree-capable editor, a spell checker etc. Apart from that, the documentation of the device tree bindings is not complete, so most of the time you have to iterate over building your device tree blob, copying it to your board, booting your board and realizing that there is an error.
That’s where u-boot comes in. U-boot is a feature-rich boot loader, mostly for embedded boards, which allows you to load your kernel over the network and mount your root file system over NFS. What I wanted to achieve was to load my kernel and device tree blob over NFS. I knew from my colleague Enric that loading the kernel via NFS works. So I took his uEnv.txt and added my device tree stuff to it:
dtbaddr=0x81600000
kerneladdr=0x80000000
bootargs-base=console=ttyO2,115200n8 console=tty0
mmc-bootargs=setenv bootargs ${bootargs-base} root=/dev/mmcblk0p2 rw rootwait
ethaddr=ac:de:48:af:fe:00
ipaddr=192.168.2.252
netmask=255.255.255.0
gatewayip=192.168.2.1
serverip=192.168.2.59
nfs-home=/home/matthias/nfs
nfs-kernel=nfs ${kerneladdr} ${nfs-home}/boot/uImage
nfs-dtb=nfs 0x81600000 ${nfs-home}/boot/omap3-igep0020.dtb; fdt addr ${dtbaddr}; fdt resize
nfs-boot=run mmc-bootargs; run nfs-dtb; run nfs-kernel; bootm ${kerneladdr} - ${dtbaddr}
uenvcmd=run nfs-boot
Beware that we only load the kernel and the device tree blob over the network. The reason is that on my board, the igep0020, no device tree bindings for the network exist for now, so we can’t mount the root file system over NFS.
It seems easy to get that working, but actually it wasn’t. The reason is the following:
When u-boot tries to access the two files, I get the error “ERROR: Cannot umount” after the first file is loaded. When u-boot then tries to load the second file, the boot process stops with “ERROR: Cannot mount”.
I used the newest version of u-boot (v2012.10-rc2), so it looked like there was an issue in the NFS protocol implementation, and so it was.
First of all, you can see some “T”s printed on the console by u-boot. This means that time-outs occurred while waiting for a reply from the server to an NFS request.
After digging into the NFS code of u-boot, having a look with Wireshark and reading the NFS RFC to see what was going on, I found the following problem:
U-boot unmounts all mounts after reading the first file. The server does not respond to this umountall command in time, so a time-out occurs. Eventually the reply arrives, but by then the NFS implementation has already sent a new umountall command to the server. Every time it resends a timed-out command, it increments the RPC ID, which is used to identify whether an answer from the server corresponds to a request from the client (remember that NFS is a stateless protocol).
So when the answer from the server arrives, it has a smaller RPC ID than the current one (as after a time-out u-boot adds one to this ID and resends the command). When a reply arrives with an RPC ID which does not correspond to the current RPC ID of u-boot, the NFS implementation of u-boot regards it as an error and aborts the current command. In our case, first the umountall is aborted and later the mount.
I dug into the code and provided a patch for the problem. In a first iteration it basically did two things. First of all, when a time-out leads u-boot to resend the request, we do not update the RPC ID. We just resend the command until a threshold of maximum resends is reached and then abort the request.
This alone does not solve the problem. Responses from the server to the first umount requests arrive while we try to mount the NFS share to read the second file. The process looks like this:
- mount first file
- read first file
- umountall (time out) #1
- umountall (time out) #2
- umountall
- get response for #1
- mount second file
- get response for #2
So the NFS code takes the reply to umountall #2 as a response to the mount. As the RPC ID isn’t the same, it cancels the operation.
So as a second step I changed the code so that replies with an RPC ID smaller than the current one are ignored instead of being treated as an error. This way, we keep on sending the mount command with the same RPC ID while we receive the replies to umountall from the server.
But after some days and digging even more into the NFS specs, I realized that instead of resending with the same RPC ID after a time-out, I should actually do something else. A time-out for our message means that the server wasn’t able to answer our request in time. There might be various reasons for that, but one could be that we send requests faster than the server can process them. Repeating a timed-out message just as it is wouldn’t help at all in this situation. So in a second patch I corrected the behaviour: when a time-out occurs, we increment the RPC ID and resend the command, but we wait twice as long for the answer.
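The behaviour described above can be sketched roughly like this. It is not the actual u-boot code: the struct, the function names and the resend limit are hypothetical, and I assume here that stale replies with a smaller RPC ID are still ignored, as in the first iteration of the patch.

```c
#include <stdint.h>

/* What to do with an incoming reply (hypothetical names). */
enum reply_action { REPLY_ACCEPT, REPLY_IGNORE, REPLY_ERROR };

struct rpc_state {
    uint32_t rpc_id;       /* ID of the request currently in flight */
    unsigned timeout_ms;   /* how long we wait for the reply */
    unsigned retries;      /* resends done so far */
    unsigned max_retries;  /* give up after this many resends */
};

/* Called on a time-out. Returns 0 if the request should be resent, -1 to abort. */
static int rpc_on_timeout(struct rpc_state *st)
{
    if (st->retries >= st->max_retries)
        return -1;
    st->retries++;
    st->rpc_id++;          /* the resend counts as a new request */
    st->timeout_ms *= 2;   /* give a slow server twice the time */
    return 0;
}

/* Late replies to older requests are dropped instead of aborting the command. */
static enum reply_action rpc_classify_reply(const struct rpc_state *st,
                                            uint32_t reply_id)
{
    if (reply_id == st->rpc_id)
        return REPLY_ACCEPT;
    if (reply_id < st->rpc_id)
        return REPLY_IGNORE;
    return REPLY_ERROR;
}
```

With this, a late answer to umountall #1 or #2 that arrives during the mount is classified as REPLY_IGNORE, and the mount keeps waiting for its own reply.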
I’m quite happy that in the end my patch got accepted by the u-boot maintainer. If not, I would have needed to keep applying a workaround on my host: adding the IP address of the board to /etc/hosts on the server.
At the last Embedded Linux Conference Europe I gave a presentation about my experience of porting Android (Ice Cream Sandwich) to a new custom board.
I hope that people will find my presentation useful, although there are several other good resources, such as a presentation at the Android Builders Summit and good information on the elinux wiki.
You can find the slides here. Thanks to the guys from Free Electrons, my talk, as well as all the others, was recorded and is available here.
In my presentation I tried to concentrate more on the different system components of Android and how they interact. Please beware that, depending on your experience with Linux, some of the explanations might not be new to you, but they were for me.
I hope you find the presentation helpful.
These days I tried to figure out why my custom board didn’t generate the right video output for HDMI TV screens. In particular, I wanted to add the 720p and 480p standards to my Linux kernel, to be able to show the desktop on any TV screen which supports HDMI input with these standards.
My first problem was that the CEA-861-x standard is not freely available and information on the web is quite sparse. But I was lucky and found this great post on the beagleboard mailing list.
I will try to sum up the hands-on part of it, so that it’s easier to calculate timings yourself. In my case, I afterwards added the values to the database in drivers/video/modedb.c with special names (hd720 and hd480). So on boot-up I can tell the kernel which mode I want to use on my HDMI interface. Note that I use a custom kernel based on version 2.6.37.
Ok let’s start.
First of all, the resolution you see on your screen is not the real size of the frame that is sent to the screen. A horizontal blanking is added to the horizontal axis, and a vertical blanking to the vertical axis. The blanking depends on the mode you want to display on your screen and varies (e.g. between 720p and 480p).
With these values you can calculate the real size, and from it the pixclock value, which the Linux kernel needs to know how long it takes to update one pixel. The pixclock value is given in picoseconds, whereas the update frequency of your screen is given in Hz (1/seconds).
So to get the pixclock value, use the following formula:
real_horizontal_size x real_vertical_size x pixclock x 10^(-12) = 1 / frequency
Let’s see how this works on a real example. I found a table with timing constraints for the standards. I’m not sure if they are the original ones (they are protected by law), but I implemented the 480p standard and it works for me…
So let’s go. In the table we can see the total number of pixels per line and of lines per frame. Each total is the visible size plus the blanking, and the blanking is the sum of front porch, sync and back porch. Let’s calculate the pixclock then:
880 x 525 x pixclock x 10^(-12) = 1 / (60 Hz)
Some easy math brings us to the pixclock value of 36075 ps.
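The same math as a small helper, in case you want to try other modes. This is only a sketch: pixclock_ps is a made-up name, and the totals have to include the blanking. One full frame takes 1/frequency seconds, so a single pixel takes 1 / (total width x total height x frequency) seconds.

```c
#include <stdint.h>

/*
 * Time per pixel in picoseconds for a mode with the given totals
 * (visible size plus blanking) and refresh rate in Hz.
 * Returns 0 if one of the inputs is 0.
 */
static uint32_t pixclock_ps(uint32_t htotal, uint32_t vtotal, uint32_t refresh_hz)
{
    uint64_t pixels_per_second = (uint64_t)htotal * vtotal * refresh_hz;

    if (pixels_per_second == 0)
        return 0;
    /* 10^12 picoseconds per second, rounded down. */
    return (uint32_t)(UINT64_C(1000000000000) / pixels_per_second);
}
```

For the example above, pixclock_ps(880, 525, 60) gives 36075.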
Great, let’s go on and find the horizontal sync (hsync) and vertical sync (vsync). We can read these values directly from the table. I found an implementation of 720p which uses half of the hsync value, but I don’t know why. I tried to implement the standard with half of the hsync value, but found it difficult to calibrate in the end. More on that later.
Alright, we take a hsync of 40 and a vsync of 3.
I don’t know the details of the 480p standard, but the post mentioned above states that left margin and upper margin are fixed values in 720p.
So we assign the horizontal back porch to the left margin and the vertical back porch to the upper margin. To calculate the right margin and the lower margin use the following formulas:
right_margin = horizontal_blanking - hsync - left_margin
lower_margin = vertical_blanking - vsync - upper_margin
So you can see, that lower margin and upper margin (as well as right and left margin) are in a linear relationship. That means if you increment the upper margin by 5 (for scaling) you have to decrement the lower margin by 5 as well.
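The two formulas have the same shape, so one helper covers both axes. Again just a sketch with a made-up name (split_blanking); it also refuses values that do not fit into the blanking, which is easy to get wrong when scaling by hand.

```c
#include <stdint.h>

/*
 * Computes the remaining margin of one axis:
 *   second_margin = blanking - sync - first_margin
 * Use (hblank, hsync, left_margin) to get the right margin and
 * (vblank, vsync, upper_margin) to get the lower margin.
 * Returns 0 on success, -1 if sync and first_margin exceed the blanking.
 */
static int split_blanking(uint32_t blanking, uint32_t sync,
                          uint32_t first_margin, uint32_t *second_margin)
{
    if ((uint64_t)sync + first_margin > blanking)
        return -1;
    *second_margin = blanking - sync - first_margin;
    return 0;
}
```

Calling it again with a first margin that is 5 bigger yields a second margin that is 5 smaller, as long as the sum still fits into the blanking.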
You see, as I’m talking about scaling by hand (!), it seems that the table I mentioned isn’t totally correct. But at least for me it was a good starting point to do the math and in the end get a picture on the screen. Afterwards I had to scale the left and right margin by hand to get a good result on my Samsung SyncMaster B2030HD as well as on my OKI V32A-FH television.
That’s it, I hope you found this entry helpful. As you can see, I’m not an expert in this stuff and there might be errors in the explanation. Please don’t hesitate to correct me, to make this useful information for others.
It seems that Ubuntu has changed the way you have to compile a kernel and install it. In older versions you used the make-kpkg tool to build Debian packages. In the version I use these days (11.10) I do it this way:
- get the linux source tree, unpack it.
- sudo make menuconfig
- sudo make
- sudo make modules_install install
After waiting quite a while, everything is done, even your grub menu has been updated. So enjoy your machine…
… unless you have an Nvidia card. In this case you might have some trouble:
Your system might freeze with a black screen and you won’t be able to use any console with Ctrl+Alt+F1…
The solution is to start the kernel in recovery mode, log in and execute the following command:
sudo jockey-text -e xorg:nvidia-current
This will update the Xorg config. Be sure that the nvidia-current kernel module is installed on your machine. You might need to reinstall your nvidia package first:
sudo apt-get install --reinstall nvidia-current
Have fun!
MADbench2 is a benchmarking tool to simulate high performance computing (HPC) workloads. I used it to evaluate some ideas for a project.
As it took me some time to get it compiled and running, I add a patch file here. Just applying it to the source code folder should do it.
diff -rupN madbenc2.orig//hostfile madbenc2//hostfile
--- madbenc2.orig//hostfile 1970-01-01 01:00:00.000000000 +0100
+++ madbenc2//hostfile 2011-04-14 15:41:27.439836179 +0200
@@ -0,0 +1 @@
+localhost
diff -rupN madbenc2.orig//MADbench2.c madbenc2//MADbench2.c
--- madbenc2.orig//MADbench2.c 2006-10-12 23:16:40.000000000 +0200
+++ madbenc2//MADbench2.c 2011-04-14 15:22:26.678836179 +0200
@@ -6,12 +6,17 @@
 /* jdborrill@lbl.gov */
 /*****************************/
 
+#define _LARGEFILE64_SOURCE
+
 #include <stdio.h>
 #include <stdlib.h>
 #include <unistd.h>
 #include <sys/stat.h>
 #include <math.h>
-#include "mpi.h"
+#include <mpi.h>
+#include <string.h>
+
+
 #include "MADbench2.h"
 
 #define PCOUNT 8
diff -rupN madbenc2.orig//Makefile madbenc2//Makefile
--- madbenc2.orig//Makefile 2011-03-25 16:06:48.464573544 +0100
+++ madbenc2//Makefile 2011-04-14 15:38:42.805174338 +0200
@@ -0,0 +1,12 @@
+MPI_HEADER=/opt/mpich/ch-p4/include/
+MPI_LIBS=/opt/mpich/ch-p4/lib64
+
+all:
+	mpicc -D SYSTEM -D IO MADbench2.c -isystem $(MPI_HEADER) -lmpich -L$(MPI_LIBS) -lm -o MADbench2
+
+clean:
+	rm MADbench2 MADbench2.o
+
diff -rupN madbenc2.orig//run.sh madbenc2//run.sh
--- madbenc2.orig//run.sh 2011-03-25 16:08:39.173698549 +0100
+++ madbenc2//run.sh 2011-04-14 16:07:36.359773210 +0200
@@ -0,0 +1,22 @@
+#!/bin/bash
+
+NO_PIX='1000'
+NO_BIN='160'
+NO_GANG='1'
+SBLOCKSIZE='32K'
+FBLOCKSIZE='32K'
+RMOD='4'
+WMOD='4'
+
+export IOMETHOD="POSIX"
+export IOMODE="SYNC"
+export FILETYPE="SHARED"
+export REMAP="CUSTOM"
+export BWEXP='1'
+
+export LD_LIBRARY_PATH=/usr/lib64/mpi/gcc/openmpi/lib64
+mpirun -np 4 -v -v ./MADbench2 $NO_PIX $NO_BIN $NO_GANG $SBLOCKSIZE $FBLOCKSIZE $RMOD $WMOD
I also added a script to the patch that shows how to run the program. Take special care if you want to change the exported environment variables: be sure that they are added to the .bashrc file on every computation node.
Hope you find it useful.
The glibc exposes a function to get the time difference between two time_t values (difftime). But what if you need a more accurate time? Normally you use gettimeofday.
Standard C has no function to compare two times obtained by calling gettimeofday (glibc only ships the non-standard BSD macro timersub), so here comes my implementation.
The function gettimeofday gives you a really fine-grained time in seconds and microseconds that have passed since the Epoch (00:00:00 UTC, January 1, 1970). However, remember that when you use my function, it will take some time to calculate the difference. Depending on your hardware, software and the real-time aspects of your application, the implementation might not fit your needs.
Anyway, here we go. As always comments are highly appreciated…
#include <sys/time.h>

/*
 * Stores the difference between an older (otv) and a newer (ntv) time in dtv.
 * Returns 0 on success, or -1 if ntv lies before otv.
 */
int time_diff(struct timeval otv, struct timeval ntv, struct timeval *dtv)
{
    /* Reject a "newer" time that is actually older, also within the same second. */
    if (ntv.tv_sec < otv.tv_sec ||
        (ntv.tv_sec == otv.tv_sec && ntv.tv_usec < otv.tv_usec))
        return -1;

    if (ntv.tv_usec < otv.tv_usec) {
        /* Borrow one second so that tv_usec stays non-negative. */
        dtv->tv_sec = ntv.tv_sec - 1 - otv.tv_sec;
        dtv->tv_usec = ntv.tv_usec + 1000000 - otv.tv_usec;
    } else {
        dtv->tv_sec = ntv.tv_sec - otv.tv_sec;
        dtv->tv_usec = ntv.tv_usec - otv.tv_usec;
    }
    return 0;
}
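If you only need the elapsed time as a single number, another option is to flatten both values to microseconds first and subtract afterwards. A minimal sketch; timeval_to_usec and elapsed_usec are names I made up for this example and not part of any library.

```c
#include <stdint.h>
#include <sys/time.h>

/* Converts a struct timeval to microseconds since the Epoch. */
static int64_t timeval_to_usec(struct timeval tv)
{
    return (int64_t)tv.tv_sec * 1000000 + tv.tv_usec;
}

/* Elapsed microseconds from otv to ntv; negative if ntv lies before otv. */
static int64_t elapsed_usec(struct timeval otv, struct timeval ntv)
{
    return timeval_to_usec(ntv) - timeval_to_usec(otv);
}
```

The borrow between seconds and microseconds disappears this way, at the price of a 64-bit multiplication per value.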
The openSUSE distribution includes a modified kernel for using Xen as a dom0 host.
But it has some drawbacks.
1. YaST doesn’t set the hypervisor parameter right. Instead of “vga=mode-0x314” it sets “vgamode=0x314”. You have to change this by hand in the bootloader menu.
2. The Nvidia graphics driver does not support Xen. Therefore you have to install it by hand. First of all you need the kernel-source and xen-dev packages. Get the driver from the Nvidia repository and save it somewhere, like in /root.
Start your computer with the Xen kernel.
Change to runlevel 3 (init 3).
Then type:
“export IGNORE_XEN_PRESENCE=1 SYSSRC= SYSOUT=” e. g.
export IGNORE_XEN_PRESENCE=1 SYSSRC=/usr/src/linux-2.6.34.7-0.3 SYSOUT=/usr/src/linux-2.6.34.7-0.3-obj/x86_64/xen
Afterwards install the nvidia driver: sh ./NVIDIA-Linux-x86_64-256.53.run
Changing to run level 5 (init 5) should show you the X-login screen.
Well done!
As a computer scientist, reading schematics for electronic devices was one of the things I had to learn at work. And I really appreciate it.
But I’m not an electronic engineer, so I don’t have to understand every part of a design, just the big picture. Still, I’m really keen on schematics, so I try to learn some new stuff every now and then. Lately I noticed a diode in some circuits, and I had no clue what it’s for:
This is a reset circuit. When the button S is pushed, the RESET line of the CPU goes low, so that the CPU is held in reset until RESET goes high again. Resistor R2 acts as a limiter so that the capacitor C won’t get harmed. The capacitor smooths the signal, so that minor interference on VDD (+5V) does not cause a reset. With resistor R1 you can regulate how long you have to push the button to get a reset signal.
So now the riddle: what’s the diode D for? Well, imagine you take away VDD and reconnect it quickly, something that happens often, especially in the production stage. In this case VDD drops to ground, but the capacitor cannot discharge quickly, as the current has to pass R1. So it may happen that the CPU is not put into reset if you turn VDD off and on quickly. That’s what the diode is for. When VDD is disconnected, the diode allows the current to flow quickly to ground, but it doesn’t let current pass while VDD is at 5V.
This is trivial for a hardware engineer, and to me it shows that physics at school is actually useful.
