OpenCL on Kaveri

AMD's Kaveri APUs are the first APUs with hUMA support. This is a big step for OpenCL development: the GPU can now read and write host RAM directly, so copying huge amounts of memory from RAM to GPU memory and back is no longer necessary. I want to give a short overview of the characteristics and the performance of OpenCL programming on Kaveri.
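
As a quick illustration of the idea, here is a minimal sketch I put together against the plain OpenCL 1.2 C API (not code from a measured benchmark; whether a copy is really avoided depends on the driver and the chosen flags): a buffer is allocated in host-accessible memory and mapped directly instead of being transferred.

#include <CL/cl.h>

int main()
{
	// Error checking omitted for brevity.
	cl_platform_id platform;
	clGetPlatformIDs(1, &platform, nullptr);

	cl_device_id device;
	clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, nullptr);

	cl_context context = clCreateContext(nullptr, 1, &device, nullptr, nullptr, nullptr);
	cl_command_queue queue = clCreateCommandQueue(context, device, 0, nullptr);

	const size_t size = 1024 * sizeof(float);

	// Ask the runtime for host-accessible memory instead of a buffer that
	// has to be copied into dedicated GPU memory.
	cl_mem buf = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR,
			size, nullptr, nullptr);

	// Map the buffer; on a hUMA APU this can be a pointer into ordinary RAM,
	// so filling it does not imply a transfer.
	float* ptr = static_cast<float*>(clEnqueueMapBuffer(queue, buf, CL_TRUE,
			CL_MAP_WRITE, 0, size, 0, nullptr, nullptr, nullptr));
	for (size_t i = 0; i < 1024; ++i)
		ptr[i] = static_cast<float>(i);
	clEnqueueUnmapMemObject(queue, buf, ptr, 0, nullptr, nullptr);

	// ... enqueue kernels working on buf here ...

	clReleaseMemObject(buf);
	clReleaseCommandQueue(queue);
	clReleaseContext(context);
}

CL_MEM_USE_HOST_PTR is the other common option when the data already lives in an existing allocation; which variant avoids the copy is driver-specific.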

Overview

By default your kernel is compiled with a 32-bit address width. You should set the environment variable GPU_FORCE_64BIT_PTR to 1 to be able to access the complete RAM. The GPU device of my Kaveri (A10-7850K) has the following specifications:

Device Name: Spectre (AMD Accelerated Parallel Processing, OpenCL 1.2 AMD-APP (1445.5))
Address Bits: 64
Little Endian: true
Global Memory Size: 512 MB
Base Address Alignment Bits: 2048
Global Memory Cache Size: 16 KB
Local Memory Size: 32 KB
Clock Frequency: 720 MHz
Compute Units: 8
Constant Buffer Size: 64 KB
Max Workgroup Size: 256

The GPU has 512 processing units spread over its 8 compute units, i.e. 64 per compute unit, which matches the wavefront size of 64 that is typical for AMD. The global memory size is a bit confusing: it suggests that we can only access 512 MB of global memory, which is not true.
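
The values above can be double-checked directly against the runtime, for example to see whether GPU_FORCE_64BIT_PTR=1 really results in 64 address bits and what global memory size is reported. A small sketch, assuming the first platform and its first GPU device:

#include <CL/cl.h>
#include <cstdio>

// Build e.g. with: clang++ devinfo.cxx -lOpenCL
int main()
{
	cl_platform_id platform;
	clGetPlatformIDs(1, &platform, nullptr);

	cl_device_id device;
	clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, nullptr);

	cl_uint addressBits = 0;
	clGetDeviceInfo(device, CL_DEVICE_ADDRESS_BITS, sizeof(addressBits), &addressBits, nullptr);

	cl_ulong globalMem = 0;
	clGetDeviceInfo(device, CL_DEVICE_GLOBAL_MEM_SIZE, sizeof(globalMem), &globalMem, nullptr);

	std::printf("address bits: %u\n", addressBits);
	std::printf("global memory: %llu MB\n",
			static_cast<unsigned long long>(globalMem / (1024 * 1024)));
}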

C++ Web Toolkit Wt (witty)

During the last few days I built a simple web application that serves chess puzzles for you to solve. You can see it at bestmove.abrok.eu.
Before starting this little project I decided to use Wt as a framework, to learn something new. It also seemed very easy to use and has similarities to Qt, which I am already familiar with.
I want to share my impressions after my first little project with it. I will start with the negative ones:

  • The ownership of objects does not feel clear at first. Although there are rules for ownership, there are special cases which may not be intuitive in the beginning.
  • Even for my simple application, I ran into some Wt-related bugs that cost me some time.
  • Because Wt is a server-side framework, the UI can feel a bit slow. Wt solves this for some standard use cases; for example, a menu widget can preload all its child widgets and switch its content client-side. If you want something more specific, e.g. changing the border color of a widget when it is clicked, you either accept the latency or you have to write JavaScript code for it. The good thing is that Wt offers good ways to integrate JavaScript (see the sketch after this list).
  • For the look of the application you can choose between two predefined styles and a “bootstrap” theme (in versions 1, 2 and 3). In my opinion there could be more predefined themes, although I did not really miss them.
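
As an example of the JavaScript integration mentioned above, here is a minimal sketch of my own (written against the Wt 3.x API, not code from the puzzle site) that changes a button's border on click entirely in the browser by connecting the click signal to a Wt::JSlot:

#include <Wt/WApplication>
#include <Wt/WEnvironment>
#include <Wt/WContainerWidget>
#include <Wt/WPushButton>
#include <Wt/JSlot>

class DemoApplication : public Wt::WApplication
{
public:
	DemoApplication(const Wt::WEnvironment& env)
		: Wt::WApplication(env),
		  // The JavaScript runs in the browser; 'o' is the DOM element of
		  // the widget the slot is connected to.
		  highlight_("function(o, e) { o.style.border = '2px solid red'; }")
	{
		Wt::WPushButton* button = new Wt::WPushButton("Click me", root());
		// No server round trip: the border changes client-side.
		button->clicked().connect(highlight_);
	}

private:
	Wt::JSlot highlight_;
};

Wt::WApplication* createApplication(const Wt::WEnvironment& env)
{
	return new DemoApplication(env);
}

int main(int argc, char** argv)
{
	return Wt::WRun(argc, argv, &createApplication);
}

The alternative, a normal clicked().connect() to a C++ handler, would require a round trip to the server for every click.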

Of course there are also many good things to mention:

  • Wt comes with many basic examples you can learn from.
  • It is very actively developed. All bugs I reported got fixed within one week.
  • All communication between the client and the server is handled by Wt. This makes web development feel like desktop development.
  • You don't have to care which browser the client is using; Wt handles this.

Whether I would use Wt again for my next project depends on the complexity of the UI and of the communication with the backend. If the UI is extremely complex, I would probably choose another framework like GWT, because making the UI fast requires client-side code. If the UI is not too complex and the focus lies on the backend, I would definitely use Wt again.

Json Spirit not thread safe on Ubuntu

A few days ago I noticed that my program started crashing very quickly after I added multithreading to it. GDB showed the segfault in json_spirit::read(), so I wondered whether this is a thread-safety issue. I could not find any information about whether the json_spirit package for Ubuntu is thread safe or not. A simple test reproduces the behaviour:

#include <json_spirit.h>
#include <thread>
#include <vector>
#include <mutex>
 
void test()
{
	json_spirit::mValue v;
	for(int i = 0; i < 1000; ++i)
		json_spirit::read("{}", v);	
}
int main()
{
	std::vector<std::thread> threads;
	for(int i = 0; i < 8; ++i)
		threads.emplace_back(test);
	for (auto& th : threads)
		th.join();
}
clang++ test.cxx -pthread -ljson_spirit -std=c++11 -O0 -g3

The test program crashes immediately, at least on my system (Ubuntu 14.04, libjson-spirit-dev 4.05-1.1). The backtrace looks like this:

#0  0x00007ffff0000960 in ?? ()
#1  0x000000000042e0de in __gnu_cxx::__normal_iterator<char const*, std::string> json_spirit::read_range_or_throw<__gnu_cxx::__normal_iterator<char const*, std::string>, json_spirit::Value_impl<json_spirit::Config_map<std::string> > >(__gnu_cxx::__normal_iterator<char const*, std::string>, __gnu_cxx::__normal_iterator<char const*, std::string>, json_spirit::Value_impl<json_spirit::Config_map<std::string> >&) ()
#2  0x000000000042e1bc in bool json_spirit::read_range<__gnu_cxx::__normal_iterator<char const*, std::string>, json_spirit::Value_impl<json_spirit::Config_map<std::string> > >(__gnu_cxx::__normal_iterator<char const*, std::string>&, __gnu_cxx::__normal_iterator<char const*, std::string>, json_spirit::Value_impl<json_spirit::Config_map<std::string> >&) ()
#3  0x00000000004080fd in json_spirit::read(std::string const&, json_spirit::Value_impl<json_spirit::Config_map<std::string> >&) ()
#4  0x00000000004028fb in test () at test.cxx:10
#5  0x0000000000404f3f in std::_Bind_simple<void (*())()>::_M_invoke<>(std::_Index_tuple<>) (this=0x6f3060) at /usr/bin/../lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/functional:1731
#6  0x0000000000404f15 in std::_Bind_simple<void (*())()>::operator()() (this=0x6f3060) at /usr/bin/../lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/functional:1720
#7  0x0000000000404eec in std::thread::_Impl<std::_Bind_simple<void (*())()> >::_M_run() (this=0x6f3048) at /usr/bin/../lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/thread:115
#8  0x00007ffff7b87bf0 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00007ffff73a4182 in start_thread (arg=0x7ffff6fd5700) at pthread_create.c:312
#10 0x00007ffff70d130d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Adding a thread-safe wrapper function that guards the call to json_spirit::read() with a mutex fixed the problem:

#include <json_spirit.h>
#include <thread>
#include <vector>
#include <mutex>
 
bool js_read(const std::string& js, json_spirit::mValue& v)
{
	static std::mutex mtx;
	std::lock_guard<std::mutex> lock(mtx);
	return json_spirit::read(js, v);
}
void test()
{
	json_spirit::mValue v;
	for(int i = 0; i < 1000; ++i)
		js_read("{}", v);	
}
int main()
{
	std::vector<std::thread> threads;
	for(int i = 0; i < 8; ++i)
		threads.emplace_back(test);
	for (auto& th : threads)
		th.join();
}

Note that the initialization of the static local mutex is only thread safe since C++11. Alternatively, you have to use a global mutex (see the sketch below).
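
If C++11's guarantee is not available, a mutex at namespace scope does the same job, since it is constructed during static initialization before any threads are started (assuming they are created in main()). A minimal sketch of that variant:

#include <json_spirit.h>
#include <string>
#include <mutex>

// Constructed before main() runs, so there is no race on its initialization.
static std::mutex json_read_mutex;

bool js_read(const std::string& js, json_spirit::mValue& v)
{
	// Serialize all calls to json_spirit::read().
	std::lock_guard<std::mutex> lock(json_read_mutex);
	return json_spirit::read(js, v);
}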