Carbon footprint of interpreted languages

Thought from a good discussion with at François at OSDC today, what is the carbon footprint of various languages? He mentioned that the carbon footprint of a new Haskell compiler release is remarkably non-trivial due to every Haskell package in Debian needing to be rebuilt.

So, I thought, what’s the impact of something like Python? (or Perl). Every machine running the code has to do the bytecode compilation/JIT/interpretation of that code so when, say, Ubuntu ships some new version of $random_desktop_thing_written_in_python, we’re actually compiling it well over 20 million times. That’s a remarkably non-trivial amount of CPU time (and thus CO2 emissions).

So, program in compiled languages such as C or C++ as doing so will save polar bears.

A better set of Boost m4 macros

I just replaced the old Pandora boost m4 macros in a project with boost.m4 from https://github.com/tsuna/boost.m4 and it basically just solved all my problems with Boost and the whole set of distributions that I build for (everything from CentOS/RHEL 5 to Debian unstable).

I like things that other people maintain.

One last bit of evil….

You can store things for later!
drizzle> select libtcc("#include <string.h>\n#include <stdlib.h>\nint foo(char* s) { char *a= malloc(1000); return snprintf(s, 100, \"%p\", a); }") as RESULT;
+-----------+
| RESULT    |
+-----------+
| 0x199c610 |
+-----------+
1 row in set (0 sec)
drizzle> select libtcc("#include <string.h>\n#include <stdlib.h>\nint foo(char* s) { char *a= 0x199c610; strcpy(a, \"Hello World!\"); strcpy(s,\"done\"); return strlen(s); }") as result;
+--------+
| result |
+--------+
| done   |
+--------+
1 row in set (0.01 sec)
drizzle> select libtcc("#include <string.h>\n#include <stdlib.h>\nint foo(char* s) { char *a= 0x199c610; strcpy(s, a); return strlen(s); }") as result;
+--------------+
| result       |
+--------------+
| Hello World! |
+--------------+
1 row in set (0.01 sec)
And then… i can disconnect, reconnect, or whatever (as for any of the above really) before cleaning up my memory:
drizzle> select libtcc("#include <string.h>\n#include <stdlib.h>\nint foo(char* s) { char *a= 0x19a9bc0; free(a); strcpy(s,\"done\"); return strlen(s); }") as result;
+--------+
| result |
+--------+
| done   |
+--------+
1 row in set (0 sec)

A MD5 stored procedure for Drizzle… in C

So, just in case that wasn’t evil enough for you… perhaps you have something you want to know the MD5 checksum of. So, you could just do this:

drizzle> select md5('Hello World!');
+----------------------------------+
| md5('Hello World!')              |
+----------------------------------+
| ed076287532e86365e841e92bfc50d8c |
+----------------------------------+
1 row in set (0 sec)

But that is soooo boring.

Since we have the SSL libs already loaded into Drizzle, and using my very evil libtcc plugin… we could just implement it in C. We can even use malloc!

drizzle> SELECT LIBTCC("#include <string.h>\n#include <stdlib.h>\n#include <openssl/md5.h>\nint foo(char* s) { char *a = malloc(100); MD5_CTX context; unsigned char digest[16]; MD5_Init(&context); strcpy(a,\"Hello World!\"); MD5_Update(&context, a, strlen(a)); MD5_Final(digest, &context); snprintf(s, 33, \"%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x\", digest[0], digest[1], digest[2], digest[3],digest[4], digest[5], digest[6], digest[7],digest[8], digest[9], digest[10], digest[11],digest[12], digest[13], digest[14], digest[15]); free(a); return 32; }") AS RESULT;

+----------------------------------+
| RESULT                           |
+----------------------------------+
| ed076287532e86365e841e92bfc50d8c | 
+----------------------------------+
1 row in set (0.01 sec)

Currently the parameter is static in the C version due to me not having… well.. done a good job implementing the calling of C code.

Stored Procedures/Functions for Drizzle

Previously, in “Thoughts on Thoughts on Drizzle” I theorized that one of the major reasons why we did not see lots of people jumping at stored procedures in MySQL was that it wasn’t in their native language (for lack of a better term). We’ve seen External Language Stored Procedures for MySQL that let you write stored procedures in some other languages…. but I felt something was missing.

Firstly, I wanted a language I was really familiar with and comfortable writing complex things in.

Secondly, it should be compiled so that it runs as fast as possible.

Thirdly, it shouldn’t just be linking to a pre-compiled library (drizzle function plugins do that already)

So… the obvious choice was C.

I have a really, really, really early prototype:

drizzle> SELECT LIBTCC("int foo(char* s) { s[0]='4'; s[1]='2'; s[2]=0; return 2; }") AS RESULT;

+--------+
| RESULT |
+--------+
| 42     |
+--------+
1 row in set (0 sec)

or… a bit more sophisticated:

drizzle> SELECT LIBTCC("#include <string.h>\nint foo(char* s) { strcpy(s,\"Hello World!\");; return strlen(s); }") AS RESULT;

+--------------+
| RESULT       |
+--------------+
| Hello World! |
+--------------+
1 row in set (0 sec)

I’m using a function as a bit of a cheat… but the string is passed to libtcc (modified so it’s a shared library so I can load it into drizzle), where it is compiled into native object code (in my case x86-64) and then run.

With the right bits of foo… I could allow calling of all sorts of server functions…. such as those to execute SQL inside the current transaction context.

There are a number of reasons why this is Pure Evil(TM):

  • It executes inside the address space of your database server
    one null pointer dereference and your database server is all gone.
  • It’s arbitrary code injection by design
    Exactly how insane are you? Security–;
  • While great for me and my C hacking friends, possibly not for web app developers, who likely aren’t writing their apps in C every day.
  • See the first reason. Is that not reason enough? Memory protection is a good thing yo.

Anyway, you can see the code up on launchpad in the drizzle-libtcc-function branch. You’ll need to modify your tcc source so that the Makefile snippet for libtcc.o looks like this:

# libtcc generation and test
libtcc.o: $(NATIVE_FILES)
        $(CC) -fPIC -o $@ -c libtcc.c $(NATIVE_TARGET) $(CFLAGS)

libtcc.a: libtcc.o
        $(AR) rcs $@ $^

libtcc.so: libtcc.o
        $(CC) -shared -Wl,-soname,libtcc.so.1 -o $@ libtcc.o

stringstream is completely useless (and why C++ should have a snprintf)

  1. It’s easy to screw up thread safety.
    If you’re trying to format something for output (e.g. leading zeros, only 1 decimal place or whatever… you know, format specifiers in printf) you are setting a property on the stream, not on what you’re converting. So if you have a thread running that sets a format, adds something to the stream, and then unsets the format, you cannot have another thread able to come in and do something to that stream. Look out for thread unsafe cout code.
  2. You cannot use streams for any text that may need to be translated.
    gettext is what everybody uses. You cannot get a page into the manual before it tells you that translators may want to change the order of what you’re printing. This goes directly against stringstream.
  3. You need another reason? Number 2 rules it out for so much handling it’s not funny.