// A quick overview

When starting a new project in C, I tend to focus on getting a config parser working before I do any real work on the project. Originally, I would dig around online for a header-only library to do this for me; the downside is that these libraries tend to be bloated with features I’ll never use. So I’ve devised my own simple key/value config parser, and this article is a tutorial on implementing it. You can find my more feature-complete version of this here

This implementation is intentionally hack-ish, with the goal of being very simple and short. Towards the end of this article I will go over the issues it suffers from and what you could do to fix them.

First let’s take a look at an example config file we’ll be parsing.

number_val 123 
string_val hello_i_am_string
123312 some_file

As you can see, the key value pairs are split with only a space. A key is always a string, but a value is either a string or an integer. Parsing this will be pretty straightforward, and because the pairs are separated with a space instead of a “=” symbol, we can cut some corners when it comes to checking syntax.

// Loading the file

Before we even think about parsing the config file, we need to load it. We’re taking the header-only approach here as the code required is minimal, hence the frequency of the “static” keyword.

#include <stdio.h>
#include <stdlib.h>

static char* read_file(const char *path)
{
  // load the file in (r)ead mode
  FILE *file = fopen(path, "r");
  if (file == NULL) {
    printf("could not load file %s\n", path);
    return NULL;
  }

  // get the file length (ftell returns a long, -1 on failure)
  fseek(file, 0, SEEK_END);
  long size = ftell(file);
  rewind(file);
  if (size < 0) {
    fclose(file);
    return NULL;
  }

  // allocate enough space for file data plus the terminator
  char *buff = malloc((size_t)size + 1);
  if (buff == NULL) {
    fclose(file);
    return NULL;
  }

  // read file contents into buffer; in text mode fewer bytes than
  // size may arrive, so keep the count that was actually read
  size_t n = fread(buff, 1, (size_t)size, file);

  // null-terminate the buffer
  buff[n] = '\0';

  // close the file
  fclose(file);

  // dont forget to free buff once we are done with it
  return buff;
}

This is a simple function, so I won’t go over it too much; the comments should be sufficient.

// Parsing the file contents

Now it’s time to parse the config. To do this we will be using the strtok function to “tokenize” the file contents. strtok mangles the given string, so we’ll create a duplicate of buff and use that with strtok, keeping our buff intact for future use.

// the includes we'll need
#include <inttypes.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>

// duplicate buff's contents (+1 leaves room for the null terminator)
char str[strlen(buff) + 1];
strcpy(str, buff);
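
One caveat with a variable-length array like str: it lives on the stack, which could become a problem with a large config file. A heap-based copy is one alternative. The sketch below is only an illustration; dup_string is a hypothetical helper name (not a standard library function), and the caller is responsible for freeing the result.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: returns a heap-allocated copy of src,
// or NULL if the allocation fails. The caller must free() it.
static char *dup_string(const char *src)
{
  size_t size = strlen(src) + 1; // +1 for the null terminator
  char *copy = malloc(size);
  if (copy == NULL)
    return NULL;
  memcpy(copy, src, size);
  return copy;
}
```

Tokenizing the copy leaves the original untouched: after a single strtok call on a copy of "number_val 123\n", strlen reports 10 for the copy, because strtok overwrote the space with a null terminator, while the original still reports 15.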

We should probably do some basic error checking. As I said previously, because the key value pairs are separated by a space, this is pretty simple to do. The easiest (albeit hack-ish) method is to count the tokens and check that the count is divisible by two; if it is not, there’s a key with a missing value somewhere.

// get token count (same delimiters as the parsing pass, tabs included)
int t_count = 0;
char *token = strtok(str, " \t\n");
while (token) {
  t_count++;
  token = strtok(NULL, " \t\n");
}

// should be divisible by two
if (t_count % 2) {
  printf("Syntax error in config file %s\n", path);
  printf("Config key is missing a value\n");
  free(buff);
  return 0;
}
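
If you would rather keep the counting step out of the way, it can be wrapped in a small function. This is only a sketch: count_tokens is a hypothetical name, and it works on its own private copy so that the caller’s string is left alone.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: counts whitespace-separated tokens in text
// without modifying it. Returns -1 if the working copy cannot be allocated.
static int count_tokens(const char *text)
{
  size_t size = strlen(text) + 1;
  char *copy = malloc(size);
  if (copy == NULL)
    return -1;
  memcpy(copy, text, size);

  int count = 0;
  for (char *tok = strtok(copy, " \t\n"); tok; tok = strtok(NULL, " \t\n"))
    count++;

  free(copy);
  return count;
}
```

Keep in mind that an even count only proves the tokens can be paired up, not that they are paired correctly. The input "a\nb 1 2\n" contains four tokens and passes the check, yet it would be read as the pairs (a, b) and (1, 2).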

Now we can start parsing the tokens. But wait! We are going to need some data structure to store the key value pairs. For this we’ll use a struct containing a string for the key, an indicator of the value’s type, and a union to store the actual value.

// our value-type enum
typedef enum {
  conf_type_undefined,
  conf_type_int,
  conf_type_string
} conf_type_e;

// the struct for each key value pair
typedef struct {
  char key[256];
  conf_type_e type;
  union {
    uint32_t i;
    char s[256];
  };
} conf_var_t;
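
To illustrate how the type field and the union work together, here is a small sketch that formats one pair as text. conf_var_format is a hypothetical helper name; the point is only that the type tag has to be checked before touching i or s, since both members share the same memory.

```c
#include <inttypes.h>
#include <stdio.h>

typedef enum {
  conf_type_undefined,
  conf_type_int,
  conf_type_string
} conf_type_e;

typedef struct {
  char key[256];
  conf_type_e type;
  union {
    uint32_t i;
    char s[256];
  };
} conf_var_t;

// Hypothetical helper: writes "key = value" into out (at most n bytes)
// and returns what snprintf returns.
static int conf_var_format(const conf_var_t *v, char *out, size_t n)
{
  switch (v->type) {
  case conf_type_int:
    // only read the integer member when the tag says it is valid
    return snprintf(out, n, "%s = %" PRIu32, v->key, v->i);
  case conf_type_string:
    return snprintf(out, n, "%s = \"%s\"", v->key, v->s);
  default:
    return snprintf(out, n, "%s = <undefined>", v->key);
  }
}
```

Note that the unnamed union inside the struct is a C11 feature; on an older compiler you may need to give the union member a name.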

Now that we have these, we can load and store our key value pairs. Because we want to support arbitrarily sized config files, let’s allocate an array of conf_var_t sized from the token count.

// allocate space for key value pairs
int length = t_count / 2;
conf_var_t *vars = malloc(sizeof(conf_var_t) * length);

Now that we’ve got all of that down, we can actually write the parser, for real this time. Last time we used strtok it mangled our buff copy, so let’s re-create that copy and find the first token.

// duplicate buff again
strcpy(str, buff);

// find the first token
token = strtok(str, " \t\n");

Essentially, strtok splits the given string at the given delimiters. In this case we pass the delimiters “ \t\n”, which are a space, a tab character, and a newline. “token” should now point to the first key in the file. We want to automate this as much as possible, so we’ll need a few temporary variables.

int i = 0, t = 0;
while (token) {
  ...
}

So, we start with the first token, which is the first key in the config file. As every key should be followed by a value, we can interpret the first token as a key, the following as a value, the following as a key, and so on and so forth. We’ll use the ‘t’ variable to indicate that we are at a key or value, 0 being key, 1 being value. The following code resides in the while loop.

if (!t) {
  // we are at a key, so copy the key string into our array
  strcpy(vars[i].key, token);
} else {
  // we are at a value
  ...
}

As you can see, the keys were pretty easy to handle, as they are only ever strings. Next we need some code within the else branch to figure out if our value is a string or an integer. The easiest way is to write a function that determines whether our value is a number. I won’t go into this much as its function should be pretty obvious.

static inline int is_number(const char *str) {
  if (str == NULL || *str == '\0' || isspace((unsigned char)*str))
    return 0;

  char *p;
  strtod(str, &p);
  return *p == '\0';
}
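
One thing to be aware of: because is_number relies on strtod, it also accepts values such as “1.5” or “1e3”, which atoi would then cut down to 1. If that matters for your config, a stricter check might look like the sketch below. is_integer is a hypothetical name, and rejecting values outside the range of int is my own assumption about what you would want.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

// Hypothetical stricter variant: accepts only base-10 integers
// that fit in an int, with nothing left over after the digits.
static int is_integer(const char *str)
{
  if (str == NULL || *str == '\0' || isspace((unsigned char)*str))
    return 0;

  char *end;
  errno = 0;
  long v = strtol(str, &end, 10);

  // reject overflow and anything that would not fit in an int
  if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
    return 0;

  // at least one digit must be consumed, and the whole token used
  return end != str && *end == '\0';
}
```

With this variant, “123” and “-7” count as integers, while “1.5”, “1e3”, “12abc” and “hello” are treated as strings.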

So now we’ve got a way to determine the type of a value, let’s go back to that else branch and put it to use. In the case of a number value, we convert the string to an integer and set the type to conf_type_int. In the case of a string, we copy the string into the value and set the type to conf_type_string. After that we increase i, which is the array index. And so ends the else branch.

else {
  if (is_number(token)) {
    vars[i].i = atoi(token);
    vars[i].type = conf_type_int;
  } else {
    strcpy(vars[i].s, token);
    vars[i].type = conf_type_string;
  }

  i++;
}

The last bit of code for the while loop is small, but it’s also the most important. We flip the t variable, so that the next loop iteration knows whether it’s dealing with a key or a value, and then we get the next token. The strtok call takes the same delimiters as before, but this time we pass NULL instead of a string. The reason is that strtok keeps track of what it’s working on between calls, so passing NULL tells it we want to continue tokenizing the same string.

while (token) {
  if (!t) {
    ...
  } else {
    ...
  }

  // switch between a key and a value
  t = !t;

  // get the next token in str
  token = strtok(NULL, " \t\n");
}
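
For reference, here is one way the pieces above might fit together in a single function that parses a string already in memory. Treat it as a sketch: conf_parse and its out_length parameter are hypothetical names, is_number is a trimmed copy of the check shown earlier, and I have swapped strcpy for snprintf so that an over-long token is truncated instead of overflowing the 256-byte buffers.

```c
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef enum { conf_type_undefined, conf_type_int, conf_type_string } conf_type_e;

typedef struct {
  char key[256];
  conf_type_e type;
  union {
    uint32_t i;
    char s[256];
  };
} conf_var_t;

// trimmed copy of the number check; tokens from strtok never start with a delimiter
static int is_number(const char *str)
{
  if (str == NULL || *str == '\0')
    return 0;
  char *p;
  strtod(str, &p);
  return *p == '\0';
}

// Hypothetical wrapper: returns a heap-allocated array of pairs, or NULL if
// the input is empty, has an odd token count, or memory runs out.
static conf_var_t *conf_parse(const char *text, int *out_length)
{
  *out_length = 0;
  size_t size = strlen(text) + 1;
  char *str = malloc(size);
  if (str == NULL)
    return NULL;

  // first pass: count the tokens
  memcpy(str, text, size);
  int t_count = 0;
  for (char *tok = strtok(str, " \t\n"); tok; tok = strtok(NULL, " \t\n"))
    t_count++;

  conf_var_t *vars = NULL;
  if (t_count > 0 && t_count % 2 == 0)
    vars = calloc((size_t)t_count / 2, sizeof *vars);
  if (vars == NULL) {
    free(str);
    return NULL;
  }

  // second pass: strtok mangled the copy, so refresh it and alternate key/value
  memcpy(str, text, size);
  int i = 0, t = 0;
  for (char *tok = strtok(str, " \t\n"); tok; tok = strtok(NULL, " \t\n")) {
    if (!t) {
      snprintf(vars[i].key, sizeof vars[i].key, "%s", tok);
    } else {
      if (is_number(tok)) {
        vars[i].i = (uint32_t)atoi(tok);
        vars[i].type = conf_type_int;
      } else {
        snprintf(vars[i].s, sizeof vars[i].s, "%s", tok);
        vars[i].type = conf_type_string;
      }
      i++;
    }
    t = !t;
  }

  free(str);
  *out_length = i;
  return vars; // caller frees
}
```

Feeding it the example config from the start of the article yields three pairs, with number_val stored as the integer 123 and the other two values stored as strings.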

And so ends the parser. The only thing left to do now would be to write some small getter functions that take a key and return the value at said key in the array. You’ll want one for both string and integer values. I’ll write the integer one and let you figure out the rest on your own as the code is almost identical. Also remember to free the vars array after you are done with it!

int conf_get_int(conf_var_t *vars, const char *key, int length)
{
  // iterate over vars and find a key that matches
  conf_var_t *v = NULL;
  for (int i=0; i<length; i++) {
    if (strcmp(vars[i].key, key) == 0)
      v = &vars[i];
  }

  // if we find a key, return its value
  if (v != NULL && v->type == conf_type_int)
    return v->i;

  // no key found
  return 0;
}
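
If you want something to compare your own string getter against, one possible shape is sketched below. conf_get_string is a hypothetical name chosen to mirror conf_get_int, and returning NULL for a missing key is my assumption rather than a requirement.

```c
#include <inttypes.h>
#include <string.h>

typedef enum { conf_type_undefined, conf_type_int, conf_type_string } conf_type_e;

typedef struct {
  char key[256];
  conf_type_e type;
  union {
    uint32_t i;
    char s[256];
  };
} conf_var_t;

// Hypothetical counterpart to conf_get_int: returns a pointer into the
// array, or NULL when the key is missing or does not hold a string.
static const char *conf_get_string(const conf_var_t *vars, const char *key, int length)
{
  // iterate over vars; like conf_get_int, the last matching key wins
  const conf_var_t *v = NULL;
  for (int i = 0; i < length; i++) {
    if (strcmp(vars[i].key, key) == 0)
      v = &vars[i];
  }

  if (v != NULL && v->type == conf_type_string)
    return v->s;

  // no key found, or the value is not a string
  return NULL;
}
```

A NULL return lets the caller tell a missing key apart from an empty value, which the integer getter cannot do because 0 is also a valid setting. The returned pointer is only valid until the vars array is freed.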

 
// Final thoughts

This is a pretty quick and easy implementation of a very simple key value config parser. As I noted at the start, there are some issues with it, specifically the error checking. Currently we check whether the number of tokens is divisible by two; if it’s not, an error is reported and the config isn’t parsed. Ideally the parser should be able to discard any keys that have no value and continue parsing the rest of the config, but I’ll leave that for you to implement should you find it necessary.
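
As a starting point for that, one option is to stop treating the file as a flat stream of tokens and look at it one line at a time instead. The sketch below is a rough illustration under my own assumptions: pair_t and parse_lines are hypothetical names, values are kept as plain strings, and lines of 1024 bytes or more are silently ignored.

```c
#include <stdio.h>
#include <string.h>

// Hypothetical pair type for this sketch; values stay as strings here.
typedef struct {
  char key[256];
  char value[256];
} pair_t;

// Hypothetical helper: stores up to max_pairs complete pairs in out and
// returns how many were stored. Lines holding only a key are counted in
// *skipped and otherwise ignored.
static int parse_lines(const char *text, pair_t *out, int max_pairs, int *skipped)
{
  int count = 0;
  *skipped = 0;

  while (*text != '\0') {
    size_t len = strcspn(text, "\n"); // length of the current line
    char line[1024];
    char key[256], value[256];

    if (len < sizeof line) {
      memcpy(line, text, len);
      line[len] = '\0';

      // %255s stops at whitespace and never overruns the 256-byte buffers
      int fields = sscanf(line, "%255s %255s", key, value);
      if (fields == 2 && count < max_pairs) {
        strcpy(out[count].key, key);
        strcpy(out[count].value, value);
        count++;
      } else if (fields == 1) {
        (*skipped)++; // key without a value: discard it and carry on
      }
    }

    // move past this line and its newline, if there is one
    text += len;
    if (*text == '\n')
      text++;
  }

  return count;
}
```

Because each line is judged on its own, a key with a missing value can no longer shift every pair after it out of alignment, which is the weakness of the divisible-by-two check.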