Friday, 3 May 2019

Log driven development

Everyone knows attempting to figure out what's happening by resorting to print statements is desperation. Everyone knows you should use TDD, BDD or some xDD to write code well, don't they? I am desperate. I suggest PDD: print driven development. It is slow and error prone, but sometimes helpful.

I recently shoe-horned a new feature into a large existing code base. It's hard to make it run just a small piece of work, and I wanted to check that a config change created my new class via the dependency injection framework. Now, I could put in a break point and wait a while for it to get hit, or not. Fortunately, the code writes logs. By adding a log line to the constructor, I could tell much more quickly that I'd got the config correct. Write the code, leave it running, have a look later. I am tempted to name this Log Driven Development.
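Something like this minimal sketch, with made-up names; the real class and logging calls belong to the code base in question:

#include <iostream>

class PriceFeed
{
public:
    PriceFeed()
    {
        // one line is enough to prove the DI framework really created this class
        std::clog << "PriceFeed constructed\n";
    }
};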

On the one hand, adding the moral equivalent of print statements is a bad thing. Teedy Deigh mused on logging in Overload 126, saying

It is generally arbitrary and unsustainable and few people are ever clear what the exact requirements are, so it spreads like a cat meme.

And yet, if I add informative logging as things happen, I can see how the whole system hangs together. Jonathan Boccara mentioned finding logs when trying to understand an unfamiliar code base in his talk at the ACCU Conference. Existing logs can help you understand some code. Looking at this from a different direction raises the question: are my logs useful? As a first step, seeing a log entry when I wired in my class felt like a win. It told me what I wanted to know. I must make sure the logs I write out as my class starts doing something useful are themselves useful.

This is a step in the direction of a walking skeleton style approach - get things running end to end, outside-in, then fill in the details. I'm much more comfortable with a TDD approach, but keeping an eye on the whole system and how it hangs together is important. Building up the new feature by writing useful logs means the logs will be useful when it's released. LDD - log driven development. Why not?

Wednesday, 17 April 2019

ACCUConf 2019

The ACCU conference happened in Bristol again this year. For the first time ever, I was at a workshop. In fact, I ran a workshop, with Chris Simons, on evolutionary algorithms in practice. We gave a 90 minute talk later in the week, using the same Java framework (JCLEC), with slides here. High level summary: can your computer solve problems by making a few random guesses and iteratively improving, using crossover (merging together previous attempts) and mutation (nudging things up or down, or flipping bits)? The answer is yes: we managed to turn on all the bits in an array, code our way out of a paper bag (by firing virtual cannon balls, based on an old Overload article), and finally, in the 90 minute workshop, we generated code for Fizz Buzz.
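To give a flavour of the all-bits-on exercise, here is a minimal, mutation-only sketch in C++ (illustrative only; the workshop used JCLEC in Java, and I've left crossover out for brevity):

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <random>
#include <vector>

int main()
{
    std::mt19937 gen{std::random_device{}()};
    std::vector<int> bits(16, 0); // start with every bit off
    std::uniform_int_distribution<std::size_t> pick{0, bits.size() - 1};

    auto fitness = [](const std::vector<int>& b) {
        return static_cast<int>(std::count(b.begin(), b.end(), 1));
    };

    while (fitness(bits) < static_cast<int>(bits.size()))
    {
        auto attempt = bits;
        attempt[pick(gen)] ^= 1;            // mutation: flip one random bit
        if (fitness(attempt) >= fitness(bits))
            bits = attempt;                 // keep the better attempt
    }
    std::cout << "All " << bits.size() << " bits turned on\n";
}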

The full schedule is here. Having five tracks meant I missed lots, but talks will appear on the YouTube channel over time. I'll just give brief notes on a handful of talks I attended. The opening keynote was "Delivering software that is secure and usable - who’s job is it?" by M Angela Sasse. Angela called out StackOverflow as being functionally great but giving bad security advice, in contrast to official manuals, where the security advice is great but the functional advice is worse. This was based on measuring several developers attempting to use a software product. How can you actually measure security or usability? How are you currently measuring it? Mention was made of hard-to-follow security rules, which people tend to work around. Angela called for a way to reprogram the security experts. How good are they at conflict resolution? Do they have social marketing skills? Twitter devolved into quips about social engineering at that point. My final note says,
Programmers are tribal and seek approval. Try to trust and collaborate instead.

Next I'll mention "10 Techniques to Understand Code You Don’t Know" by Jonathan Boccara. He's written a book, which I've seen several people recommend. The 10 techniques fell into three groups: explore, speed read, and detail.
Exploring covered

  • using and finding the I/O frameworks
  • performing local analysis - getting the hang of one or two important functions
  • analysing call stacks to join the dots between modules

Speed reading covered

  • reading the end first - don't be put off by a long function; find the output or return statements and worry about the rest later
  • finding frequent words, looking at both count and span (total occurrences, and how many lines contain the word)
  • filtering on flow control - giving something like a table of contents for a book
  • scanning for the main action - feel free to ignore catch blocks or elses and focus on the happy path

Finally, you sometimes need to start going into detail

  • try scratch refactoring
  • practice writing functions in the code
  • team up - strive for pair understanding

There was a discussion about flame graphs at the end, and he mentioned "How to Read a Book: The Classic Guide to Intelligent Reading" by Adler and Van Doren. This points out that you don't need to read a non-fiction book in order. Jump around, follow back links, jump straight in to what you want to learn. Very non-linear.

Next, I'll talk about "The anatomy of an exploit" by Patricia Aas. She started by mentioning the weird machine. You can see most programs as a finite state machine. An exploit jumps out of the finite states into other, unintended states. She looked at CWE-242, a list of potentially dangerous functions. The CWE is the Common Weakness Enumeration, available online, listing things to avoid. Her talk pulled on things that might go wrong with gets or std::cin. Surprisingly, you get more warnings from C than C++. By disabling warnings, one at a time, we ended up with code to get a shell prompt. Once you have a shell on another machine, you can then do a variety of nefarious things. She covered loads of things, including ASLR (address space layout randomisation), heap grooming and use-after-free. Security was a definite theme at this conference, and many developers understand far too little about it.
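To show why functions like gets end up on such a list, here's a small illustration of mine (not from Patricia's talk). gets cannot know how big the buffer is, which is why it was removed in C11 and C++14:

#include <cstdio>
#include <iostream>
#include <string>

int main()
{
    char buf[16];
    // gets(buf);                        // unbounded write: long input tramples the stack
    std::fgets(buf, sizeof buf, stdin);  // bounded: reads at most 15 chars plus '\0'

    std::string line;
    std::getline(std::cin, line);        // or let a string grow as needed
}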

Herb Sutter gave Thursday's keynote on "De-fragmenting C++: Making exceptions more affordable and usable". He called out a divide between teams who can and who cannot use exceptions. Many libraries have a mix of exceptions and return codes. He said "Pick a lane". C++ is supposed to be zero overhead:

  • a feature only costs if it's used
  • it's better than coding it yourself

This is not true for exceptions. He considered the difference between program-recoverable and non-recoverable errors. What can you do about stack exhaustion, for example? Who do you report problems to? Humans or code? Exceptions are automatic (a good thing) and invisible (a bad thing). He sketched out ways we could make exceptions have zero overhead. Watch this space.

Anthony Williams then talked about async, executors and callbacks: "Here’s my number; call me, maybe. Callbacks in a multithreaded world". He called out a few things to be aware of. Does the order of your callbacks matter? Can you deregister them? He encouraged us to capture by value, rather than by reference, unless we have a really good excuse.
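Here's a contrived sketch of mine showing why: a callback that captures a local by reference can outlive the local it refers to.

#include <functional>
#include <iostream>

std::function<void()> make_callback()
{
    int count = 42;
    // return [&count] { std::cout << count << "\n"; }; // dangles: count dies here
    return [count] { std::cout << count << "\n"; };     // safe: the lambda copies count
}

int main()
{
    auto callback = make_callback();
    callback(); // the by-value capture still holds 42
}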

At lunchtime, there was a book signing. I sold several copies of my book, "Genetic Algorithms and Machine Learning for Programmers". Three others, Anthony Williams, Ivan Cukic, and Jonathan Boccara, were also selling books, but I didn't get a chance to go and talk to them. Thanks to ACCU for the chance to do this. I put mine in paper bags, and even wrote a receipt on one. The chapters in mine show how to code your way out of a paper bag, so it seemed sensible.



I gave a session with Chris Simons about how to teach your computer to code Fizz Buzz. We plan to write this up for ACCU's Overload magazine shortly.

On Friday, Paul Grenyer gave the keynote. He reminisced about people he'd met during his time as an ACCU member, and all the things he'd done in Norwich to grow the tech network, some that worked and some that didn't. There's now a background discussion on the accu-general and accu-members email lists about how to revive some things we used to do, and find new things to do, that will be valuable to the group. I'd love to see the mentored developers scheme rebooted.

Next, I went to "Interactive C++ : Meet Jupyter / Cling - The data scientist’s geeky younger sibling" by Neil Horlock. He talked about Code Club, and teaching people. This led nicely into using Cling/Jupyter to have notebooks for C++. Cling is an interactive (JITted) version of clang. I can't do his talk justice here. It was amazing. It managed to cope with templates, and a variety of things that blew my mind. He demonstrated using RISE to make RevealJS slides from a notebook, so I think I was watching a talk in a talk in a talk.

My notes have run out at this point. At the speakers' dinner we met EchoBorg. An actor (an echoborg) voiced the words of a chatbot. People volunteered to be interviewed to become an echoborg themselves. This set off cyberpunk-style sci-fi stories in my head. Again, I won't do it justice, but watching the conversation develop was incredibly interesting. Have a look at their websites:



I went to two talks on Saturday: "Windows Native API" by Roger Orr and "Best practices when accessing Big Data or any other data!" by Rosemary Francis. I was too tired to make notes by that point, and we left early, since we had a three hour drive home. Roger considered several ways to return 42 from a program and showed several steps that happen before main; something people don't always consider. He touched on security too. I didn't stay to the end of Rosemary's talk, but she was talking about tooling her company has developed to watch programs using big data and trace bottlenecks. In my opinion, many data scientists make mistakes some programmers might avoid. Her first example was opening and closing a file in a loop. I wish machine learners and programmers could talk to each other more and help each other out.

I had a great conference, and the includecpp crew were there. I dipped in and out of the Discord chat. It's lovely to see people supporting each other. Simple things, like chats about where to go for dinner.

Echoborg has left dystopian sci-fi short stories brewing in my head, and the Jupyter/Cling talk left me with lots to explore. Thanks, ACCUConf. Hope to be there next year.



Tuesday, 12 March 2019

The ACCU's Overload magazine

ACCU is an organisation for programmers. Its original focus was C and C++, but now members use a variety of languages, talk about testing and process and how to keep learning. ACCU holds an annual conference in the UK, attended by people from around the world. There's even a YouTube channel of recorded talks from this.

As a member you get a discount for the conference, can volunteer to do book reviews, can participate in study groups (though these have been quiet lately), and get two magazines: the CVu members' magazine and Overload, which is open to anyone. There are also several local groups if you want to come along and meet us.

I've been a member for several years now. It's been a great networking opportunity and I have learnt so much from other members. I love the magazines, and by starting to write for them myself, I stepped up my game. I began by writing book reviews, then tried some of the Student Code Critiques in CVu. Eventually, I wrote an Overload article, pulling together a discussion on the accu-general mailing list about floating point numbers.

I took on the role of Overload editor in 2012. We welcome articles from non-members as well as members. They are peer reviewed, meaning the author gets feedback, questions and suggestions. A surprisingly high number of writers have gone on to write books, myself included. (I mentioned I wrote a book about genetic algorithms and machine learning, yes?)

If you have an article you'd like to get published, let me know. We do accept existing blog posts, but the review team might well ask for slight improvements. There are some submission instructions here.

We welcome established writers as well as new writers. If you've never written an article, give it a go. You can learn a lot by trying to write something up. For example, as you try to explain something you may find gaps in your knowledge and understanding. Questions and suggestions from the review team will make your article better.

I love the ACCU and am looking forward to this year's conference, in just under a month. If you can't make the conference, find a local group, or consider joining the organisation. Or submit an article for Overload. Get involved.


ACCU Home page

Tuesday, 26 February 2019

Code your way out of a paper bag

I attended nor(DEV):con, a tech conference in Norwich, last week. I gave a 45 minute talk which I called "Code your way out of a paper bag". Most of my recent talks involve getting out of (and once, into) a paper bag. I've used this as a vehicle to demonstrate machine learning, AI and other algorithms.

I interviewed a candidate for a role a while ago, and afterwards the other interviewer commented that the interviewee couldn't code their way out of a paper bag. Jeff Atwood, the co-founder of Stack Overflow, has made similar comments:

We're tired of talking to candidates who can't program their way out of a paper bag.

It's a shorthand way for saying people can't do something simple. First, programming isn't always simple. Second, how do you prove you can code your way out of a paper bag?

Use Genetic Algorithms

You could do something straightforward, like make a line zoom out of a paper bag, provided you have a way to draw things.



Alternatively, you could do something more complicated. I used genetic algorithms. If you fire a cannon ball, choosing the angle and initial velocity upfront, it may, or may not, end up outside a paper bag, provided you use a tiny, virtual cannon placed at the bottom of the bag. The cannon balls might just fire through the bag, but perhaps you can get them to go over the side if you choose the right numbers. There's a free excerpt from my book here talking this through.

Now, you could guess a few (angle, velocity) pairs and see what happens. I got three people in the audience to do this. You could then swap the angles and velocities of the better tries, and maybe change one or both of the numbers slightly. This covers the essence of how genetic algorithms work. The swap is called crossover, drawing on the idea of chromosomes recombining when living organisms breed. The slight tweak is called mutation, and draws on the idea of random fluctuations in DNA. Darwin's theory of evolution says these random changes give rise to new species; the fitter ones survive. All you then need is a function to decide how fit a solution is, often called a fitness function. Start with some random pairs, select some fitter pairs, use crossover and mutation for a while. In the end, the attempts might get better.
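That loop looks something like this stripped-down C++ sketch. The fitness function here is a stand-in; the real one simulates the cannon ball and scores how close it gets to escaping the bag:

#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

struct Attempt { double angle; double velocity; };

double fitness(const Attempt& a)
{
    return a.angle * a.velocity; // stand-in: the real version simulates the shot
}

// assumes an even-sized population of at least two
std::vector<Attempt> evolve(std::vector<Attempt> population,
                            int generations, std::mt19937& gen)
{
    std::uniform_real_distribution<double> nudge{-0.1, 0.1};
    std::uniform_int_distribution<std::size_t> pick{0, population.size() / 2 - 1};
    for (int g = 0; g < generations; ++g)
    {
        // selection: sort by fitness and keep the better half as parents
        std::sort(population.begin(), population.end(),
                  [](const Attempt& l, const Attempt& r) { return fitness(l) > fitness(r); });
        const std::size_t parents = population.size() / 2;
        population.resize(parents);
        while (population.size() < 2 * parents)
        {
            const Attempt mum = population[pick(gen)];
            const Attempt dad = population[pick(gen)];
            Attempt child{mum.angle, dad.velocity}; // crossover: mix two parents
            child.angle += nudge(gen);              // mutation: small random tweaks
            child.velocity += nudge(gen);
            population.push_back(child);
        }
    }
    return population;
}

int main()
{
    std::mt19937 gen{std::random_device{}()};
    std::uniform_real_distribution<double> angle{0.0, 1.5}, velocity{1.0, 10.0};
    std::vector<Attempt> population(8);
    for (auto& a : population)
        a = {angle(gen), velocity(gen)}; // start with random guesses
    population = evolve(population, 10, gen);
}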

Over-engineered?


You could argue my approach is somewhat over-engineered. You'd be right. However, it's a nice self-contained example to demonstrate genetic algorithms. It was filmed, so will be up online soon. I'm not sure how clear the audience members coming out with their attempts will be on film. Hopefully enough for you to get the idea.

How would you code your way out of a paper bag? Tell Twitter, or tell me.

Have a look at my book too.



Wednesday, 6 February 2019

CppOnSea

CppOnSea 2019

Phil Nash organised a new conference, CppOnSea, this year. I was lucky enough to be accepted to speak, so attended the two conference days, but not the workshops.


There were three tracks, along with a beginners' track run by Tristan Brindle, who organises C++ London Uni, a great resource for people who want to learn C++. I'll do a brief write-up of the talks I attended.


The opening keynote was by Kate Gregory, called "Oh The Humanity!". She made us think about the words we use. For example, Foo and Bar trace back to military usage, hinting at people putting their lives on the line. Perhaps we need better names for our variables and functions. We are not fighting a war. What about one letter variable names? 'k. Nuff said. What about errorMessage? If you call it helpMessage instead, how does that affect your thinking?
Kate also talked about trying to keep the code base friendly, to increase confidence.



I went to see Kevlin Henney next. I have no idea how to summarise this. He covered so many things. What does structured programming really mean? By looking back at various uses and abuses of goto, and by highlighting the structure in various code snippets, he emphasised that some styles make the structure and intent easier to see.

I saw Andreas Fertig next. Inspired by Matt "Compiler Explorer" Godbolt, he has created https://cppinsights.io/. Try it out. It unwraps some of the syntactic sugar, so you can see what the compiler has created for, say, a range-based for loop. This can remind you where you might be creating temporaries, or have references instead of copies or vice versa, without dropping down into assembly. Do you know the full horror of what might be going on inside a Singleton?
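For instance, a range-based for loop is roughly sugar for the iterator version below it; this is my approximation of the kind of desugaring the tool shows:

#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v{1, 2, 3};

    for (int x : v)             // the sugar
        std::cout << x << "\n";

    auto&& range = v;           // roughly what the compiler builds instead
    for (auto it = range.begin(), end = range.end(); it != end; ++it)
    {
        int x = *it;
        std::cout << x << "\n";
    }
}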

This led to an aside about statics and the double checked locking pattern. His headline point was that the spirit of C++ is "you pay only for what you use", so be clear about what you are using. The point isn't that the new language features are expensive. They are often cheaper than old-school ways of doing things. Just try to be clear about what's going on under the hood. Play with the insights tool.

Next up, I saw Barney Dellar, talking about strong types in C++. His slides are probably clear enough by themselves, since they have thorough speaker notes. My main note to myself says "MIB: mishap investigation board", which amused me. He was talking about the trouble that can happen if you have doubles, or similar, for all your types, like mass and force:

double CalculateForce(double mass);

It's really easy to use the wrong units or pass arguments in the wrong order. By creating different types, known as strong types, you can get the compiler to stop you making these mistakes. Using a template with tags, you can write clear code that avoids them.
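Something along these lines; a hand-rolled sketch of the idea rather than Barney's exact code:

template <typename Tag>
struct Strong
{
    explicit Strong(double v) : value{v} {}
    double value;
};

using Mass  = Strong<struct MassTag>;
using Force = Strong<struct ForceTag>;

Force CalculateForce(Mass m)
{
    constexpr double g = 9.81; // plain gravity, just for the example
    return Force{m.value * g};
}

int main()
{
    Force f = CalculateForce(Mass{70.0}); // fine
    // CalculateForce(70.0);              // no longer compiles: which unit is 70.0?
    (void)f;
}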

Next up was a plenary talk by Patricia Aas on Deconstructing Privilege. She's given the talk before, so you'll be able to find the slides or previous versions on YouTube. Her take is that privilege is about things that haven't happened to you. Many people get defensive if you say they have been privileged, but this way of framing the issue gives a great perspective. Loads of people turned up and listened. Maybe surprising for a serious geek C++ conference, but the presence of https://www.includecpp.org/ ensured there were many like-minded people around. If you are privileged, listen and try to help. And be careful asking intrusive questions if you meet someone different to you.

After quite a heavy, but great, talk, I was "in charge" of the lightning talks. Eleven people got slots. More volunteered, but there wasn't time for everyone:

Simon Brand; C++ Catastrophes: A Poem
Odin Holmes; volatile none of the things
Paul Williams; std::pmr
Heiko Frederik Bloch; the finer points of parameter packs
Barney Dellar; imposter syndrome or mob programming
Matt Godbolt; "words of power"
Kevlin Henney; list
Niels Dekker; noexcept considered harmful???
Patricia Aas; C++ is like JavaScript
Louise Brown; The Research Software Engineer - A Growing Career Path in Academia
Denis Yaroshevskiy; A good use case for if constexpr


My heartfelt thanks to Jim from http://digital-medium.co.uk and Kevlin "obi wan kenobi" Henney for helping me switch between PowerPoint, PowerPoint in presenter mode and the PDFs, and getting them to show on the main screen and my laptop. Nobody knows what was happening with the screen on the stage for the speaker. If you ever attend a conference, do volunteer to give a lightning talk. Sorry to the people we didn't have time for.

Day one done. Day two begun. First up, for me, after missing my own talk pitch, was Nico Josuttis. Don't forget his Leanpub C++17 book. It's still growing. He talked about a variety of C++17 features. The standout point for me was the mess you can get into with initialisation. He's using {} everywhere, near enough. Like

for (int i{0}; i<n; ++i)
{
}

Adding an equals can end up doing horrible things.
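For example (my illustration, not Nico's slide):

#include <initializer_list>

int main()
{
    int a{0};       // fine
    // int b{0.5};  // error: braces refuse the narrowing conversion
    int c = 0.5;    // compiles; c is quietly 0

    auto d{0};      // int (since C++17)
    auto e = {0};   // std::initializer_list<int> - probably not what you meant
    (void)a; (void)c; (void)d; (void)e;
}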

Much as I wanted to go and see Simon Brand, Vittorio Romeo and Hana Dusikova (with slide progression by Matt Godbolt) next, I had a talk to give myself. I managed to diffuse my way out of a paper bag, while reminding us why C's rand is terrible and how useful property based testing can be, using some very simple mathematics: adding up and multiplying. This was based on a chapter of my book, and you can download the source code from that page if you want, even if you don't buy the book. I used SFML to draw the diffusing green blobs. Sorry for not putting up a list of resources near the end.

I attempted to go to Guy Davidson's Linear algebra talk next, but the room was packed and I was a bit late. I heard great things about this. In particular, how important it is to design a good interface if you are making libraries.

My final choice was Clare Macrae's "Quickly testing legacy code". This was my unexpected hidden gem of the conference. She talked about approval testing. This compares a generated file to a gold standard file and bolts straight into googletest or Catch. It's available for several other languages too. It generates the file on your first run, allowing you to get almost anything under test, provided it writes out a file you can compare. That then means you can start writing unit tests, if you need to change the code a bit. Changing legacy code to get it under test, without a safety harness, is dangerous. This keeps you safer. Her world involves Qt and chemical molecule visualisations. These can be saved as PNGs, so she can check she hasn't broken anything. She showed how you can bolt in custom comparators, so it doesn't complain about different generated dates, and tolerates RGB differences too small for the human eye to notice. Her code samples from the talk are here. I've not seen this as a formal framework before. Her slides were really clear and she explained what she was up to step by step. Subsequently Twitter has been talking about this a fair bit, including adding support for Python 3.
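From memory, a test has roughly this shape with the ApprovalTests.cpp library and Catch2; check the library's docs for the exact setup, and note that render here is a made-up stand-in for some legacy function producing text output:

#define APPROVALS_CATCH        // let the library supply Catch2's main()
#include "ApprovalTests.hpp"

#include <string>

// hypothetical stand-in for the legacy code you want under test
std::string render(const std::string& molecule)
{
    return "ascii art of " + molecule;
}

TEST_CASE("renders caffeine the way it always has")
{
    // the first run writes a .received file you approve as the gold standard;
    // later runs fail if the output drifts from the approved file
    ApprovalTests::Approvals::verify(render("caffeine"));
}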

Matt "Compiler Explorer" Godbolt gave the closing keynote. Apparently, the first person Phil Nash has met who;s first conference talk was a keynote. He also had some AV issues:


If you've not encountered the Compiler Explorer before, try it out. You can choose which compiler you want to point your C++ code at and see what it generates. You need a little knowledge of the "poetry" it generates. More lines doesn't mean slower code. His tl;dr message was that many people spread rumours about what's slow, for example virtual functions. Look and see what your compiler actually does, rather than stating things that were true years ago. Speculative devirtualization is a thing: your compiler might decide there's only one virtual function you're likely to call, so it checks the address and avoids the "overhead" the call used to have years ago. He also demonstrated what happened to various bit counting algorithms - most got immediately squashed down to one instruction, no matter how clever they looked. How many times have you been asked to count bits at interview? Spin up Godbolt and explore. This really shows you need to keep your knowledge up to date. Something that was true ten years ago may no longer hold with new compiler versions. Measure, explore, think.
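Try it yourself with a bit counting loop; clang, for one, can spot this whole idiom and emit a single popcount instruction when the target supports it:

#include <cstdint>

int countBits(std::uint32_t value)
{
    int count = 0;
    while (value)
    {
        value &= value - 1; // Kernighan's trick: clear the lowest set bit
        ++count;
    }
    return count;
}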

There was a lovely supportive atmosphere and a variety of speakers. People were brave enough to ask questions, and only a few people were showing off that they thought they knew something the speaker didn't.

I'll try to back fill links to slides as I get them. Thanks to Phil for arranging this.

Did I mention I wrote a book?


Saturday, 26 January 2019

xkcd-style plots in MatPlotLib

Most programmers I know are familiar with xkcd, the webcomic of romance, sarcasm, math, and language. In order to create diagrams for my machine learning book, I wanted a way to create something I could have fun with.

I discovered that Python's MatPlotLib library has an xkcd style, which you simply wrap round a plot. This allowed me to piece together what I needed using line segments, shapes, and labels.

Given a function, f, which draws what you need on some axes ax, use the style like this, and you're done:

import matplotlib.pyplot as plt

# f draws onto the axes; the xkcd context manager styles everything inside it
with plt.xkcd():
    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1)
    f(ax)

    plt.show()

For example, to explain what happens when you fire cannons at different angles:


I gave a talk at Skillsmatter, called "Visualisation FTW", which was recorded, so you can watch it if you want. I also wrote this up for ACCU's CVu magazine. The ACCU runs an annual best article survey, and I was runner up, which was a pleasant surprise. You need to be a member to view the article, but it covers the same ground as the short talk at Skillsmatter.

Buy a copy of my book, or go play with xkcd style pictures. Have fun; I did.



Monday, 31 December 2018

Does machine learning really involve data?

Many definitions of machine learning start by proclaiming it uses data to learn. I want to challenge this, or at least remind us where the term originally came from and consider why the meaning has shifted.

For a long time machine learning seemed to be a new technology, but I notice we're starting to say AI and machine learning interchangeably. Job postings often sneak the word scientist in there too. What is a data scientist? What do any of these words mean?

Current trends often come with an air of mystery. I suspect a lot of data science roles involve data entry, in order to clean input data. Not as appealing as the headline role suggests. Several day-to-day techniques described as machine learning could also be described as statistics. In fact, look at the table of contents of a statistics book, such as An Introduction to Statistical Learning. Here is a small selection of the topics:

  • accuracy
  • k-means clustering
  • making predictions
  • cross-validation
  • support vector machines, SVM
  • principal component analysis, PCA


Most, if not all, of these topics are covered in an average machine learning course and included in ML software packages. Yet statistics doesn't sound as exciting as machine learning, to many people.

Wikipedia defines statistics as "a branch of mathematics dealing with data collection, organization, analysis, interpretation and presentation." No mention of learning, though each of these activities form an essential part of data science. The article goes on to discuss descriptive and inferential statistics. Inference involves making predictions: many people use the term machine learning to mean the very same. Can you spot patterns in purchases automatically and suggest other items a customer might be interested in? Can you detect unusual or anomalous behaviour, indicating fraud or similar? Again, these are now labelled as AI or machine learning, but usually rely on well established statistical techniques. Admittedly, today's faster machines mean number crunching can happen quickly. This has contributed to the resurgence of machine learning.

Many problem solving algorithms are not about numbers. Some techniques, such as evolutionary computing, including genetic algorithms, don't fit comfortably into a data-driven view of learning. Do these methods count as machine learning? I'll leave that for you to think about. My book explores genetic algorithms and several other areas that do not need numbers to learn.

Arthur Samuel coined the phrase "machine learning", by which he meant something along the lines of a "field of study that gives computers the ability to learn without being explicitly programmed." The abstract of his 1959 paper, "Some studies in machine learning using the game of checkers", states,

Two machine-learning procedures have been investigated in some detail using the game of checkers. Enough work has been done to verify the fact that a computer can be programmed so that it will learn to play a better game of checkers than can be played by the person who wrote the program. Furthermore, it can learn to do this in a remarkably short period of time (8 or 10 hours of machine-playing time) when given only the rules of the game, a sense of direction, and a redundant and incomplete list of parameters which are thought to have something to do with the game, but whose correct signs and relative weights are unknown and unspecified. The principles of machine learning verified by these experiments are, of course, applicable to many other situations.

AI and machine learning are both very old terms. I think they encompass a much broader field than data analysis. As a final thought, Turing designed an algorithm to play chess. In effect, he was trying to make an artificial brain, before the term AI was invented or computers, in their modern sense, existed.

I think machine learning is much broader than investigating data. Its history involves attempting to get computers to learn, and specifically to learn to play games. Let the games continue.


Read my book and see what you think.