Tuesday, 23 July 2013

Algorithms 101

Bright North have started to run sessions on Fridays so that the team can learn more about what people in other roles are doing. In that context, I gave a brief presentation about algorithms, with enough information for people to better understand what data scientists are talking about when they discuss algorithms!

To do so, I decided to explain how decision trees are built and used to make predictions. Decision trees are a good family of algorithms for illustrating some general principles because:

  • They are widely used
  • They are based on simple principles
  • And they yield easy-to-interpret results

The first concept discussed was the difference between regression and classification:

  • In regression, we are trying to predict a numerical value (e.g. income)
  • In classification, we are trying to predict to which class an observation belongs (e.g. spam/mail)

Decision trees can be used for both regression and classification!

We then discussed the need for a training dataset and a test dataset:

  • The training dataset contains data for all the observed variables including the one we want to predict. We will need it so that our algorithm can “learn” how to make predictions
  • The test dataset also contains data for all the variables, but we hide the data for the variable we want to predict. We will use it to assess how good a job our prediction algorithm is doing
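
As a rough illustration of the two datasets described above, here is a minimal sketch in Python. It is not the code used in the session: the function name `train_test_split`, the 30% test fraction and the toy spam/mail data are all assumptions made for this example.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and split it into (train, test) lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical observations: (features, label we want to predict)
rows = [((i,), "spam" if i % 2 else "mail") for i in range(10)]
train, test = train_test_split(rows)

# In the test set we "hide" the labels from the algorithm and keep them
# aside, only to compare them with the predictions afterwards.
test_features = [features for features, _ in test]
test_labels = [label for _, label in test]
```

The algorithm only ever sees `train` while learning; `test_labels` is used at the very end to check how good the predictions are.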

Based on those first elements, we then went to the main topic: how to build a decision tree (induction or learning). This is where we use the training dataset and apply the chosen algorithm to build a model, which we will then use to make predictions. To keep things as simple as possible, we only introduced the main ideas of a generic tree induction algorithm:

  • Starting from a top node, we look at all the possible variables and all the possible split points
  • We estimate the quality of each split using a measure such as misclassification error or node purity (entropy or Gini)
  • We go for the best possible split and then repeat the operations by adding a new level to the tree
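
The steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the CART implementation used in the demo: it only handles numerical variables, it uses Gini impurity as the quality measure, and the names `gini` and `best_split` as well as the toy data are made up for this example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of labels: 0 when all labels are identical."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows):
    """Try every variable and every split point; keep the purest split.

    rows is a list of (features_tuple, label) pairs. Returns
    (variable_index, threshold, weighted_gini), or None if no split exists.
    """
    best = None
    n = len(rows)
    for j in range(len(rows[0][0])):
        values = sorted({features[j] for features, _ in rows})
        # Candidate split points: midpoints between consecutive distinct values
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for features, y in rows if features[j] <= threshold]
            right = [y for features, y in rows if features[j] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, threshold, score)
    return best

# Hypothetical toy data: two numerical variables, two classes
rows = [((1.4, 0.2), "a"), ((1.3, 0.2), "a"),
        ((4.7, 1.4), "b"), ((4.5, 1.5), "b")]
split = best_split(rows)
```

Growing the tree then amounts to applying `best_split` again to the observations that fall on each side of the chosen split.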

The last point discussed was how to decide when to stop growing a tree and why it is important not to “learn” too much about the training set. This allowed us to briefly discuss the concept of overfitting. The idea is to build a model that is generic enough to make good predictions on unseen data (that is where we would use our test set!).
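
To make overfitting a little more concrete, here is a small self-contained sketch in Python. Everything in it is an assumption made for illustration: the `max_depth` stopping rule is only one of several possible stopping rules, and the helper names and toy data (with one deliberately mislabelled training observation) are hypothetical.

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(rows):
    """Most frequent label among the rows."""
    return Counter(y for _, y in rows).most_common(1)[0][0]

def build(rows, max_depth):
    """Return a leaf (a label) or a node (variable, threshold, left, right)."""
    labels = [y for _, y in rows]
    # Stopping rules: depth budget used up, or the node is already pure
    if max_depth == 0 or len(set(labels)) == 1:
        return majority(rows)
    best = None
    for j in range(len(rows[0][0])):
        values = sorted({x[j] for x, _ in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for r in rows if r[0][j] <= t]
            right = [r for r in rows if r[0][j] > t]
            score = (len(left) * gini([y for _, y in left])
                     + len(right) * gini([y for _, y in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, j, t, left, right)
    if best is None:  # no split possible: all observations are identical
        return majority(rows)
    _, j, t, left, right = best
    return (j, t, build(left, max_depth - 1), build(right, max_depth - 1))

def predict(tree, x):
    # Nodes are tuples, leaves are plain labels (strings here)
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree

def accuracy(tree, rows):
    return sum(predict(tree, x) == y for x, y in rows) / len(rows)

# Toy training set: "low" up to 4, "high" above, with one noisy label at 3
train = [((1,), "low"), ((2,), "low"), ((3,), "high"), ((4,), "low"),
         ((5,), "high"), ((6,), "high"), ((7,), "high"), ((8,), "high")]
# Unseen, correctly labelled observations
test = [((1.5,), "low"), ((2.8,), "low"), ((3.2,), "low"), ((6.0,), "high")]

shallow = build(train, max_depth=1)   # stops after a single split
deep = build(train, max_depth=10)     # grows until every leaf is pure

print(accuracy(shallow, train), accuracy(shallow, test))  # 0.875 1.0
print(accuracy(deep, train), accuracy(deep, test))        # 1.0 0.5
```

The deep tree is perfect on the training set because it has memorised the noisy observation, but that is exactly what makes it worse than the shallow tree on the unseen data.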

We finished with a quick live demo in R using CART to further highlight some of the points discussed above.

[Figure: Iris Species Classification]

A lot of concepts were introduced in half an hour! But all in all, the initial objective was met: Bright Sparks not working in the data team will now have an idea of what we are talking about when we discuss “training set”, “learning” or “overfitting”!

Friday, 19 July 2013

Sneak Peek: Bright North Does a Hack Day, part I

It was only a couple of weeks ago that we were sat in our weekly team meeting getting to know our latest Bright Sparks, Herve and Mani, during their round of ‘CV secrets’, when J, our COO, announced that she’d secretly been planning a hack day for us all. We were all kept in eager anticipation as, despite my best efforts to bribe J, our brief was kept under wraps until the end of the week.

This was the first hack we’d done as a team, so it was exciting to see how we would collaborate in a slightly different way and work together on a project outside of our comfort zone. Sure enough, at the end of the afternoon, we’d made some cool discoveries about team Bright North: how we each think and work together, and exactly why the processes we work to help us create great products for our clients.

When the hack day finally came around, we were split into two teams and given the brief to create something awesome with publicly available data; the non-specific brief allowed us complete flexibility over our projects and method. 

Our team’s approach was to look at the range of public data we could work with, before identifying various problems that we could work towards solving. We then developed a set of core questions that we needed to ask of our data in order to help solve our imaginary client’s data problem, before getting on with some guerrilla usability testing. Interestingly, the other team approached the brief from an entirely different perspective and based their ideation around Bright North’s values and our core principle, ‘will it make the boat go faster?’, before identifying their users’ needs.

It’s worth noting that, whilst all this glorious ideation was going on in the background, the senior management team were doing some…thought crystallisation of their own:

The details of both projects will remain top secret until the end of part 2, when the products will be up on our website, but there were some interesting outcomes of our hack session: we learned more about the framework of principles from which we start our process and how we each come to generate ideas in different ways, with an appreciation of both disciplines of data and user experience. We also ended the first session with a couple of concepts that we’re very excited to develop outside of our hack sessions. But for that, you’ll have to watch this space!

Tuesday, 2 July 2013

Hakuna matata!!!

I first thought I would say “Eureka! Eureka!” when I was informed that I got the job! It all came to me as a big surprise and things moved rather quickly, and before I knew it I was part of the “Bright Sparks” (that’s what we are called here, and we sign off saying “Yours brightly”, nifty hey!). Someone told me I was the coolest guy in town to be able to get on board at such a rate!

Oops! Sorry, we got carried away with our conversation and I didn’t introduce myself. I’m Mani and I recently joined Bright North as a #Java dev. I think by now I can call myself a Londoner, having lived here for six years already. But I have spent almost 12 years doing software in the EU and loving it, especially with London being the melting pot of Europe! London has become and will continue to be the centre of excellence in IT, especially in Java.

In my previous life I worked for a software house that made products and services for the financial services market. In contrast, Bright North is a very young start-up and a forward-thinking, fast-moving company. I have been taken by the company’s commitment to and focus on using Java/JVM technologies and producing business intelligence solutions (I think I can dare to say AI / Machine Learning / Data Analytics). This is also where I will be involved more and more!

A number of new people have joined our team since I started and it has been enjoyable getting to know them and chatting to them about various topics!

On Herve’s first day I (re)learnt that “Data is program and program is data”, which we think can be quite powerful when used correctly and efficiently. Does that sound like Lisp languages, or are you thinking of C/C++?

I have recently helped create and release two (news)papers that publish community and Java related news. These papers cover two LJC* JUG+ programs that I’m helping and supporting, i.e. the Adopt OpenJDK (news)paper and the Adopt-a-JSR (news)paper. I recommend that every Java/JVM developer take advantage of them.

Now back to Bright North: we are like a small family, and we carry on with that theme and keep the atmosphere like this throughout the week. We end the week on that very note: we enjoy what we do!


* London Java Community

+ Java User Group 

Show Me The Data!

Bright North promised me that I was joining quite a dynamic company and week one has not been disappointing!

Jumping into a client’s workshop one hour after first arriving in the office was probably the best way to learn more about the kind of projects in which I would be involved. I have now been introduced to two different projects, and I even found the time to set up my new laptop and attend two welcome lunches!

I am joining Bright North with over 10 years of traditional Business Intelligence experience and, after investing a lot of time in training over the last 2 years, I have now gained a solid knowledge of data mining and statistical analysis. Knowing that Data Science is one of Bright North’s core activities is very exciting! That will certainly give me plenty of opportunities to build upon my experience and apply what I recently studied.

I also started to learn more about Bright North’s Meaningful Data initiative and I am now convinced that my personal development objectives are in line with those of my company. That is a great way to start in a new job!