At bol.com, response speed is very important in many of our systems. This blog is about the data structures and algorithms we used to make a specific analysis step a lot faster: finding the longest matching string prefix.
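To make the problem concrete, here is a minimal sketch of longest-prefix matching using a character trie. This is one common approach and not necessarily the data structure used in the post; the class name `PrefixTrie`, its methods, and the sample prefixes are hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional

# Sentinel key marking "a stored prefix ends at this node".
# The empty string cannot collide with a single-character key.
_END = ""


class PrefixTrie:
    """Character trie holding a set of prefixes (illustrative sketch)."""

    def __init__(self) -> None:
        self._root: Dict[str, dict] = {}

    def add(self, prefix: str) -> None:
        """Store one prefix, creating child nodes as needed."""
        node = self._root
        for ch in prefix:
            node = node.setdefault(ch, {})
        node[_END] = {}

    def longest_prefix(self, text: str) -> Optional[str]:
        """Return the longest stored prefix of `text`, or None if none matches."""
        node = self._root
        best = "" if _END in node else None
        for i, ch in enumerate(text):
            node = node.get(ch)
            if node is None:
                break  # no stored prefix continues with this character
            if _END in node:
                best = text[: i + 1]  # remember the longest match so far
        return best


trie = PrefixTrie()
for p in ("Mozilla", "Mozilla/5.0", "Opera"):
    trie.add(p)

match = trie.longest_prefix("Mozilla/5.0 (X11; Linux x86_64)")
```

The lookup walks the input once, so its cost grows with the length of the input string rather than with the number of stored prefixes, which is why a trie is a natural candidate when many prefixes have to be checked.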
Data Science and Machine Learning are becoming more integrated into current businesses. Especially in e-commerce there is huge potential for predictive modeling. It is therefore no surprise that bol.com has put extra focus on significantly expanding its Data Science efforts in the coming year. That’s not to say that there aren’t already some interesting Data Science projects running. In this blog post we will take a look at one of the projects I am currently working on with fellow data scientist Joep Janssen: the chunk project.
Ever since I started working for a WebAnalytics company in 2005, I’ve been working on problems related to making sense of web data. One of the most difficult elements in this type of analysis is interpreting the user agent.
Very often the raw web data I work with is stored in Apache HTTPD access log files that have been compressed using gzip.
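As a small illustration of working with such files, the sketch below streams the lines of a gzip-compressed access log with Python's standard `gzip` module. The function name `iter_log_lines` and the file name in the usage note are hypothetical; decoding as UTF-8 with replacement of invalid bytes is an assumption, since access logs can contain arbitrary bytes.

```python
import gzip
from typing import Iterator


def iter_log_lines(path: str) -> Iterator[str]:
    """Yield the lines of a gzip-compressed log file, one at a time.

    The file is decompressed as a stream, so it is never fully loaded
    into memory. Invalid byte sequences are replaced (an assumption,
    made because raw request data is not guaranteed to be valid UTF-8).
    """
    with gzip.open(path, mode="rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            yield line.rstrip("\n")
```

A call such as `for line in iter_log_lines("access.log.gz"): ...` would then process the log line by line.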