Product Catalog Hackathon with AI: Recap

For the hackathon (or datathon, as we like to call it), our data science teams came together for two dynamic days, and teams from different disciplines went on a mission to re-imagine how we discover similar products. Their approach combined the signals in text (titles and descriptions) with those in images. To achieve this, they converted both text and image data into vectors and then found similar products by searching for nearby vectors, for example with a Faiss index, among other methods.
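To make the idea concrete, here is a minimal sketch of vector-based product similarity in plain Python. It is not the teams' actual code: the toy vectors, the product names, and the choice to combine text and image vectors by concatenating the normalized halves are all assumptions made for illustration. In practice the vectors would come from embedding models, and an index such as Faiss would replace the brute-force loop once the catalog gets large.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def combine(text_vec, image_vec):
    """Assumption: merge modalities by concatenating the normalized halves."""
    return l2_normalize(l2_normalize(text_vec) + l2_normalize(image_vec))


def top_k_similar(query_id, embeddings, k=2):
    """Brute-force cosine search; a Faiss index would do this step at scale."""
    query = embeddings[query_id]
    scored = []
    for product_id, vec in embeddings.items():
        if product_id == query_id:
            continue  # a product is trivially similar to itself
        score = sum(a * b for a, b in zip(query, vec))
        scored.append((score, product_id))
    scored.sort(reverse=True)
    return [(product_id, score) for score, product_id in scored[:k]]


# Hypothetical toy catalog: (text vector, image vector) per product.
catalog = {
    "red_mug": combine([0.9, 0.1, 0.0], [0.8, 0.2]),
    "crimson_cup": combine([0.8, 0.2, 0.1], [0.7, 0.3]),
    "garden_hose": combine([0.0, 0.1, 0.9], [0.1, 0.9]),
}

neighbours = top_k_similar("red_mug", catalog, k=2)
for product_id, score in neighbours:
    print(f"{product_id}: {score:.3f}")
```

Normalizing each modality before concatenating gives text and images equal weight in the final score; weighting one side more heavily would be an equally reasonable choice.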

Using these techniques, they built an Assortment Explorer, a tool for identifying comparable products both within our platform and in comparison to competitors. It opens the door to many applications, such as enhancing in-platform discovery, enabling competitive pricing analysis through Amazon product matching, flagging unwanted products, and even identifying duplicate products. For a deeper understanding, feel free to reach out to any of the team members!

Let's dive into how it went.

🎤 Opening Talk - Google's LLMs:
The hackathon kicked off with an exciting opening talk about Large Language Models (LLMs) by Google. Murat Eken (Google) dove into the building blocks of these LLMs, highlighting the importance of "tokens." For every token in its vocabulary, an LLM calculates the probability that it comes next given the tokens so far, and then picks the next token with a twist of random sampling to avoid boring responses. It was a fascinating start, and everyone was eager to get hands-on with the AI magic.
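For the curious, the "probabilities plus random sampling" idea can be sketched in a few lines of Python. This is only an illustration: the three-word vocabulary and the scores are made up, and the temperature parameter is a commonly used sampling knob that we assume here rather than something quoted from the talk.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(vocab, logits, temperature=1.0, rng=random):
    """Pick the next token at random, weighted by its probability."""
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]


# Hypothetical vocabulary and model scores for the next token.
vocab = ["mug", "cup", "hose"]
logits = [2.0, 1.5, -1.0]

rng = random.Random(0)  # seeded so repeated runs behave the same
samples = [sample_next_token(vocab, logits, temperature=0.8, rng=rng) for _ in range(5)]
print(samples)
```

Always taking the most likely token would give the same answer every time; sampling lets less likely tokens through now and then, which is where the variety (and some of the hallucinations) comes from.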

We had two fun days getting to know each other and Google's text generation model.


💡 Day 1 - The AI Exploration:
Teams quickly got to know each other, and the adventure began. They started with a brainstorm to decide which problem to solve and came up with a game plan. In our team, the problem to solve was that customers see long product descriptions that bore them. Since large language models are supposed to be good at summarizing content, we saw a perfect match. We split into subgroups for data prep, prompt engineering, and frontend, and went at it!

Participants wasted no time in experimenting with the AI, testing its capabilities, and exploring its potential applications. As the day progressed, some teams amazed everyone by ending the day with working demos, while others found themselves in a love-hate relationship with prompts and the AI's hallucinations.

🌟 Day 2 - Fresh Start and Enthusiasm:
With a fresh start on day two, all teams were brimming with enthusiasm to dive back into their projects. The excitement and determination were palpable as they continued their work, refining their ideas and demos.

At the end of the two days, we had a little Pyxle web app called TL;DR, which summarises the title, description, and reviews for a product. Not all summaries seemed usable yet, but some were actually impressive.


😄 Funny Moments:
During the day, amidst all the intense hacking, one team stood out with their humour during an energizer. They got disqualified from the paper airplane race for a hilarious attempt to throw a ball of paper with an apple inside it. The laughter echoed through the room.

🏁 The Grand Finale🥇
As the hackathon reached its climax, all the teams gathered to present their ideas and demos to the jury. The competition was fierce, with a neck-and-neck finish between the top two teams. The suspense was intense, and in the end, the team that had been able to automate titles emerged victorious. The gap between the top two was so small that there was a surprise in store for the runner-up as well: both teams were granted a golden opportunity to visit the Google office in Amsterdam. Not only that, but they also got the chance to develop their Proof of Concept (PoC) into a Minimum Viable Product (MVP) with the support of Google!

The results?

# AI Title Generation PoC

With the current generation of large language models (LLMs), the most successful implementations are designed to work with people rather than replace them. In fact, when implemented correctly, they can give the user a more human-like and curated experience with a product. This is the experience our title generation hackathon concept aimed to create.

The problem retailers commonly face is that we give them a paragraph of criteria for their product titles, then a text box, and expect them to come up with what they think is the best title. Given this problem, how could we make the experience better? Rather than provide the retailer with more walls of text, we can instead use an LLM to take all we know about good titles and generate candidate product titles for the retailer to choose or learn from.

To implement this, we needed to know where to start. As we learned at the beginning of the hackathon, that means picking a model (Google's Bison) and then starting on the prompt engineering. What is prompt engineering?

After enumerating these rules, we proceeded to provide 5 high-quality titles from the same "Product Chunk" in the Product Catalog as the product we were trying to generate a title for. Finally, we provided a description of the product. We then submitted this prompt 5 times, hoping to generate 5 possible titles. After generating the titles, we showed the candidate titles to the user and let them select the one they liked best.
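The prompt structure described above (rules, then example titles, then the product description, submitted several times) could look roughly like the sketch below. Treat it as an illustration under stated assumptions, not the code we ran: the rules and example titles are invented placeholders, the exact prompt wording is a guess, and `generate` is a hypothetical stand-in for whatever function calls the model. Dropping duplicate candidates is also an addition of this sketch.

```python
import itertools


def build_title_prompt(rules, example_titles, description):
    """Assemble the prompt: numbered rules, example titles, then the description."""
    lines = ["Write a product title that follows these rules:"]
    lines += [f"{i}. {rule}" for i, rule in enumerate(rules, start=1)]
    lines.append("")
    lines.append("Examples of high-quality titles for similar products:")
    lines += [f"- {title}" for title in example_titles]
    lines.append("")
    lines.append(f"Product description: {description}")
    lines.append("Title:")
    return "\n".join(lines)


def generate_candidates(prompt, generate, n=5):
    """Submit the same prompt n times and keep the distinct titles, in order."""
    candidates = []
    for _ in range(n):
        title = generate(prompt).strip()
        if title and title not in candidates:
            candidates.append(title)
    return candidates


# Hypothetical stand-in for the model call; it just replays canned answers.
_canned = itertools.cycle(
    ["Ceramic Mug Red 300 ml", "Red Ceramic Coffee Mug 300 ml", "Ceramic Mug Red 300 ml"]
)


def fake_generate(prompt):
    return next(_canned)


prompt = build_title_prompt(
    rules=["Start with the product type", "Mention colour and size"],  # placeholders
    example_titles=["Glass Teapot Clear 1 l", "Steel Bottle Blue 500 ml"],  # placeholders
    description="A red ceramic mug that holds 300 ml, dishwasher safe.",
)
candidates = generate_candidates(prompt, fake_generate, n=5)
print(candidates)
```

Because the model samples its output, sending the same prompt several times yields different titles, which is what makes the "pick your favourite" step possible.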

As for future improvements, the most likely next step is to improve performance. Largely this would just come down to pre-generating the titles in a daily batch process. This would also allow us to apply additional filtering on candidate titles based on an objective scoring of the titles.
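A daily batch with score-based filtering might be sketched as follows. Everything here is hypothetical: the scoring rules (length limit, no all-caps, no repeated words), the threshold, and the `candidate_source` callable that stands in for the title generation step are assumptions chosen only to show the shape of such a job.

```python
def score_title(title, max_length=70):
    """Hypothetical objective score in [0, 1]; higher is better."""
    score = 1.0
    if len(title) > max_length:
        score -= 0.5  # too long to display comfortably
    if title.isupper():
        score -= 0.3  # all-caps titles read as shouting
    words = title.lower().split()
    if len(set(words)) < len(words):
        score -= 0.2  # repeated words suggest keyword stuffing
    return score


def pregenerate_titles(products, candidate_source, min_score=0.8):
    """Batch job: generate candidates per product and keep only the good ones."""
    store = {}
    for product_id, description in products.items():
        candidates = candidate_source(description)
        kept = [title for title in candidates if score_title(title) >= min_score]
        store[product_id] = sorted(kept, key=score_title, reverse=True)
    return store


# Hypothetical stand-in for the model-backed candidate generator.
def fake_candidate_source(description):
    return ["Red Ceramic Mug 300 ml", "RED MUG MUG", "x" * 90]


products = {"p1": "A red ceramic mug that holds 300 ml."}
store = pregenerate_titles(products, fake_candidate_source)
print(store)
```

With the results stored ahead of time, the retailer-facing page only needs a lookup instead of waiting on several model calls.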
