Getting Started with Generative AI on AWS
Introduction
In this guide, we’ve captured our experience - the musings of our CMO and one of our engineers - as they go from utter novices to passable intermediates in 30 hours!
Following on from AWS Summit London, where over 20,000 technology and business professionals gathered to hear from partners, customers, technology leaders and from AWS, we found one item dominating the agenda: Generative AI. This month, we’ve really leaned into the topic, done our own internal research and shared perspectives with the community.
We had an online event with 450 people signed up to learn about practical use cases for generative AI on AWS, run a Live Lab to demo some of the capabilities on AWS, led an Executive Roundtable to show leaders how to approach value creation from AI and we’ve even gone a bit left field to create the “Cutting Through Complexity” event - which is basically 3 hours of axe throwing, mini-hackathons and a bit of AWS shoulder rubbing.
Amazon Q - ChatGPT Alternative?
Amazon Q is an AI-powered chatbot designed to assist with AWS inquiries using Generative AI technology. Amazon Q is incorporated into several services and can be accessed at no cost with an AWS account. To get started, log into your AWS account, locate and select the Amazon Q icon, initiate a chat session and begin making inquiries.
Let’s say you have a question about AWS billing. Amazon Q answers this question as it has been trained on AWS documentation and cites the source of this knowledge. The likelihood of experiencing hallucinations is minimal due to the authoritative nature of the training data. In instances where the Amazon Q is unable to provide an answer, it will explicitly inform you.
Amazon Q Business offers a business layer that is accessible in regions within the United States. It is beneficial for analysing workflows and offering feedback, specifically for understanding the cause of a particular problem. This paid version is currently only available in the US and is not yet accessible in other regions. It would be wise to monitor its availability as it comes to other regions like Europe or APAC.
Amazon Q Developer has taken the place of the Code Whisperer. It serves as a comparable alternative to GitHub Copilot. It is beneficial to seek assistance in comprehending your code, inquiring about its functionality, requesting an explanation of the code, or seeking aid in writing a unit test. It’s often positioned as a developer’s ‘pair programmer’ in an IDE.
Amazon Q for Quicksight facilitates the visualisation of data. This is most useful for business analysts or managers looking to visualise a report or dataset in a Quicksight Dashboard, but want to try ‘chatting’ with the data. This is useful for turning natural language queries into dynamically generated charts to show insights.
Amazon Bedrock - The Foundations
While Amazon Q assists with troubleshooting, Amazon Bedrock enables the development of applications that utilise AI models and seamlessly integrate them into your infrastructure. This is a great starting point for those wishing to build an AI-driven product using Foundation Models (like Claude 3) that can be fine tuned to support a particular use case.
AWS Bedrock is a service that is fully managed, meaning it is serverless and does not require any concerns about infrastructure. You have the option to select a foundation model from a list of different models relevant to a particular task. For example, choosing Claude 3 for something that requires greater reasoning or Titan for speed and factual recall. It is possible to customise the model and integrate it with applications.
Both the data in Amazon Bedrock and Amazon Q adhere to the principles of "responsible AI” - meaning that your data is not utilised for training the model. Bedrock allows the incorporation of Retrieval Augmented Generation (RAD) into a Foundation Model to facilitate the initiation of a search from an external data source that is not included in the model. For instance, searching for weather forecasts and searching for information about clothing.
Foundation Models - What To Choose
There is a crucial point AWS wants to make: there will not be a singular generative AI model that will dominate all others. We agree (for now).
Every model possesses unique advantages and disadvantages when it comes to specific types of tasks. Some foundation models are great for translation, while other models for image generation or reasoning. You can see a quick overview below to inform your model choice. The costs vary for each model, which is a crucial factor to consider based on the intended use of the foundation model. You are charged based on the usage of tokens, with costs applied accordingly. There are important considerations in your choice, such as the tradeoffs between speed and accuracy. Our key piece of advice here is to think about the tasks within your use case and test the models for your particular application tasks.
Amazon Titan |
A family of FMs for text and image generation, summarization, classification, open-ended Q&A, information extraction, and text or image search. |
Claude |
FM for thoughtful dialogue, content creation, complex reasoning, creative writing, and coding, trained with Constitutional AI. |
Command and Embed |
Text-generation and representation models to generate text, summarise, search, cluster, classify, and use RAG |
Jurassic |
Instruction-following FMs built for the enterprise that perform a range of tasks including text generation, question answering, summarization, and more. |
Llama |
Fine-tuned models ideal for dialogue use cases and natural language tasks like Q&A and reading comprehension. |
Mistral AI |
Powerful models with publicly available weights supporting a variety of use cases from text summarization, text classification, and text completion to code generation and code completion. |
Stable Diffusion |
Image-generation model that produces unique, realistic, and high-quality visuals, art, logos, and designs. |
Fine Tuning Parameters
You have the option to select both the foundation model and the parameters in Amazon Bedrock. The parameters to choose from include things like Temperature, Top Prediction, Response Length and Stop Sequences.
If you want to enhance ‘creativity’, you can turn up the Temperature. To enhance accuracy and ensure that only reliable answers are provided, you can turn down Top P. Most importantly, there will be some trial and error as you experiment with fine tuning. Ultimately, you need to configure a Foundation Model in a way that best suits its purpose.
Fun With Prompt Engineering
The better the prompt (input), the better the output. To effectively formulate questions to obtain the desired response takes a bit of human training! To ensure clarity and brevity, it is important to provide context when necessary. For instance, when giving examples to the model during a task, it is more likely to perform the task correctly if it has the necessary context. It is crucial not to assume that the model has context, as it cannot read minds.
Additionally, it is helpful to consider the desired type of output in the prompt by providing example responses. If you want data provided in a table with headers, say that. If you want some scenarios to analyse, provide an example scenario. Divide intricate tasks into smaller components to prevent the model from ‘timing out’ or running down the wrong rabbit hole.
The process involves a significant amount of experimentation, so it is important to think creatively. Occasionally, it is necessary to repeatedly prompt in order to finely adjust it according to your requirements. Overall, your approach should be to define the objectives and limitations of the model for each task during the early evaluation stage.
AWS PartyRock - The Playground
PartyRock is an environment designed specifically to demonstrate the capabilities of foundation models from Bedrock. It provides the opportunity to explore and test various models and their parameters, gaining an understanding of their functionality, all without the requirement of an AWS account. Moreover, it is completely free of charge.
Acquaint yourself with the various models, their functionalities, and distinctions, and create or distribute applications using a user-friendly drag and drop interface. It’s an excellent first step if you’re brand new to Generative AI or you aren’t a coder. PartyRock allows users to either create their own content or browse apps created by others.
Building Generative AI Apps on PartyRock
If users choose to build their own content, they have the ability to specify what they want to generate or utilise a drag and drop builder. It’s quite powerful when you can type in English the type of problem or application you want to build and it will dynamically compile it. Let’s say I want to create an app that “can fetch a URL, extract its contents, condense the information into a summary, and generate an image based on the summary.”
We tried this prompt and it created a simple web app with a few boxes. In the top box, I could copy/paste a URL to a news article, for example, and click “Generate”. The app fetches the contents from the URL, uses the Foundation Model to generate a concise summary in bullet point form and then another model is used to generate an image based on that text summary. The article was about vaccines, the summary was reasonable and the image was pretty good. However, we were not happy with the bullet point summary format, so we ‘fine tuned’ the model with a new prompt and was able to customise that application component - I tried a different model and gave it instructions to write a short paragraph. We then repeated the exercise, regenerated the summary and image, and it was spot on.
Specifically in this generative AI app demo, we clicked on the edit button on the the Content Summary component and replaced the current model with Titan Express, which is known for its speed. The Content Summary needs to be significantly reduced from a long list of bullet points to a short paragraph by modifying the parameters. Additionally, we tried different Temperature settings and increased it from 0 to 0.5, with a focus on enhancing creativity. We also turned up Top Prediction to 0.6 to see if we could induce hallucinations. As we turned up Temperature and Top P, the narrative became more creative but it became imprecise and less factual. We settled on Temperature of 0.2 and Top P back to 0 which gave us a very factual and concise summary, not very creative, but perfect for our summary purpose.
Understanding The Limitations of Models
Fetching a URL and summarising it is pretty straightforward. We really wanted to test the limits of these Foundation Models so we looked for interesting computer science and mathematics problems that have been, to date, impossible to completely solve. We decided to play around with a concept called the Travelling Salesman Problem. It’s a traditional problem for which there is no perfect solution… yet. In fact, the Clay Mathematics Institute is offering a $1m cash prize to anybody that solves it (or proves that it cannot be solved).
The Travelling Salesman Problem involves optimising the route of a salesman who needs to visit multiple cities. To solve this NP-hard problem, a range of techniques can be applied. In short, you can “brute force” the problem by simulating all possible route combinations, you could try a “heuristic” approach to find a decent-ish route, you could try an algorithm for “nearest neighbour” or “brand and bound”. You can even try Simulated Annealing, which involves introducing randomness to the solution in order to explore different possibilities!
A Challenge, A Business Experiment
The Travelling Salesman Problem is an interesting experiment to try with Foundation Models because it can really push the capabilities to the limits, and help to show why we still need reasonable human judgement. Initially, with a low Temperature and Top P set to 0, we give it 3 cities into the City Input box: London, Manchester and Edinburgh. Below, are 3 additional components to the app in PartyRock called “Route Count”, “Computational Complexity” and “Route Visualisation”. To calculate the total number of routes is (n-1)!/2 if you don’t mind which city you start in. As you mess around with Temperature and Top P, the answers become increasingly bizarre like adding the cities together instead of a factorial. With each 20% incremental increase in Temperature and Top P, the more creative it gets but the more wacky the answer. It’s extremely fluent, convincing even, but completely inaccurate.
If you want to explore what others have created, navigate to the “Discover” tab and explore the applications and browse through the app catalogue. It’s fascinating to see all the generative AI apps people are making - from supply chain optimisation to storytelling!
Use Case 1 - The AWS Exam Coach
One of the most useful PartyRock apps we found was the AWS Certification Exam Practice Tool. As you study for an AWS exam or certification, you can ask it to generate exam questions based on a specific subject. The app presents sample questions, and when you provide your answer, it gives you feedback and a detailed explanation of the answer.
For example, if you were curious about what type of questions might be asked about deploying Lambda functions, you could create your own modified version of this app by clicking the “Remix” button. If you find an interesting app that somebody else has created in the PartyRock catalogue, you can “remix” it yourself and play around with it freely.
Use Case 2 - Instant Support Emails
Let’s say you have an application already and you want to call Bedrock using the API or trigger a Lambda function. In this scenario, we want to automatically compose an email to an angry customer in response to a complaint or service outage (totally fictional scenario).
Scenario: Let us introduce you to John. John is a disgruntled cloud architect at Acme Corporation who is complaining to BigCloud for a service outage. He sends his complaint email to Bob, the Customer Service Manager in the Complaints Department at BigCloud.
When the complaint email comes in, it triggers a lambda function on the server side. The function instructs Bedrock to generate an apology email from {CustomerServiceManager} to {CustomerName}, or in human readable speak, an email from Bob back to John. In the prompt, we specify that when responding to the feedback, Bedrock must avoid discussing specific actions taken or mention the company name, and instead, emphasise BigCloud’s commitment to improving their service. A typical customer service response!
So, there is a process that involves duplicating the payload, triggering the Lambda function, receiving a response back from Bedrock and… oh dear. The first generated email content was the most desperate, beggy email we’ve ever read. Fortunately, after some fine tuning (turning down the Temperature to reduce the drama and turning down the Top P to avoid a hallucination) and finally, we have a crisp, appropriately-worded response email.
Use Case 3 - Text-To-Audio Summaries
We know that transcription and speech-to-text has come a long way in the last 10 years. What surprised us was how good text-to-speech is getting with services like Amazon Polly. We always roll our sleeves up when presented with a challenge so this particular prototype was the most interesting and fun to build. It showcases the implementation of Bedrock using an event-driven architecture. It shows just how profoundly embedded Bedrock can be and how easy it is to integrate with other web services utilising Infrastructure-as-Code.
In this instance, our use case was simply based on an everyday problem: we are bombarded with a lot of articles, emails, news, bulletins. There is seldom enough time to get to Inbox Zero whilst keeping the plates spinning on the day job. That’s why we wanted to create a prototype that can summarise text as audio - you can listen to your inbox summary!
The user uploads a text file to an S3 bucket which contains, for example, an article or a long piece of text. A Lambda function is triggered to summarise the file with Bedrock and then save the summarised text file in another S3 bucket. After the file is finished processing, Amazon Polly is triggered to analyse the text and convert the summarised text to audio.
The resulting summary file is now narrated as a neat little MP3 file, and we have our long article summarised in 3 different formats. Event notifications can be set up in S3 to use Lambda functions as the Destination Type for triggers. From an uploaded text to a lovely Southern lilt narrating the article summary… Now imagine that for your inbox.
Going Deeper into AI on AWS
Generative AI, as we have previously noted, is one of many subdisciplines within artificial intelligence. If we broaden the pool of services available to us within AWS, there is an incredible suite of puzzle pieces just waiting to be slotted together to form a useful masterpiece. You can start with Foundation Models to develop your own applications, and move on to richer or deeper capabilities within your existing cloud environment, compliant with your own internal security policies and use policies to onboard users faster.
Want to train an ensembling model to predict customer churn and visualise that as a neat little web dashboard? Try Sagemaker Canvas. Want to pull car part serial numbers from a bunch of crumpled invoices that a customer uploads? Try Textract or Rekognition. Want to create a multilingual customer service agent tapped into a million pages of technical documentation? Try Lex, Q and Translate. How about having a chat with a PDF and interrogating a long document? You can make it a brilliant user experience by implementing Retrieval Augmented Generation techniques and connecting it to your knowledge base.
What Did We Learn (Mauricio Foaro)
"Prior to undertaking preparations for our first Live Lab on GenAI, my knowledge on the subject matter was limited. I engaged in casual conversations and conducted preliminary research, but did not delve into the subject matter extensively. Now, I have pretty useful knowledge regarding Bedrock, including its parameters for models, the process of deploying a use case, its functioning, utilising RAG to enhance the output, creating applications that provide contextually relevant and accurate information, and reducing hallucinations."
Where Should You Start (Mauricio Foaro)
"The content provided by AWS Skill Builder is high quality. After approximately 20 hours of study time (with 4 hours specifically focused on Skills Builder) I have achieved an intermediate level of proficiency. As a result, I am now capable of constructing a practical use case on the Amazon Web Services (AWS) platform for everyday applications. Engaging in prompt engineering involves navigating a learning curve, which includes tasks such as setting parameters and testing a foundation model. However, this process is an integral part of the learning experiment. For under $1 of AWS usage, I could demonstrate the process of chatting with a PDF, summarise your entire inbox as audio or auto generate emails."
How Can Cloudsoft Help You Practically?
The biggest problem we’re seeing at the moment is finding the right use cases. Even if something makes sense as a Proof of Concept, it may not be designed to scale to enterprise usage levels or even make sense in its current configuration. As an AWS Advanced Consulting Partner with 15+ years working on problems for a range of customers from tier 1 banks in the US to fast-growth startups in the UK, we have access to POC credits, a team of specialist AWS developers, architects and security experts who care about the outcomes. The best starting point is often a workshop, engineer-to-engineer, and a qualification of the problem statements, commercial use case discovery and a clear implementation roadmap.
About The Authors
Frank Khan Sullivan is the CMO of Cloudsoft. He is an AWS Cloud Practitioner with 15 years’ experience, most recently building a data practice and leading GTM strategy.
Mauricio Foaro is an Engineer in Cloudsoft, having started his career as a software engineer and now works deeply within the AWS ecosystem supporting clients with their use cases.