Building a Personal Energy Usage Report
Can AI help build a tool to understand residential energy usage?
This post is something of an interlude from the series to date. It details an attempt to build a personalised domestic report of my energy data over this year, using AI tooling to generate useful insights on my usage as well as recommendations on energy efficiency. The reason for doing so is to explore a use case for AI that provides real-world utility to ordinary citizens, given the context of the wider energy crisis in the UK right now. A reminder of how real that is, seen outside my local supermarket earlier today - energy is really expensive in the UK right now:
In this post I will explain how I managed to generate the report shown below on my personal energy usage without writing any code at all myself. Instead, I used a couple of prominent AI tools to generate over 500 lines of working Python through judicious prompting. I will outline the process I used and analyse the quality of results generated. The post will touch upon various elements of software development so it will be more technical than the previous ones in the ClimateCTO series.
Obtaining the data
My home energy provider for both gas and electricity is EDF Energy. They provide both web and mobile interfaces for customer access to their usage data via their Energy Hub.
It is possible to download your monthly dual fuel energy data detailing kWh used and the cost of that energy. Here is a screenshot of my Energy Hub via the web app showing at the bottom the option to download data as csv. To follow along with the analysis below you need to select Energy, Both and Month before clicking to download:
I explored the structure of the corresponding csv file recently in a standalone notebook here. It’s fairly straightforward. Each row holds monthly totals and there are five columns in this order: timestamp (as mm/yyyy), electricity consumption (in kWh), electricity cost (in £), gas consumption (in kWh) and gas cost (in £). The data dates back to the beginning of the calendar year. From a brief look around, EDF don’t appear to provide any programmatic access to this data via an API. Therefore, to get started, you will need to log into your consumer portal, navigate to the Energy Hub and download your csv data. Note that it may be possible to automate this step through RPA (robotic process automation) in the future. If your supplier is not EDF, you should still be able to adapt the recipe detailed in this post as long as they provide data in a similar monthly total format, which ought to be the case. Octopus Energy in particular provides good API coverage, so you should be able to build this report programmatically if you are on one of their tariffs.
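To give a sense of the shape of the data, here is a minimal sketch of loading such a file with pandas. The column names are my own labels rather than anything mandated by EDF, and the filename is just an example:

import pandas as pd

# Example filename; use whatever name your downloaded export has
df = pd.read_csv(
    "edf_monthly_usage.csv",
    names=[
        "month",             # mm/yyyy
        "electricity_kwh",   # electricity consumption in kWh
        "electricity_cost",  # electricity cost in £
        "gas_kwh",           # gas consumption in kWh
        "gas_cost",          # gas cost in £
    ],
    header=0,  # assumes the export has a header row to replace; drop if not
)

# Parse the mm/yyyy timestamp into a proper datetime for plotting
df["month"] = pd.to_datetime(df["month"], format="%m/%Y")
print(df.head())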
Developing the advisor
After watching a recent OpenAI demo outlining how they frequently start product analysis with an upload of a csv data file to ChatGPT, I thought I’d try the same approach. Before doing this, you may want to address a common concern and ensure that your data is not used by OpenAI for training. To disable that scenario, make sure the option in Settings under Data Controls to “Improve the model for everyone” is switched off. Now whatever you upload will not be absorbed into their training data set:
As I have a ChatGPT Pro account, I also have Canvas support. I initially believed that would be necessary to develop a script to process the data. It turned out that vanilla csv data analysis has been supported for over a year via Code Interpreter. ChatGPT is able to parse the csv file and use a combination of the popular Python libraries pandas and matplotlib to create dataframes and graphs respectively. The basic approach is described by OpenAI here. The prompt I used to get started with an initial cut of a command line interface (CLI) Python tool to generate an HTML report from the uploaded csv is shown next, along with the beginning of the response:
I encountered errors trying to execute the code because it was based on an older version of the openai library. I could have fixed them all by hand, but I had set myself a constraint to use only AI to generate my code, even if it was clunkier to do so. After a bit of error wrangling I managed to fix the version issue as well as some formatting problems, and had a basic script working that generated and rendered a set of insights and recommendations as well as a graph of energy usage. At this point it became difficult to proceed with ChatGPT, so I switched to my second tool, Cursor, an AI code editor based on VS Code. Cursor offers a chat interface to talk directly with your code:
Cursor Chat lets you ask questions or solve problems in your codebase with the most capable language models, all in your editor.
Here’s a representative screenshot of what it looks like. The script source is in the middle pane and the chat interface is on the right. In this example I have asked Cursor to switch to using an environment variable for the OpenAI API key used by the CLI tool, which I eventually named energy-advisor.py. Code is generated to address the prompt in the chat window along with an explanation of what it does. You can then choose to apply it to the main source window, where you are presented with a colour-coded source diff that you can click to accept or reject. This example shows the interaction after a prompt to change the code to read the OpenAI API key from an environment variable:
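For anyone following along, here is roughly what that change amounts to - a minimal sketch, assuming the key is held in an environment variable named OPENAI_API_KEY (the variable name and function name are my choices for illustration, not lifted from the generated script):

import os
import sys

def get_api_key() -> str:
    # Read the OpenAI API key from the environment rather than hard-coding it
    api_key = os.environ.get("OPENAI_API_KEY")
    if not api_key:
        sys.exit("OPENAI_API_KEY environment variable is not set")
    return api_key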
Once you get used to it, Cursor becomes an incredibly productive way to develop code and offers a lot more fine control than something roughly generated by ChatGPT. If you’re a developer, it’s worth trying it out to see what it can do. I found that forcing myself to use it to modify the code without intervention helped me understand how to use it better. After doing this for a while, you reach a strange state where you feel like you are learning how to wrangle an intelligent machine. One that, counterintuitively perhaps, can be bargained and reasoned with and seems to convey a sense of empathy, but absolutely will not stop, ever, until you have finished asking it questions.
By using small, focussed requests, you can generate correspondingly tight code diffs showing the changes to be applied to the old code to satisfy the request. Invariably these applied diffs, even reasonably complex ones, work, which is unsettling for someone who has spent decades assuming this required human knowledge to achieve. I have found myself gradually trusting Cursor more often than not. The only significant issue I have encountered is that it often seems to delete previously generated valid comment text, including prompt text, in its generated diffs. It’s therefore important to review each of the changes it suggests before you accept them. As is often the way with no-code tools, it turns out you really do need quite a bit of coding experience to make the most of Cursor. To provide a sense of what it’s like, here is an example of the prompt I used to successfully add ollama support to the tool. The code diffs are marked in green for additions and maroon for deletions:
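For context, a minimal sketch of one common way to call a locally running Ollama server is shown below. The endpoint is the standard Ollama REST API, but the model name and the way the final tool wires this in are assumptions rather than the exact diff Cursor produced:

import requests

def generate_with_ollama(prompt: str, model: str = "llama3") -> str:
    # POST to the local Ollama server's generate endpoint (default port 11434)
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    response.raise_for_status()
    # The non-streaming response returns the generated text under "response"
    return response.json()["response"]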
The final script I generated after a long session with Cursor is available for inspection on GitHub here. It had a lot more added to it. It’s worth emphasising once more that none of this was done by me - it was all done through prompting Cursor. Some of the highlights from the top down:
Suitable licence text added in the comment block at the head of the file
Security considerations list added following a code security review
Dependencies list added
Version support added
Todo section added
Python docstrings added to each function for documentation
HTML template including CSS refined after multiple prompts to give precise output required
Sanitisation of HTML content added following security review
Get API key function added
Introduction of docopt to handle command line input
Addition of three flags: -V (version), -v (verbose) and -h (help)
Addition of logger support, enabled under the -v verbose flag (see the sketch after this list)
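As a flavour of the command line handling items above, here is a minimal sketch of how docopt, the three flags and the logger can hang together. The usage text and version string are simplified stand-ins rather than copied from the real script:

"""energy-advisor.

Usage:
  energy-advisor.py [-v] <csvfile>
  energy-advisor.py -h | --help
  energy-advisor.py -V | --version

Options:
  -h --help     Show this help.
  -V --version  Show the version.
  -v --verbose  Enable verbose logging.
"""
import logging

from docopt import docopt

if __name__ == "__main__":
    args = docopt(__doc__, version="energy-advisor 0.1")
    # The verbose flag switches the logger from WARNING to DEBUG output
    level = logging.DEBUG if args["--verbose"] else logging.WARNING
    logging.basicConfig(level=level)
    logging.debug("Parsed arguments: %s", args)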
The only really significant changes made by me were to the text of the prompts used to generate the insights and recommendations sections in a structured way. They are both similar in construction and evolved through trial and error. They started out somewhat hacky, built to return results in a way that allows the HTML jinja template code to loop through the results to generate the formatted list. The latest version structures the results as JSON: a list of dictionaries, each with a heading followed by a block of text. The results are not deterministic between runs of course, so I expect these prompts will be brittle in operation as I’ve only had the opportunity to test them a few times. The general rule with prompts such as these is that the more detail you provide about how you want the results to appear, the better the results. I am not an expert at prompting, so it’s very possible the two key prompts for generating insights and recommendations could be further improved. Here is the recommendations prompt at the time of writing:
prompt = """
Based on the following dataset trends, provide actionable recommendations to lower electricity and gas usage.
Return the response as a JSON list of dictionaries.
Each dictionary should have two keys: "heading" and "recommendation".
The heading should be a single sentence that summarizes the recommendation.
The recommendation should provide a paragraph of detail and actionable advice.
A specific call to action should be included at the end of each recommendation.
No markdown or HTML formatting should be used.
Ensure that every number is supplied with the right unit.
Costs should have a £ sign preceding the number.
kWh should have a kWh suffix.
Numbers should be formatted as a number with 2 decimal places.
Generate at least 5 recommendations.
Example format:
[
{{"heading": "Install LED Lighting", "recommendation": "Replace all traditional bulbs with LED alternatives to save 100.00 kWh annually"}},
{{"heading": "Another Recommendation", "recommendation": "Another detailed recommendation"}}
]
"""
Results
Now that I have walked through how the code was initially created and evolved, let’s look at the output and go through it section by section, starting with Key Insights. They’re quite impressive, all things considered. The advisor correctly picks up that gas usage is significantly higher and more expensive than electricity usage, and that there is wide variation in electricity usage - an EV charge point was used until August. Gas usage is high in the winter and low in the summer, which is suggestive of a gas boiler for heating. This seasonal trend wasn’t picked up by the advisor but could be provided as input to an adjusted prompt in future. The insights from the last run are shown below along with the generated energy consumption time series line graph. Note that the electricity figures over the first few months of the year are inflated because I was charging an EV at home:
The recommendations are also quite impressive, with a focus on energy efficiency, shifting demand to reduce load at peak times and installation of renewable energy sources. These are all important mitigations in relation to residential energy use. In this case the response does pick up on suspected heating inefficiencies behind the gas bill. There wasn’t any mention of exploring heat pump installation in this case, but it did come up in other runs of the script. As with the generated insights, the results are not deterministic and there is considerable variation between repeat runs. The prompt could do with more experimentation to ensure the recommendations are user friendly and complete. It may be that a better approach is to conduct several runs, create a store of responses and then merge the results into an amalgamated report.
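If that merge-across-runs idea were pursued, a rough sketch might collect the JSON from several runs and deduplicate on heading. The generate_recommendations() helper here is hypothetical and stands in for the LLM call in the script:

def merge_runs(runs: list[list[dict]]) -> list[dict]:
    # Keep the first recommendation seen for each distinct heading
    merged: dict[str, dict] = {}
    for run in runs:
        for rec in run:
            merged.setdefault(rec["heading"], rec)
    return list(merged.values())

# Hypothetical usage: generate_recommendations() would wrap the LLM call
# runs = [generate_recommendations(df) for _ in range(3)]
# report_recs = merge_runs(runs)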
As a quick test I changed the prompt to generate “at least 10 recommendations” and the results look more complete, covering a range of aspects important for mitigation, including switching to LED lighting and improving insulation:
Next Steps
There are several significant improvements that could be made. First of all, no test code has yet been developed since this is just a sketchy proof-of-concept experiment. Secondly, there is limited LLM support within the energy-advisor tool right now, in the form of gpt-4, gpt-3.5-turbo and ollama, the last of which was used in the climate-oracle introduced last time around. Using ollama reduces LLM inference costs to nothing and entails considerably less carbon impact, putting training costs and emissions to one side for now. From initial inspection, neither gpt-3.5-turbo nor ollama results are as good as gpt-4. Thirdly, there is no option yet to provide input context to the energy advisor about the customer. This context would help it make far more sense of the supplied data: for instance, the size and type of home, how many people live there, whether there are solar panels or a heat pump installed, and how frequently an EV is charged if there is a charge point installed. Over time a set of personas could be developed and benchmarking provided, with the recommendations tailored to a persona. Fourthly, we could compare the data one can obtain from EDF with the smart meter data accessible via the DCC, per the recipe outlined in this accompanying notebook developed earlier this year. That recipe uses the n3rgy API to retrieve smart meter data given: a) your 21-digit Meter Point Administration Number (MPAN), which you can get from your bill, and b) your 16-character Smart Meter In-Home Display (IHD) number. Finally, it would be good to turn this into an online resource anyone could use, which would require some thinking through deployment and how to pay for the LLM costs.
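As a taste of the smart meter route, here is a rough sketch along the lines of the notebook recipe: the n3rgy consumer API is typically called with the IHD number as the Authorization header, but treat the endpoint, parameters and authentication below as assumptions to be checked against the notebook rather than a definitive reference:

import requests

IHD_NUMBER = "XXXX-XXXX-XXXX-XXXX"  # your 16-character IHD number (placeholder)

# Assumed endpoint shape for half-hourly electricity consumption via n3rgy
response = requests.get(
    "https://consumer-api.data.n3rgy.com/electricity/consumption/1",
    headers={"Authorization": IHD_NUMBER},
    params={"start": "202401010000", "end": "202401312359"},
    timeout=60,
)
response.raise_for_status()
print(response.json())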
Building this energy advisor has been an interesting process and quite alien compared to human-oriented software development to date. ChatGPT was great at generating the initial sketch of the code before handing the baton over to Cursor to assume control. I spent over 90% of the three hours or so of overall development time in Cursor. After a while it really did assume the form of an inexhaustible code agent, always ready to take on my prompts and nudges to fix and improve the code. I think it helped to force myself not to make any manual changes to the code, trust the tool and focus my human effort on the two central prompts. I was genuinely surprised how often the generated code was right on application. One has to be a little careful and check the output before accepting the suggestions because these tools don’t always get it right, and getting good results is very dependent on getting the prompt right. Simon Willison has written about this in the past, suggesting that LLMs operate like a weirdly inexhaustible intern. One that takes a while to get used to:
A friend of mine says that 10 hours is the minimum you need to spend with a GPT-4 model before it really starts to click - before you understand what these things are and how to use them. And I think that's what it takes to develop the level of expertise where I can look at a prompt and, 90 percent of the time, correctly predict whether it will work or not.
Overall, after many years of coding, a lot of it spent on routine implementation and boilerplate, it does feel like something has now irrevocably changed for the process of software development. Tools like ChatGPT Pro and Cursor are now so powerful and productive that it feels remiss not to use them. And after using them, nothing will be the same again. Other commentators have written eloquently about the pros and cons of that and the implications for human agency. Steve Yegge refers to this style of development as Chat-Oriented Programming, or Chop for short:
Chop isn’t just the future, it’s the present. And if you’re not using it, you’re starting to fall behind the ones who are.
My sense of the current context is broadly positive, the rationale being that modern development should be more focussed on design and refactoring than on implementation. Which leads to the most important question of all: were the results useful at all? The early results I generated were promising and surprised me in terms of utility, but the output is brittle and non-deterministic. However, it’s hard to be conclusive without a lot more testing. In particular, little has been done to identify, let alone address, hallucinations in the results. Tools like giskard can be used to build verification tests for LLMs and should be applicable here too. Improvements could also be made to the structure and format of the insights and recommendations prompts, in addition to the other next steps outlined earlier.
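Independently of giskard, even a simple structural test over the generated recommendations would catch some classes of failure. The generate_recommendations() helper referenced below is hypothetical and stands in for the relevant part of the script:

def check_recommendations(recs: list[dict]) -> None:
    # Basic structural checks on the JSON returned by the LLM
    assert len(recs) >= 5, "expected at least 5 recommendations"
    for rec in recs:
        assert set(rec) == {"heading", "recommendation"}
        assert rec["heading"].strip(), "heading must not be empty"
        assert rec["recommendation"].strip(), "recommendation must not be empty"

# Hypothetical usage in a test:
# check_recommendations(generate_recommendations(df))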
Building energy-advisor was an attempt to use AI technology for good, to help address the climate crisis through insights and advice generated from raw residential energy usage data. Next time around, normal service is resumed and we will look into Adaptation options, especially Solar Radiation Management (SRM).