Why Can’t Chat GPT Design a Floorplan?
This was originally produced as a podcast; you can listen below or here. But for those of you who prefer to read, please enjoy the below.
What’s the difference between the AI that’s been around for the last decade or so, powering things like the facial recognition in our phones and our news feeds, and the AI that everyone is talking about now? And, importantly, which is more relevant to our real estate world?
We’ve spoken before about AI vs traditional rules-based computing and what that means for real estate.
TRADITIONAL AI VS LLMS
The very short summary is that traditional computing needs to be told exactly what to do. It’s rules-based, which means a human has to programme every possible scenario: if I press this button, do that; if that happens, do the other. With machine learning, and particularly deep learning, the computers start to work things out for themselves; not everything needs to be programmed explicitly by a human.
But this kind of AI has been around for a while, shaping the way we read the news and navigate the world through Google Maps. So what’s the difference between that AI and the LLMs, and, importantly, which is more relevant for real estate?
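As a toy illustration of that difference, here is a minimal sketch in Python contrasting a hand-written rule with a model that learns a similar rule from examples. The floor areas, labels and threshold are all invented purely for illustration.

```python
from sklearn.tree import DecisionTreeClassifier  # assumes scikit-learn is installed

# Rules-based: a human writes the condition explicitly
def is_large_flat_rule(floor_area_m2: float) -> bool:
    return floor_area_m2 >= 90  # the threshold is hard-coded by a person

# Machine learning: the computer works a similar boundary out from labelled examples
examples = [[45], [60], [75], [95], [110], [130]]  # floor areas in square metres (made up)
labels = [0, 0, 0, 1, 1, 1]                        # 0 = not a "large" flat, 1 = "large"

model = DecisionTreeClassifier().fit(examples, labels)

print(is_large_flat_rule(100))   # True, because a human wrote ">= 90"
print(model.predict([[100]]))    # [1], because the model inferred the boundary from the data
```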
Overall, LLMs are simply a subset of, and a result of, all the work that’s been happening in AI over the last decade or so.
But interestingly, some of the advances that have come through LLMs will actually make other advances in AI easier over the coming years.
I was prompted to think about it this week, having spoken about the floorplan generation concept last week. There are a number of interesting companies working on tools to generate accurate, feasible floorplans close to instantly. And so that raises the question: why can’t Chat GPT just do that?
We can generate full images of cities, lifelike photos of humans who don’t actually exist, deep fakes of a politician saying something they never said. If we can generate all this lifelike imagery, why can’t we just say to Chat GPT or Midjourney, “please take this floorplan and put in an extra bathroom”, or “convert this from office to residential”, and so on?
WHY CAN’T WE ASK CHAT GPT “PLEASE TAKE THIS FLOORPLAN AND PUT IN AN EXTRA BATHROOM”?
Firstly, LLMs, as we know, are primarily trained on textual data: vast amounts of written text. They replicate the billions of examples of written language they’ve already seen, which is why they’re so good at generating content in natural language form.
And when the models generate images, they can do so because they understand the context around words. For example, if you just ask for an image of the London skyline at sunset, the model doesn’t necessarily have to have seen an image of the London skyline at sunset, but it knows the general architectural style of London, it knows what a skyline looks like, and it knows the kind of colours that make up a sunset, and the image that is produced is built upon all of this. From your simple prompt, “an image of the London skyline at sunset”, it builds a much more detailed word-based prompt.
It might be something like:
Create a vibrant and detailed image of the London skyline at sunset, showcasing iconic landmarks such as the Shard, the London Eye, and Big Ben. The sky is ablaze with hues of orange, pink, and purple, reflecting off the River Thames. The silhouettes of the buildings are detailed and recognizable, set against the dramatic backdrop of the setting sun. This picturesque scene captures the essence of London in the golden hour, highlighting the city's unique blend of historical and modern architecture.
(This is the actual wording Chat GPT used to create an image after being given the simple prompt “London skyline at sunset”.)
The image is then generated from the typical patterns and examples that each of these phrases brings up in images. But importantly, the resulting skyline looks more like an artistic representation than one where the buildings are accurately drawn to scale and placed in exactly the right position and orientation.
Models like DALL-E are trained on vast datasets of images that include a wide range of subjects, such as landscapes and cityscapes, meaning they can generate visually appealing and conceptually accurate images based on descriptive text. Generating artistic or conceptual images like landscapes or cityscapes relies more on capturing the essence or mood of the subject than on precise technical details.
WHY CAN’T CHAT GPT GENERATE AN ACCURATE FLOORPLAN?
Creating an accurate floorplan requires strict adherence to technical specifications, dimensions, and spatial relationships. These requirements are very different from the capabilities developed through the training of general-purpose image generators.
Image generators can approximate the layout of rooms or buildings based on descriptive text, but they are not specifically trained to create technical drawings or architectural plans, which require exact measurements and a deep understanding of building codes and design principles.
This is a rather sad thought experiment, but imagine a person who has lived in a single room, has never left that room, and has only learned about the world through what they could read on the internet and in books. What would happen if you asked them to design a floorplan? That’s effectively the scenario when we ask Chat GPT or large language models to do space planning. They know broadly what a floorplan looks like, but if you’ve never walked through a physical space or moved around a building, it’s hard to understand why certain things are in certain places: why you might want a window in every room, and why it’s not appropriate to have a loo in your dining room, even if it might be practical.
So, as it stands, the companies working on building generation and space plan generation need to train their algorithms on the parameters of different layouts and also on the regulations. That might be things like having a window in every bedroom, each bedroom being greater than the minimum space standards in the UK, or the space required for a wheelchair turning circle. Humans train the algorithms on these parameters, and then the algorithms can go away and cycle through the hundreds or thousands of potential iterations where all those parameters and regulations are met, as in the sketch below.
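To make that concrete, here is a minimal sketch in Python of that kind of rule-checking loop. The room dimensions, thresholds and helper functions are all hypothetical and purely illustrative; they are not taken from any real floorplan generator or from the actual UK space standards.

```python
import random

# Hypothetical rule thresholds (illustrative only, not the real UK space standards)
MIN_BEDROOM_AREA_M2 = 11.5          # minimum area for a double bedroom
WHEELCHAIR_TURNING_CIRCLE_M = 1.5   # clear width needed for a wheelchair turning circle

def meets_rules(layout):
    """Return True only if every room in a candidate layout passes the basic checks."""
    for room in layout["rooms"]:
        if room["type"] == "bedroom":
            if room["width"] * room["depth"] < MIN_BEDROOM_AREA_M2:
                return False  # bedroom smaller than the minimum space standard
            if not room["has_window"]:
                return False  # every bedroom needs a window
        if room["type"] == "bathroom":
            if min(room["width"], room["depth"]) < WHEELCHAIR_TURNING_CIRCLE_M:
                return False  # no room for the turning circle
    return True

def random_layout():
    """Generate one toy candidate layout; a real tool would place walls geometrically."""
    return {
        "rooms": [
            {"type": "bedroom", "width": random.uniform(2.5, 4.5),
             "depth": random.uniform(3.0, 5.0), "has_window": random.random() > 0.2},
            {"type": "bathroom", "width": random.uniform(1.2, 2.5),
             "depth": random.uniform(1.8, 3.0), "has_window": False},
        ]
    }

# Cycle through thousands of candidates, keeping only those that meet every rule
candidates = (random_layout() for _ in range(5000))
feasible = [layout for layout in candidates if meets_rules(layout)]
print(f"{len(feasible)} feasible layouts out of 5000 candidates")
```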
CURRENT USES FOR AI IN REAL ESTATE
In my opinion, the work that is really moving the real estate world forward is actually (at the moment) more like the AI powering facial recognition in smartphones or curating personalised content in news feeds. These systems are examples of relatively narrow AI, optimised for a particular use case with defined behaviours and patterns to follow. Take the news feed example: news feed algorithms learn from user interactions (likes, shares, comments) to tailor content. Those algorithms are pretty impressive and have obviously shaped our world more than we can say, but it’s a pretty specific use case.
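As a toy illustration of how narrow that kind of system is, here is a minimal sketch in Python of scoring posts by past engagement. The feature names, weights and posts are hypothetical and are not how any real platform actually works.

```python
from dataclasses import dataclass

@dataclass
class Post:
    title: str
    likes: int
    shares: int
    comments: int

# Hypothetical weights that a feed might learn from user interaction data
WEIGHTS = {"likes": 1.0, "shares": 3.0, "comments": 2.0}

def engagement_score(post: Post) -> float:
    """Score a post purely on engagement signals: a narrow, single-purpose model."""
    return (WEIGHTS["likes"] * post.likes
            + WEIGHTS["shares"] * post.shares
            + WEIGHTS["comments"] * post.comments)

posts = [
    Post("Office-to-resi conversions", likes=120, shares=10, comments=4),
    Post("London skyline at sunset", likes=300, shares=2, comments=1),
]

# The "feed" is just the posts sorted by predicted engagement: useful, but it only does this one thing
for post in sorted(posts, key=engagement_score, reverse=True):
    print(post.title, engagement_score(post))
```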
In contrast, the power of LLMs is that, because they have such a deep ability to process natural language, they can be used in a myriad of different ways, from writing content to writing code, answering questions, and even creating art or music, and they can do that for any subject or genre. It’s much broader and more general purpose, and it’s why, in my opinion, it’s captured the public’s attention. Facial recognition is useful but it serves just one function, whereas we can see from just one conversation with an LLM the infinite ways we can interact with it.
So you can see that, as it stands, things like floorplan generation require much more precise human, technical input, which is very different from an LLM, which effectively reads lots of text and can work out, statistically, what sentence structure sounds like natural spoken language.
POTENTIAL FUTURE SCENARIOS
Could we get to the equivalent of an LLM that has been trained on loads of BIM models and loads of detailed floorplans and can automatically produce a building design, without being directly trained on the local regs and space standards? I think we could probably get some of the way. If you look at organisations like some of the big architects’ practices, they have huge amounts of data to train models on, to find and replicate the patterns of architecturally and structurally sound buildings. But for now, the narrower, more precisely trained approach is coming up with good results.
BUT SHOULD WE BE REPLICATING THE DESIGNS WE’VE BEEN BUILDING OVER THE LAST 20 YEARS?
Another, very sceptical, point, at the risk of exposing myself to the wrath of my fellow developer friends: simply put, I’m not sure we should be replicating the designs and layouts of some of the buildings that have been built over the last 20-30 years. If the buildings that we have the most technically detailed plans for are the ones which have been built relatively recently, and we only train models on those designs, we will continue to replicate them. For me, I’m more interested in the buildings that have been around for centuries: the Victorian terraces that have been able to adapt to changing family needs across the decades, or the warehouse office conversions that people love; the proportions of a mansion block or the ceiling heights of a Georgian terraced house. I’m more interested in understanding the spatial patterns here and replicating those.
So, in short, LLMs are a specific type of AI. They are very text-based and very image-based; they are not designed to ingest 3D, spatial, technical data where accurate dimensions are vital.
HOW CAN LLMS HELP?
The obvious one is image generation which, deeply rooted in the advances behind LLMs, is becoming really helpful in visual design and in rendering more quickly. We still need humans and architects to get us to the structural shape, and then we can run through many more design-led iterations. An architect can control certain sections of a design and get the image generator to work just on those sections, which enables much quicker iteration of designs, as in the sketch below.
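A minimal sketch of that masked, section-by-section workflow, assuming access to an image-editing endpoint such as OpenAI’s DALL-E 2 images.edit (where the transparent part of a mask marks the region to regenerate); the file names and prompt here are hypothetical:

```python
from openai import OpenAI  # assumes the official openai Python package

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# facade.png is the architect's current render; facade_mask.png is transparent
# only over the section the architect wants the model to rework (hypothetical files)
result = client.images.edit(
    model="dall-e-2",
    image=open("facade.png", "rb"),
    mask=open("facade_mask.png", "rb"),
    prompt="Replace the ground-floor frontage with a double-height glazed entrance",
    n=4,                # generate four alternative options to compare
    size="1024x1024",
)

for i, item in enumerate(result.data):
    print(f"Option {i + 1}: {item.url}")
```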
And then there are some other, less obvious examples.
INCREASED INVESTMENT IN AI
By opening the public’s eyes to what AI can do, LLMs have massively grown interest in its potential, both for investors and for users. I’m sure there are more companies buying AI products which might not use LLMs at all; the companies simply might not have thought they needed the product a year ago, and now suddenly they’re exploring them.
EASE OF ACCESS TO CODE
Another way that LLMs are helping is that software developers (and non-software developers like me!) can use them to create code to build or improve software. It’s becoming much easier to create a minimum viable product, and some software developers are using LLMs to make their code more efficient.
IMPROVED COMPUTING EFFICIENCY
Finally, computing power. While there has been a chip crunch as demand for processing power has gone through the roof, this has also led to an increase in supply and, of course, to AI-powered exploration of things like semiconductors, which should increase the processing power available. The companies that need to cycle through hundreds or thousands of iterations of floorplans should find the computing power to do that cheaper and more accessible!
So there we have it: LLMs are not actually directly driving all of the uses of AI in real estate, but they are improving public understanding, customer and investor interest, and some of the infrastructure needed to lead to the adoption of good AI tools in real estate.