All eyes were on Google this Wednesday as it held its annual developer conference, I/O, in Mountain View, California. Highlights include:
- Bard (Google's ChatGPT competitor) is now public: https://bard.google.com/. On the roadmap: support for more languages and for other forms of input, such as images and audio.
- PaLM 2, a new language model that powers Bard.
- MusicLM, a text-to-music AI tool: you provide a prompt like "soulful jazz for a dinner party" and it generates songs for you. You can listen to some examples on its research page.
- Universal Translator, which uses AI to dub videos into a different language. The translated audio is made to sound like the original speaker's voice, and the video is edited so the speaker's lips match the translation (!!).
Google are also adding AI to their existing products: you'll be able to get AI-generated responses in search results, rewrite text messages in different styles like "Excited", "Chill", or "Shakespeare", and use AI to edit elements in your photos. For example, you can reposition or remove items, or improve the colour of the sky without affecting the rest of the photo.
Google Maps is getting Immersive View, which lets you explore digital models of cities, including traffic simulations, bike lanes, complex intersections, and parking, built from billions of Street View and aerial images.
GitHub Copilot finally has a competitor: Duet AI for Google Cloud. What I find most exciting about this is the prospect of a new UI for cloud platforms. Currently you have to wade through UIs like the AWS Console and huge amounts of documentation to use these platforms. AI text prompts are an interesting way to simplify this: you can write "Help me deploy my app on Cloud Run", then see relevant documentation and CLI commands surfaced by the AI without leaving your IDE.
Is the future screenless?
In a previous post, I linked to a few clips of a pocket-sized wearable that was recently demoed in a TED talk by Humane, a secretive startup founded by ex-Apple employees Imran Chaudhri and Bethany Bongiorno, who worked on the development of the iPad. The full talk is now available on YouTube, and it's worth a watch!
While big tech companies like Meta and Apple are betting big on AR/VR headsets that you strap onto your head, physically separating you from the world around you, Humane is taking a different approach: trying to make technology invisible and seamlessly integrated into our lives. Its wearable can translate your speech live from your pocket in an AI-generated voice that sounds like you (Humane has partnered with OpenAI), project a user interface onto your hand so you can take a call, and record video from your pocket so you can be present in a moment while still capturing it. It reminds me of Golden Krishna's book, The Best Interface Is No Interface. I'm really excited to see what they build with this philosophy.
An AI that can read your mind?
Researchers have developed a model that they claim excels at jointly analysing behavioural and neural data, and that "can decode activity from the visual cortex of the mouse brain to reconstruct a viewed video". They've published a paper about it in Nature. This is definitely getting into freaky territory!
Using GPT while keeping control of your data
One of the major drawbacks of tools such as ChatGPT is privacy and data security. Entering work information into ChatGPT can leak IP to OpenAI, who could use that data for training or in any other way they like. Companies such as JPMorgan and Amazon have banned or restricted employees' use of ChatGPT, and the list is growing.
privateGPT is a new project that addresses this concern by providing a GPT model that you can run locally, without an internet connection. You give it .txt files containing your dataset to ingest, then ask it questions via the command line. A response takes around 20-30 seconds, which is slower than ChatGPT and Bard, but it avoids the data security concerns, and it's free!
Also at Google, a leaked internal document from an AI researcher suggests that neither Google nor OpenAI has a moat, and that open-source AI is quickly catching up. Meta's LLaMA model, which leaked at the beginning of March, has helped the open-source community improve rapidly (in a matter of days, in some cases), to the point where "the barrier to entry for training and experimentation has dropped from the total output of a major research organization to one person, an evening, and a beefy laptop". I'd encourage you to read the whole article: it's an interesting take on the 2023 AI arms race.
Meanwhile, over at Meta, they've released ImageBind, a multimodal AI model that lets you generate output from a combination of different media. For example, you can provide an audio clip of running water and an image of some apples, and the AI will return an image of a bowl of apples under a running tap.
What I've been reading
Over the last few days, I've really been enjoying reading content from David Heinemeier Hansson, the creator of Ruby on Rails and co-founder of Basecamp, an online project management tool. If that wasn't enough, he's also won a class at the 24 Hours of Le Mans, the endurance car race in France, and co-written the business book REWORK with Jason Fried.
The cool thing about Basecamp is that its founders resisted the standard Silicon Valley approach of raising as much money as possible, and instead focused on building a sustainable, profitable business. (I know, how radical! 😂) It's paid off: he commissioned a Pagani Zonda HH and apparently "bought a vacation home in Italy just so he could drive the thing". (Jeff Bezos is an investor, though.)
One of my favourite posts is his take on how the startup scene has gone crazy, with people chasing unicorn status rather than building things that actually add value to society:
Part of the problem seems to be that nobody these days is content to merely put their dent in the universe. No, they have to fucking own the universe. It’s not enough to be in the market, they have to dominate it. It’s not enough to serve customers, they have to capture them.
He's not afraid to speak his mind, which makes his content all the more fun to read! You can read his tech takes on his blog and find out more about him on his personal website. Also, while it's long, his 3.5-hour podcast episode with Tim Ferriss is a great listen!
Well, that's a wrap again! Have a great weekend!