Google DeepMind, in collaboration with Hugging Face, has recently open-sourced its research on ‘Scalable watermarking for identifying large language model outputs’. The work aims to make it possible to distinguish human-written from AI-generated content on the internet, a problem driven by the spread of large language models (LLMs). The research paper, published a year ago, introduces SynthID, a watermarking tool that is now available for wider access. The move is part of Google’s effort to promote transparency in the AI content space and answer the question, “What’s real anymore?”.
SynthID is capable of producing a digital watermark that is imperceptible to humans and can be used across Google products to tag AI-generated images, videos, audio, and text. The tool has already been deployed in Google’s Gemini and Gemini Advanced chatbots, serving millions of users. It is also available on Google Cloud’s Vertex AI and supports Google’s Imagen and Veo models. Users can now check if an image is AI-generated using Google’s ‘About this image’ feature in Search or Chrome.
One of SynthID’s key features is its negligible computational overhead, which makes it suitable for both cloud and on-device detection. The research paper explains that SynthID uses a tournament sampling process during generation: candidate tokens compete in a tournament scored by pseudorandom functions keyed to a watermarking key, and the winning token is emitted, leaving a statistical signature that a detector holding the same key can later check for. The tool offers two watermarking modes, non-distortionary and distortionary, to trade off text quality against detectability, and it is efficient enough to scale to large production systems.
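To make the idea concrete, here is a minimal Python sketch of keyed tournament sampling and the corresponding detection score. It does not reproduce the paper’s exact g-functions or configuration; the hash-based 0/1 scores, the four-token context window, the three tournament layers, and all function names are illustrative assumptions.

```python
import hashlib
import random

def g_value(key: int, context: tuple, token: int, layer: int) -> int:
    """Illustrative keyed pseudorandom 0/1 score for a token, derived from the
    watermarking key, the recent context, and the tournament layer (a stand-in
    for the paper's g-functions)."""
    payload = f"{key}:{context}:{token}:{layer}".encode()
    return hashlib.sha256(payload).digest()[0] & 1

def tournament_sample(candidates: list, key: int, context: tuple, layers: int = 3) -> int:
    """Pick one token from `candidates` (already drawn from the model's
    distribution) via a single-elimination tournament: at each layer, paired
    candidates are compared on their g-values and the higher scorer advances,
    biasing output toward tokens the keyed functions favour."""
    pool = list(candidates)
    for layer in range(layers):
        random.shuffle(pool)
        next_pool = [
            a if g_value(key, context, a, layer) >= g_value(key, context, b, layer) else b
            for a, b in zip(pool[0::2], pool[1::2])
        ]
        if len(pool) % 2:          # odd candidate gets a bye to the next layer
            next_pool.append(pool[-1])
        pool = next_pool
    return pool[0]

def detection_score(tokens: list, key: int, layers: int = 3) -> float:
    """Mean g-value over a text. Watermarked text should score noticeably above
    the ~0.5 expected for unwatermarked text, because generation preferred
    high-scoring tokens under the same key."""
    scores = []
    for i in range(1, len(tokens)):
        context = tuple(tokens[max(0, i - 4):i])   # short sliding context window
        for layer in range(layers):
            scores.append(g_value(key, context, tokens[i], layer))
    return sum(scores) / len(scores)

if __name__ == "__main__":
    key = 42
    context = (5, 17, 3, 99)                        # hypothetical recent token IDs
    candidates = random.sample(range(1000), 8)      # 8 candidates feed 3 tournament layers
    print("chosen token:", tournament_sample(candidates, key, context))
```

Because the tournament only reorders choices among tokens the model already considered plausible, the bias it introduces can be kept small, which is how the scheme keeps its computational and quality overhead low.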
However, the watermarking method still has limitations. Ethan Mollick, a professor and co-director of the Generative AI Lab at Wharton, pointed out that the technique requires the cooperation of LLM providers to be effective, and that it becomes less reliable when the text is edited or paraphrased. To address these issues, Bryan Kian Hsiang Low, an associate professor at NUS, and his team have developed Waterfall, a watermarking framework that can detect plagiarism and unauthorized LLM training.
In conclusion, the open-sourcing of SynthID by Google DeepMind in collaboration with Hugging Face is a significant step toward transparency in the AI content space. Still, several challenges must be addressed before this watermarking approach can be fully effective. As AI technology continues to develop, measures to distinguish between human and AI-generated content will only become more important.