Will the Internet Always Be Human?

ControlAI

May 9, 2024

In a digital era where the line between human and automated interaction blurs, the latest findings from this year’s “Bad Bot Report” by Imperva, an American cybersecurity company, reveal a startling trend: nearly half of all internet traffic last year was generated by bots, the highest share recorded since Imperva began tracking in 2013. This surge in automated activity, especially pronounced in Ireland, where 71% of internet traffic is bot-generated, raises profound questions about the future of the internet and its human essence. Generative AI is making matters worse.

The report suggests that the escalation in bot traffic is largely driven by advances in generative AI and large language models (LLMs). Web scraping, in which bots extract data from websites, is not a novel practice, yet the rise of AI has thrust it back into the limelight: sophisticated AI models depend on vast datasets for training, and these are often sourced through scraping. Although the method can propel AI forward, it also raises significant legal and ethical issues. Nanhi Singh from Imperva warns that “automated bots will soon surpass the proportion of internet traffic coming from humans, changing the way that organisations approach building and protecting their websites and applications.”
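
To make the mechanics concrete, here is a minimal sketch, using only Python’s standard library, of what a scraping bot does: fetch a page and harvest its text. The URL and user agent are illustrative placeholders; real crawlers add rate limiting, retries and JavaScript rendering, and operate across millions of pages.

```python
# Minimal sketch of a scraping bot, using only the Python standard
# library. The URL and user agent below are illustrative placeholders.
from html.parser import HTMLParser
from urllib.request import Request, urlopen

class TextExtractor(HTMLParser):
    """Collects the text content of every paragraph on a page."""
    def __init__(self):
        super().__init__()
        self.in_paragraph = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_paragraph = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_paragraph = False

    def handle_data(self, data):
        if self.in_paragraph and data.strip():
            self.paragraphs.append(data.strip())

request = Request(
    "https://example.com/article",          # placeholder target
    headers={"User-Agent": "ExampleScraper/0.1"},
)
with urlopen(request) as response:
    html = response.read().decode("utf-8", errors="replace")

parser = TextExtractor()
parser.feed(html)
print(parser.paragraphs)  # harvested text, e.g. for a training corpus
```

Loops like this, run at scale, are how much of the raw text behind today’s LLM training corpora is collected.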

The legality of web scraping varies by jurisdiction and by case, and the rise of AI adds further complexity. While proponents claim that large-scale data collection is essential for AI progress, critics argue that it violates copyright law and privacy rights. The EU AI Act, for example, introduces measures on data scraping and intellectual property that build on existing EU law: it prohibits AI systems that create facial recognition databases via indiscriminate scraping of facial images from the internet or CCTV footage, and it requires general-purpose AI systems to comply with EU copyright law and transparency obligations, including disclosing information about their training data.

The implications of this surge in bot activity are already visible on major platforms such as X (formerly Twitter), where user interactions are inundated with bot-generated content such as spam advertising pornography. This has prompted Elon Musk, the platform’s owner, to propose charging users for posting and interactions in an attempt to curb the proliferation of automated accounts. The issue extends beyond X to other major social media platforms such as Facebook and TikTok, highlighting a widespread challenge in preserving the authenticity of online interactions.

Meanwhile, the banking sector faces its own set of challenges linked to this rapid technological advancement. A significant number of banks admit to struggling with customer identity verification amid increasingly sophisticated fraud and scams. AI-generated fraud has become a primary concern, with 37% of banking risk leaders highlighting it as a pressing issue, signalling an urgent need for robust responses to these evolving threats.

In the tech industry, Apple is making significant strides with AI, as evidenced by Project ACDC (Apple Chips in Data Centre), an effort to develop AI-specific chips for its data centres. The project reportedly targets AI inference tasks – using trained models to make predictions on new data – rather than the training of models themselves, a domain still dominated by the chipmaker Nvidia. Apple’s pivot towards inference chips aligns with a broader industry trend: companies like Google are also designing their own AI chips to reduce dependence on external manufacturers.
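
The distinction is worth spelling out: training repeatedly adjusts a model’s parameters against a large dataset, while inference simply applies the finished parameters to new inputs. The toy sketch below, using scikit-learn on synthetic data (and, to be clear, bearing no relation to Apple’s actual stack), separates the two phases:

```python
# Toy illustration of the training/inference split using scikit-learn.
# The data is synthetic; real workloads differ mainly in scale.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# --- Training: expensive, done once on dedicated hardware ---
X_train = rng.normal(size=(1000, 8))
y_train = (X_train.sum(axis=1) > 0).astype(int)
model = LogisticRegression().fit(X_train, y_train)

# --- Inference: cheap per request, but runs for every user query ---
x_new = rng.normal(size=(1, 8))
print(model.predict(x_new))        # prediction on unseen data
print(model.predict_proba(x_new))  # class probabilities
```

Inference-oriented chips optimise this cheap-but-constant forward pass for latency and energy efficiency, whereas training hardware must also sustain the far heavier backward pass and weight updates.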

Amidst these technological shifts, OpenAI has recently released a statement titled “Our approach to content and data in the age of AI”, which purports to realign their AI model training practices with the preferences of creators and copyright owners. This initiative is presented as a significant policy shift; however, it has been met with scepticism regarding its authenticity and potential effectiveness. Critics argue that OpenAI’s narrative in the document glosses over existing discontent within the creative community and does little to address the fundamental issues of copyright infringement and fair compensation. The proposed “Media Manager” tool, intended to allow creators to specify how their content is used in AI training, is seen as a superficial solution that fails to address the need for preemptive consent and adequate remuneration for creative works.
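
For context, the main opt-out mechanism that exists today is the robots.txt convention: OpenAI has said its GPTBot crawler honours disallow rules, so a site can already refuse it. The sketch below, using Python’s standard library and a placeholder domain, shows how a compliant crawler checks permission before fetching:

```python
# Checking whether a site's robots.txt permits crawling by GPTBot,
# OpenAI's web crawler, using only the Python standard library.
# The domain below is a placeholder.
from urllib.robotparser import RobotFileParser

robots = RobotFileParser()
robots.set_url("https://example.com/robots.txt")
robots.read()

# A site opting out of AI training crawls would publish, e.g.:
#   User-agent: GPTBot
#   Disallow: /
allowed = robots.can_fetch("GPTBot", "https://example.com/any-page")
print("GPTBot may crawl:", allowed)
```

The critics’ objection is that this model is opt-out rather than opt-in: content remains fair game unless its owner has actively declined, which is the reverse of the consent regime many creators are demanding.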

This technological evolution, and the debate over digital copyright it has ignited, forces us to reconsider the foundational structures of our digital interactions. As advanced AI models demand ever larger datasets and ever more compute, the transparency of frontier labs about how they accumulate data comes under growing scrutiny.

The evolution of internet traffic plays a critical role in shaping the future of digital interaction. As bots become an omnipresent feature of online activity and as companies and governments navigate the complex implications of advanced AI, the question remains: will the internet retain its human touch, or will we move towards an increasingly AI-propelled digital ecosystem? And, ultimately, is this the future we want?

If you want to delve deeper into the challenges of AI governance, the regulation of synthetic media, and the global security implications of AI advancements, join us on Discord at https://discord.gg/2fR2eZAQ4a. Here, we can collaborate, share insights, and contribute to shaping the future of AI in a manner that safeguards our security and democratic values and fosters responsible innovation.
