GPT-5 and the new Crawler Gptbot web developed by Openai

I don't think it will be long before OpenAI comes into play for the development of an AI-based search engine as well. The new web crawler GPTBot with GPT-5 broad language model is already released.

Those using ChatGPT know that this broad language model (LLM) is currently running GPT-3.5, being trained on a dataset updated in September 2021. So if newer information is requested than this date, ChatGPT is not able to provide accurate information. Of course, valid for the free version that does not support the use of auxiliary plugins.

With the launch GPTBot, OpenAI has the way open for web page indexing through this new web crawler. As companies such as Google, Microsoft, Yahoo and many others have been doing for many years.

GPT-5 and the new Crawler Gptbot web developed by Openai

The new web crawler GPTBot uses web agent:

User agent token: GPTBot
Full user-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)

Website owners can control the indexing of web pages through the file robots.txt, using the same directives as for other web crawlers from other companies.

For example, if the owner of a website does not want OpenAI to collect information from the website, they can add in robots.txt the lines:

User-agent: GPTBot
Disallow: /

Even though it behaves like a web crawler, GPTBot will have a distinct purpose: to harvest publicly available data, while carefully avoiding sources that involve paywalls, collection of personal data, or content that violates OpenAI policies.

But there are quite a few controversies, some that have even attracted legal action against OpenAI over privacy and using content without the consent of the authors or without identifying the sources.

In June, Japan's privacy regulator issued a warning to OpenAI about unauthorized data collection. Also earlier this year, Italy temporarily banned the use of ChatGPT due to alleged violations of European Union privacy laws.

Passionate about technology, I write with pleasure on stealthsetts.com starting with 2006. I have a rich experience in operating systems: Macos, Windows and Linux, but also in programming languages ​​and blogging platforms (WordPress) and for online stores (WooCommerce, Magento, Presashop).

Home Your source of IT tutorials, useful tips and news. GPT-5 and the new Crawler Gptbot web developed by Openai
Leave a Comment