May 21, 2024

OpenAI researchers propose a new AI safety technique: iterated amplification

OpenAI researchers recently proposed a new AI safety technique called iterated amplification. Instead of providing labeled data or a reward function, it specifies complex behaviors and goals by describing how to decompose a complex task into simpler subtasks. Although the method is still at an early stage, the researchers believe it could become a scalable approach to AI safety.

To train a machine learning model to perform a specific task, we need a training signal that evaluates the model's performance and helps it keep learning and improving. Labels in supervised learning and reward functions in reinforcement learning are both training signals. Machine learning systems usually assume that such a signal already exists and that the algorithm can learn from it. In reality, the training signal has to come from somewhere. Without a training signal, the task cannot be learned; with the wrong signal, the algorithm may produce unintended or even dangerous behavior. Improving our ability to obtain training signals is therefore valuable both for learning new tasks and for AI safety.

So how are training signals obtained today? Sometimes the signal can be computed algorithmically: in Go, for example, it comes from counting the score. Most real-world tasks have no such mathematically defined signal, but often a human can provide one by hand. Many complex tasks, however, exceed what humans can judge, and we have no way to tell whether the model's output is correct. Examples include designing a complex transit system, managing every security detail of a large computer network, or predicting long-term global climate trends.

Different problems require different training signals: the signal can come from algorithmic evaluation or from human feedback, but some tasks are beyond what humans can evaluate.

Iterated amplification, the method proposed in the paper, generates a training signal for this last kind of task under certain assumptions. Although humans cannot grasp a complex problem as a whole, we assume that a human who is given a piece of the task can identify the clear, smaller components it is made of. In the network security example, a person can break "defend a collection of servers and routers" into "consider attacks on the servers", "consider attacks on the routers", and "consider how the two kinds of attack might interact". We also assume that humans can carry out very small instances of the task, such as "identify whether a line in a log file is suspicious". If both assumptions hold, that is, humans can decompose tasks and can solve the smallest pieces, then we can build a training signal for a very large task by combining the human signals for its subtasks.
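The two assumptions can be illustrated with a minimal Python sketch. It is not code from the paper: the functions `human_checks_line` and `human_combine` and the "FAILED LOGIN" rule are hypothetical stand-ins for a human who can judge a single log line and combine two sub-answers, but who never inspects the whole log at once.

```python
from typing import List


def human_checks_line(line: str) -> bool:
    """Hypothetical stand-in for a human judging ONE log line (the tiny task)."""
    return "FAILED LOGIN" in line


def human_combine(left: int, right: int) -> int:
    """Hypothetical stand-in for a human merging two sub-answers."""
    return left + right


def count_suspicious(log: List[str]) -> int:
    """Answer the large question ("how many suspicious lines?") by decomposition.

    The log is split in half until each piece is a single line that the
    'human' can judge directly; the sub-answers are then combined.
    """
    if not log:
        return 0
    if len(log) == 1:
        return int(human_checks_line(log[0]))
    mid = len(log) // 2
    return human_combine(count_suspicious(log[:mid]), count_suspicious(log[mid:]))


log = [
    "ok",
    "FAILED LOGIN from 10.0.0.7",
    "ok",
    "ok",
    "FAILED LOGIN from 10.0.0.9",
]
print(count_suspicious(log))
```

The answer for the whole log is assembled only from judgments on single lines and pairwise combinations, which is the sense in which a signal for the large task comes from human signals on its pieces.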

How iterated amplification works

In practice, the researchers first sample small subtasks and train the AI system to solve them by asking humans for help (a label or reward signal). They then sample slightly larger tasks and ask humans to break these into smaller pieces, which the AI system trained in the previous step can already solve. The solutions to these slightly harder tasks, obtained with human help, then serve as the training signal for teaching the AI system to solve these second-level tasks directly, without human help.

As training progresses, the researchers keep giving the AI system more complex, composite tasks and keep constructing training signals in the same way. If the process works, the AI system eventually learns to solve highly complex problems, even though it never received a direct training signal for them.
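The loop described above can be sketched on a toy problem. The code below is an illustration under stated assumptions, not the researchers' implementation: the task (where does `x` land after applying a fixed permutation `k` times), the permutation `SIGMA`, the lookup-table `TableModel` standing in for a neural network, and all function names are hypothetical. The simulated human can only apply the permutation once, but may split a question in two and delegate both halves to the current model.

```python
from typing import Dict, Optional, Tuple

Question = Tuple[int, int]  # (x, k): where does x land after k applications?

SIGMA = [2, 0, 3, 1, 4]  # hypothetical permutation on {0, ..., 4}


class TableModel:
    """Toy learner that memorises its training targets (stand-in for a neural net)."""

    def __init__(self) -> None:
        self.memory: Dict[Question, int] = {}

    def answer(self, q: Question) -> Optional[int]:
        return self.memory.get(q)  # None means "not learned yet"

    def train(self, q: Question, target: int) -> None:
        self.memory[q] = target


def human_with_model(q: Question, model: TableModel) -> Optional[int]:
    """Amplified overseer: solves k == 1 directly, otherwise decomposes."""
    x, k = q
    if k == 1:
        return SIGMA[x]  # the only step the 'human' can do alone
    half = k // 2
    mid = model.answer((x, half))  # first sub-question
    if mid is None:
        return None  # the model cannot help yet
    return model.answer((mid, k - half))  # second sub-question


def iterated_amplification(max_k: int, rounds: int) -> TableModel:
    model = TableModel()
    for _ in range(rounds):
        # Collect amplified answers using the model from the previous round.
        targets: Dict[Question, int] = {}
        for k in range(1, max_k + 1):
            for x in range(len(SIGMA)):
                ans = human_with_model((x, k), model)
                if ans is not None:
                    targets[(x, k)] = ans
        # Train the model to answer those questions directly.
        for q, target in targets.items():
            model.train(q, target)
    return model


model = iterated_amplification(max_k=8, rounds=4)
print(model.answer((0, 8)))
```

In this sketch each round roughly doubles the largest `k` the model can answer on its own, and no one ever supplies a direct label for `k = 8`: every target is assembled from answers the model learned in earlier rounds.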

The process is somewhat similar to expert iteration, the method used in AlphaGo Zero, except that expert iteration reinforces an existing training signal while iterated amplification builds the training signal from scratch. It also resembles several recent algorithms based on problem decomposition, but differs in that it can be applied to problems with no prior training signal.

Experiments

Earlier experiments showed that it is very difficult to work directly on problems beyond human ability, and using real humans as the training signal adds further complications. The researchers' first experiments therefore amplify an algorithmic training signal instead, to check that the method works on simple tasks, and restrict attention to supervised learning. They tried the method on five toy algorithmic tasks. Each task has a direct algorithmic solution, but the researchers withhold that signal and solve the task from scratch, working step by step from simple to complex. With iterated amplification, the training signal is learned indirectly, from the subtasks alone rather than from the task itself.

On all five tasks (permutation powering, sequential assignment, wildcard search, shortest path, and union find), the new method performs comparably to learning directly from the algorithmic signal.

Without ever seeing the labels, iterated amplification achieves results comparable to supervised learning.

Amplification aims to train models on tasks that humans cannot directly perform or judge, through an iterative process that lets humans provide indirect supervision. The work also builds on earlier research on human feedback: it implements a reward prediction system, and later versions are expected to include feedback from real humans. The researchers are still at an early stage of exploration. As the research matures and scales up, it may open new possibilities for many complex problems.
