Run-time support for heterogeneous multitasking on reconfigurable SoCs

To be studied: 林鼎原
Department of Electrical Engineering, National Cheng Kung University, Tainan, Taiwan, R.O.C.

1. Abstract
• In complex reconfigurable systems on chip (SoC), the dynamism of applications requires efficient management of the platform.
• To allow run-time management of heterogeneous resources, operating systems (OS) and reconfigurable SoC platforms should be developed together.
• We show that networks-on-chip (NoC) are an ideal communication layer for dynamically reconfigurable SoCs.
• We explain how our OS provides run-time support for dynamic task relocation.
• We detail how hardware parts of the OS are integrated into the higher layers of the NoC.

2. Introduction
• To cope with ever-increasing design complexity, future sub-100 nm platforms will consist of a mixture of heterogeneous computing resources, storage elements, hardware accelerators, etc. (denoted as tiles).
• These programmable/reconfigurable tiles will be interconnected by networks-on-chip (NoC).
• Fig. 1 illustrates the integration of these heterogeneous resources into a single tile-based system.

2. Introduction (continued)
Management infrastructure motivation:
• The management infrastructure should ease application development by shielding the programmer from the complexity of the system and by providing a clear application development interface.
• In addition, this infrastructure is responsible for monitoring and managing the available resources in a consistent, efficient and fair way.
• Two distinct approaches to OS management can be identified:
  1. Treating computing tiles as peripheral devices.
     • In this case the host OS typically provides device driver support that lets applications access these devices directly, which should ensure maximum performance.
  2. Treating computing tiles as regular computing resources.
     • The OS deals with the low-level aspects and the management of the heterogeneous resources, allowing the application designer to concentrate on the application's functionality.

2. Introduction (continued)
• We believe it is timely to develop an integrated management infrastructure that enables the full potential of the heterogeneous multiprocessor system, illustrated by Fig. 2.
• Such an infrastructure should provide greater ease of use and consistency by providing a suitable abstraction for the heterogeneous computing resources.
• In addition, it should enable sharing the computing resources among multiple, unrelated applications.

2. Introduction (continued)
• As Fig. 2 illustrates, the biggest part of our operating system for reconfigurable systems (OS4RS) executes on top of the instruction set processor (master ISP).
• However, depending on the amount of hardware support for the OS, a small number of low-level management functions effectively 'execute' on top of the slave processing units.
• Our OS can be classified as a master-slave configuration. This implies that one processing unit (the master ISP) is responsible for monitoring the status of the heterogeneous system and for assigning work to all the other processing units (slaves).

3. OS components for the targeted system
OS4RS task descriptors:
• The OS keeps track of the tasks by means of a task descriptor list.
• This list contains a descriptor for every OS4RS task instantiation.
• The most important descriptor components are:
  • The task logical ID. This unique ID is assigned by OS4RS at task initialization time and allows addressing tasks independently of their location (i.e. current computing unit) within the system.
  • The task state. This indicates, for example, that a task has been assigned to a computing resource for execution, or that a task has been selected for relocation to a different computing resource.
  • A link to the task destination look-up table (DLT).
  • A list of the available execution binaries and their respective properties, targeted at the different heterogeneous computing resources.
  • A link to the descriptor of the computing unit that currently executes the task.
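The descriptor components listed above could be laid out roughly as follows. This is a minimal sketch only: the names `TaskDescriptor`, `TaskState`, `Binary`, all field names and the example state values are assumptions made for illustration and are not taken from OS4RS.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional


class TaskState(Enum):
    # Hypothetical states; the slides only give "assigned" and
    # "selected for relocation" as examples.
    CREATED = auto()
    ASSIGNED = auto()
    SELECTED_FOR_RELOCATION = auto()


@dataclass
class Binary:
    target: str   # kind of computing resource, e.g. "isp" or "fpga" (assumed labels)
    image: bytes  # configuration or program data for that resource


@dataclass
class TaskDescriptor:
    logical_id: int                               # location-independent task ID
    state: TaskState = TaskState.CREATED
    dlt: list = field(default_factory=list)       # destination look-up table
    binaries: list = field(default_factory=list)  # one Binary per supported target
    computing_unit: Optional[str] = None          # unit currently executing the task

    def binary_for(self, target: str) -> Optional[Binary]:
        """Return the binary built for the given resource type, if any."""
        return next((b for b in self.binaries if b.target == target), None)


task = TaskDescriptor(logical_id=3, binaries=[Binary("fpga", b"\x00")])
task.state = TaskState.ASSIGNED
task.computing_unit = "tile1"
```

Keeping one binary per resource type in the descriptor is what lets the same logical task be instantiated on different kinds of tiles.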

3. OS components for the targeted system (continued)
OS4RS computing unit descriptors:
• The operating system manages its computing resources by maintaining a linked list of computing unit descriptors.
• Every computing unit descriptor contains a set of 11 functions that completely describe the capabilities of the computing resource.
• The most important functions allow the operating system to:
  1. Set up a task. This function requires the task logical ID and the task binary as input from the OS.
  2. Initialize a task. This function allows the OS to reset a task and, if necessary, initialize it with a previously captured task state.
  3. Start a task. This function allows the OS to start a task with a specified logical ID.

3. OS components for the targeted system (continued)
  4. Remove a task. This function allows the OS to remove a task with a specified logical ID.
  5. Signal a task. This function allows the OS to send a switch signal to a certain task, to suspend/resume a task, etc. It requires a signal identifier and a task logical ID.
  6. Control inter-task communication. This function allows the OS to initialize/update the DLT of a specified task.
  7. Handle computing resource exceptions. This is a call-back function that allows the computing resource to signal exceptions to the OS.
• Furthermore, the computing unit descriptor allows the OS to monitor the state of the computing unit through a number of variables, such as the load of the computing unit, the number of running tasks, the task set-up time, etc.
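The seven functions listed above can be read as a common interface that every computing unit implements. The sketch below is an assumption-laden illustration: the class names `ComputingUnit` and `FakeTile`, the method names and their signatures are hypothetical, and only the seven functions named on the slides (out of the 11) are modelled.

```python
from abc import ABC, abstractmethod


class ComputingUnit(ABC):
    """Hypothetical interface for functions 1 to 6; function 7 is a call-back."""

    def __init__(self, name, on_exception):
        self.name = name
        self.on_exception = on_exception  # 7. exception call-back into the OS
        self.running_tasks = 0            # example monitoring variable

    @abstractmethod
    def setup(self, logical_id, binary): ...           # 1. set up a task

    @abstractmethod
    def initialize(self, logical_id, state=None): ...  # 2. reset / restore state

    @abstractmethod
    def start(self, logical_id): ...                   # 3. start a task

    @abstractmethod
    def remove(self, logical_id): ...                  # 4. remove a task

    @abstractmethod
    def signal(self, signal_id, logical_id): ...       # 5. signal a task

    @abstractmethod
    def update_dlt(self, logical_id, dlt): ...         # 6. inter-task communication


class FakeTile(ComputingUnit):
    """In-memory stand-in for a tile, for illustration only."""

    def __init__(self, name, on_exception):
        super().__init__(name, on_exception)
        self.tasks = {}  # logical ID -> per-task record

    def setup(self, logical_id, binary):
        self.tasks[logical_id] = {"binary": binary, "state": None,
                                  "running": False, "dlt": []}

    def initialize(self, logical_id, state=None):
        self.tasks[logical_id]["state"] = state

    def start(self, logical_id):
        if logical_id not in self.tasks:
            # Report the problem to the OS instead of failing silently.
            self.on_exception(self.name, f"unknown task {logical_id}")
            return
        self.tasks[logical_id]["running"] = True
        self.running_tasks += 1

    def remove(self, logical_id):
        if self.tasks.pop(logical_id)["running"]:
            self.running_tasks -= 1

    def signal(self, signal_id, logical_id):
        self.tasks[logical_id]["last_signal"] = signal_id

    def update_dlt(self, logical_id, dlt):
        self.tasks[logical_id]["dlt"] = list(dlt)


errors = []
tile = FakeTile("tile1", lambda unit, msg: errors.append((unit, msg)))
tile.setup(3, b"bitstream")
tile.initialize(3)
tile.start(3)
tile.start(99)  # unknown ID: reported through the exception call-back
```

Because the OS only talks to the abstract interface, a processor tile and a reconfigurable hardware tile can be managed by the same scheduler code.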

3. OS components for the targeted system (continued)
OS4RS task scheduling:
• The OS employs a dynamic two-level scheduling technique (illustrated by Fig. 3).
• The top-level scheduler assigns tasks to computing resources in response to timer ticks and external events (e.g. user interaction).
• This mapping is based on the information that resides in the OS4RS task descriptors and the computing unit descriptors.
• The top-level scheduler dispatches a task to a local scheduler by instantiating this task on a certain computing unit.
• The top-level scheduler is able to move tasks between heterogeneous computing resources by using task-specific contexts.
• The local scheduler, tied to a certain computing unit, is responsible for the temporal ordering of the tasks that have been assigned to that computing unit.
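The split between the two scheduling levels can be sketched as follows. The placement policy (least-loaded compatible unit) and the local policy (round-robin) are assumptions chosen to keep the example short; the slides do not state which policies OS4RS uses, and the class names are hypothetical.

```python
from collections import deque


class LocalScheduler:
    """Orders the tasks on one computing unit (round-robin is an assumption)."""

    def __init__(self, unit_type):
        self.unit_type = unit_type
        self.queue = deque()

    def add(self, task_id):
        self.queue.append(task_id)

    def next_task(self):
        if not self.queue:
            return None
        task_id = self.queue[0]
        self.queue.rotate(-1)  # move the chosen task to the back of the queue
        return task_id


class TopLevelScheduler:
    """Maps tasks onto computing units; least-loaded placement is an assumption."""

    def __init__(self, units):
        self.units = units  # unit name -> LocalScheduler

    def dispatch(self, task_id, supported_types):
        # Only units for which the task has a binary are candidates.
        candidates = [(len(s.queue), name) for name, s in self.units.items()
                      if s.unit_type in supported_types]
        if not candidates:
            raise RuntimeError(f"no computing unit can run task {task_id}")
        _, name = min(candidates)
        self.units[name].add(task_id)
        return name


top = TopLevelScheduler({"isp0": LocalScheduler("isp"),
                         "tile1": LocalScheduler("fpga"),
                         "tile2": LocalScheduler("fpga")})
placements = [top.dispatch(t, {"fpga"}) for t in (1, 2, 3)]
order = [top.units["tile1"].next_task() for _ in range(3)]
```

Here the top-level scheduler decides only where a task runs, while each local scheduler decides when its tasks run.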


3. OS components for the targeted system (continued)
OS4RS inter-task communication:
• Because all tasks use a uniform communication scheme (independent of their mapping), relocating a task between heterogeneous computing resources at run time does not affect the way other tasks communicate with it.
• This uniform communication scheme is based on message passing, since this communication type is supported by the underlying hardware.
• To support point-to-multipoint and multipoint-to-point communication, we have introduced the notion of input and output ports for the tasks.
• Messages sent by one task to two other tasks are distinguished by the output port number they are sent on.

3. OS components for the targeted system (continued)
• During application initialization, the OS assigns a system-wide unique logical ID to every task in the application.
• The top-level scheduler (Fig. 3) maps the task onto a computing resource, which determines its physical address on the platform.
• In addition, the application provides the OS with a task graph detailing the application's inter-task communication (Fig. 4).

3. OS components for the targeted system (continued)
• Thus, for every output port of a task the OS defines a triplet: (destination input port, destination logical address, destination physical address).
• For instance, task C has two output ports and is therefore assigned two triplets, which together compose its DLT (Fig. 5).
• Whenever a task gets instantiated, the OS updates its associated DLT and sends it to the computing resource responsible for executing the task.
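The construction of a DLT from the task graph and the current placement can be sketched as below. The task graph, the tile addresses and the names `DltEntry` and `build_dlt` are hypothetical; they are not the contents of Fig. 4 or Fig. 5, only a task C with two output ports as in the example above.

```python
from typing import NamedTuple


class DltEntry(NamedTuple):
    dest_input_port: int
    dest_logical: str   # destination logical address (task logical ID)
    dest_physical: int  # destination physical address (tile)


def build_dlt(task, edges, placement):
    """Return {output port: DltEntry} for one task."""
    return {out_port: DltEntry(in_port, dst, placement[dst])
            for src, out_port, dst, in_port in edges if src == task}


# Hypothetical task graph: (source, output port, destination, input port)
edges = [("A", 0, "C", 0), ("C", 0, "D", 0), ("C", 1, "E", 0)]
placement = {"A": 0, "C": 1, "D": 2, "E": 3}  # logical ID -> tile address

dlt_c = build_dlt("C", edges, placement)

# Relocating D to tile 5 changes only the physical address in C's DLT;
# C keeps addressing D by the same output port and logical ID.
placement["D"] = 5
dlt_c_after = build_dlt("C", edges, placement)
```

This is why relocation is transparent to the communicating tasks: only the physical part of a triplet has to be rewritten by the OS.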

4. OS hardware support
• The hardware support allows the OS to perform some low-level management functions more efficiently.
• This section explains how an ensemble of NoCs and interfaces to the computing resources implements an efficient communication layer and supports the OS in three distinct domains:
  (1) Task management. In order to instantiate/remove tasks on a certain computing resource, the OS requires efficient access to its configuration/programming mechanism.
  (2) Inter-task communication. Rather than routing all inter-task messages through the main ISP, a better solution is to allow the different (slave) computing resources to communicate with each other directly, in a way controlled by the OS.
  (3) Operation and Management (OAM). The OS needs to keep track of the behavior of the different tasks executing on all the computing resources in terms of communication and security.

5. Description of the application life cycle
Task life cycle:
• Whenever a user starts an application, the OS needs to perform a series of actions before the application actually starts running on the heterogeneous reconfigurable SoC.
• Three steps can be identified (Fig. 7):
  (1) Load the application.
      • This step creates a task structure containing a unique logical ID for every task within the application.
      • This is done by registering a communication task graph.
      • Based on this task graph, the OS creates a DLT for every task.
  (2) Allocate tasks to platform computing resources.
      • In this step the operating system decides on the mapping of the application tasks depending on their available representations, the availability of computing resources, the requested QoS, etc.

5. Description of the application life cycle (continued)
  (3) Instantiate, initialize and start the application tasks.
      • For every task, this entails sending the configuration/program data to the computing resource, resetting/initializing the task, updating the DLT and sending it to the computing resource, and finally starting the task.

5. Description of the application life cycle (continued)
• The OS needs the ability to relocate (migrate) a task from one heterogeneous computing resource (origin tile) to another (destination tile).
• The principle of the relocation process is depicted in Fig. 8:
  (1) To relocate a task, the OS can send a switch signal to that task at any time.
  (2) Whenever the signaled task reaches a switch point, it goes into an interrupted state.
  (3) In this state, all the relevant state information of that switch point is transferred to the OS.
  (4) The OS then instantiates the task on a different computing resource. The task is initialized using the state information previously stored by the OS.
  (5) The task resumes by continuing execution at the corresponding switch point.
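The five relocation steps above can be illustrated with a toy task whose only switch point is the end of each loop iteration. Everything here is an assumption made for illustration: the `SummingTask` class, the `(i, total)` state layout and the way the switch signal is delivered are hypothetical, and both "tiles" are plain Python objects.

```python
class SummingTask:
    """Toy task that sums 0..n-1; each iteration ends at a switch point."""

    def __init__(self, n, state=None):
        self.n = n
        # (4) Initialize from previously captured state, if the OS supplies one.
        self.i, self.total = state if state is not None else (0, 0)
        self.switch_requested = False

    @property
    def done(self):
        return self.i >= self.n

    def step(self):
        """Run up to the next switch point.

        Returns the captured state if a switch signal is pending, else None.
        """
        self.total += self.i
        self.i += 1
        if self.switch_requested:
            # (2) + (3) Interrupted at a switch point: hand the state to the OS.
            return (self.i, self.total)
        return None


origin = SummingTask(5)
origin.step()
origin.step()                   # the task runs on the origin tile
origin.switch_requested = True  # (1) the OS sends the switch signal
state = origin.step()           # (2), (3) next switch point reached

dest = SummingTask(5, state)    # (4) instantiate on the destination tile
while not dest.done:            # (5) resume from the same switch point
    dest.step()
```

The captured state is deliberately independent of where the task runs, which is what allows the destination to be a different kind of computing resource.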


5. Description of the application life cycle (continued)
• Because parts of the OS are distributed over different components of the platform, the actual switch process involves concurrent steps and requires synchronization of communication.
• The different steps performed in the actual switch process are described in more detail in Fig. 9:
  (1) When the OS sends a switch signal to the origin tile, the task running on that tile may be in a state that requires more input data before reaching a switch point. Therefore, the task continues to run until it reaches its switch point, which it then signals to the OS (1→2).

5. Description of the application life cycle (continued)
  (2) After receiving the acknowledgment that the task has reached its switch point, the OS requests the sender tile to send one last tagged message to the origin tile and then stop sending further messages.
  (3) The OS then configures the destination tile with the switched task, initializes it to the state it stopped in and enables its communication on the data NoC.
  (4) The next step is to forward all pending messages to the newly relocated task. To this end, the OS sends a new DLT to the origin tile and puts its data NIC in a special state that forwards all its input messages to the destination tile.

5. Description of the application life cycle (continued)
      The tagged message is the last one to be forwarded by the data NIC of the origin tile to the destination tile. The OS is informed by the data NIC of the destination tile upon reception of the tagged message (4→5).
  (5) The task switching mechanism is now finished: the OS simply updates the DLT of the sender tile to point to the destination tile instead of the origin tile, and re-enables its communication on the data NoC.
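The role of the tagged message in steps (2) to (5) can be simulated with simple queues. This is a simplified, single-threaded sketch: the `relocate` function, the `TAG` marker and the event strings are hypothetical, and the real process runs concurrently across the NoC.

```python
from collections import deque

TAG = object()  # hypothetical marker standing in for the tagged message


def relocate(pending_at_origin, sender_backlog):
    """Simulate message handling during a task switch.

    pending_at_origin: messages already queued at the origin tile's data NIC.
    sender_backlog: messages the sender produces while it is stopped.
    Returns (messages delivered to the relocated task in order, event log).
    """
    origin_nic = deque(pending_at_origin)
    dest_inbox = []
    events = []

    # (2) The sender emits one last, tagged message and then stops sending.
    origin_nic.append(TAG)
    events.append("sender stopped")

    # (3) The destination tile is configured and initialized (not modelled).
    events.append("destination ready")

    # (4) The origin NIC forwards its input messages to the destination
    #     until the tagged message has gone through.
    tag_seen = False
    while origin_nic and not tag_seen:
        msg = origin_nic.popleft()
        if msg is TAG:
            tag_seen = True  # the destination NIC informs the OS (4 -> 5)
            events.append("tag received at destination")
        else:
            dest_inbox.append(msg)

    # (5) The sender's DLT now points at the destination; sending resumes.
    events.append("sender re-enabled")
    dest_inbox.extend(sender_backlog)
    return dest_inbox, events


inbox, events = relocate(["m1", "m2"], ["m3", "m4"])
```

Since the tag is the last message on the old path, the OS knows that nothing is still in flight towards the origin tile once the destination reports it, so message order is preserved across the switch.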

6. Conclusion
• This paper presents our work on the run-time support of heterogeneous multitasking on reconfigurable SoCs.
• We show that an OS for reconfigurable systems should be designed together with the communication layer of the platform.
• Our architecture for reconfigurable SoCs is composed of a master ISP connected to heterogeneous reconfigurable tiles, using an ensemble of NoCs as the communication layer.
