| 1 | Next-Gen Multimedia Standard - Proposed Design Document |
| 2 | ======================================================= |
| 3 | |
| 4 | |
| 5 | Purpose |
| 6 | ------- |
| 7 | |
| 8 | TODO: Crib text from the first message of |
| 9 | https://gitlab.freedesktop.org/terminal-wg/specifications/issues/12 as |
| 10 | to why people want images in their terminals. |
| 11 | |
| 12 | The same mechanism that can put raster-based images on the screen is |
| 13 | easily generalizable to other media types such as vector-based images, |
| 14 | animations, and embedded GUI widgets. This document is thus a |
| 15 | "multimedia" proposal, not just "simple images". |
| 16 | |
| 17 | |
| 18 | Acknowledgements |
| 19 | ---------------- |
| 20 | |
| 21 | This proposal has been informed by the following prior work: |
| 22 | |
| 23 | * DEC VT300 series sixel graphics standard: |
| 24 | https://vt100.net/docs/vt3xx-gp/chapter14.html |
| 25 | |
| 26 | * iTerm2 image protocol: |
| 27 | https://iterm2.com/documentation-images.html |
| 28 | |
| 29 | * Kitty image protocol: |
| 30 | https://sw.kovidgoyal.net/kitty/graphics-protocol.html |
| 31 | |
| 32 | * Jexer Terminal User Interface: |
| 33 | https://gitlab.com/klamonte/jexer |
| 34 | |
| 35 | |
| 36 | Design Goals - Core |
| 37 | ------------------- |
| 38 | |
| 39 | The core ("must-have") design goals are: |
| 40 | |
| 41 | * Be easy to implement in existing terminals and applications: |
| 42 | |
| 43 | - Sacrifice "10%" of potential function to eliminate "90%" of |
| 44 | implementation pain. "Less is more." |
| 45 | |
| 46 | - Be a strict superset of the existing iTerm2 and DEC sixel image |
| 47 | solutions. One should be able to take an existing terminal or |
| 48 | application that emits/consumes iTerm2 or sixel sequences, and |
| 49 | only change the control sequence introducer/termination to achieve |
| 50 | the same effect as a terminal/application that conforms with this |
| 51 | standard. |
| 52 | |
| 53 | * Have no ambiguity. If two terminal or application developers can |
| 54 | read this document and reach different conclusions on what should be |
| 55 | on the screen, then an error exists in this document that must be |
| 56 | fixed. |
| 57 | |
| 58 | - Every feature must be straightforward to validate via automated |
| 59 | unit testing. |
| 60 | |
| 61 | - Every conformant terminal must produce the same output (pixels on |
| 62 | screen) given the same input (terminal font, terminal sequences). |
| 63 | |
| 64 | - Every option must have a defined default value. |
| 65 | |
| 66 | - Erroneous sequences must have defined expected results. |
| 67 | |
| 68 | - Every operation must act atomically: either everything worked |
| 69 | (image is on screen, cursor has moved, etc.) or nothing did. |
| 70 | |
| 71 | * Be straightforward to implement in non-"physical" terminals, |
| 72 | including: |
| 73 | |
| 74 | - Future versions of terminal control libraries such as ncurses and |
| 75 | termbox. |
| 76 | |
| 77 | - Terminal multiplexers that support "headless" terminals (no |
| 78 | physical screen) and "multi-head" terminals (many different |
| 79 | physical screens). |
| 80 | |
| 81 | * Be platform-agnostic, and easy to implement on (at least) POSIX, |
| 82 | Windows, and the web. |
| 83 | |
| 84 | - All features must be available even if the only means of |
| 85 | communication between the application and terminal is control |
| 86 | sequences (e.g. no shared disk, no shared memory, no shared DOM, |
| 87 | etc.). |
| 88 | |
| 89 | * Support graceful fallback: |
| 90 | |
| 91 | - Terminal emulators and physical terminals that do not support this |
| 92 | standard should remain usable with no undefined screen artifacts, |
| 93 | even when the application blindly emits these sequences to those |
| 94 | terminals. |
| 95 | |
| 96 | - This standard must be able to be versioned for future enhancements. |
| 97 | |
| 98 | - An application must be able to detect that its terminal supports |
| 99 | this standard, and at what version. |
| 100 | |
| 101 | * Support secure programming practices: |
| 102 | |
| 103 | - Applications must not be able to obtain unauthorized data from |
| 104 | terminal memory, such as images emitted by other applications |
| 105 | that are still present in the terminal's scrollback buffer, or |
| 106 | terminal or system memory limits. |
| 107 | |
| 108 | - Applications must not be able to compromise the terminal through |
| 109 | denial-of-service such as: excessive memory usage, unterminated |
| 110 | control sequences. Similarly, terminals must not be able to |
| 111 | compromise applications through their responses to application |
| 112 | queries. |
| 113 | |
| 114 | - Applications must not be able to manipulate the terminal into |
| 115 | performing an insecure operation such as: reading arbitrary shared |
| 116 | memory regions, reading arbitrary files on disk, deleting |
| 117 | arbitrary files on disk, etc. Similarly, terminals must not be |
| 118 | able to manipulate applications into performing insecure |
| 119 | operations. |
| 120 | |
| 121 | - This standard must be implementable when the terminal has a fixed |
| 122 | maximum memory, such as a kernel-level device driver. |
| 123 | |
| 124 | |
| 125 | |
| 126 | Design Goals - Secondary |
| 127 | ------------------------ |
| 128 | |
| 129 | The secondary ("nice-to-have") design goals are listed below. These |
| 130 | might not all be possible, but will be kept in mind: |
| 131 | |
| 132 | * Minimal redundant network traffic for on-screen data that is |
| 133 | repeated: either on screen in multiple places, or in the same place |
| 134 | but refreshed multiple times. |
| 135 | |
| 136 | * Asynchronous notification from terminal to application that the |
| 137 | screen has been changed by outside or user action. Examples: font |
| 138 | change, session detach/attach, user changed image preferences. |
| 139 | |
| 140 | * The ability for a multiplexer to "pass-thru" the image drawing |
| 141 | sequence to its "outer" terminal, with some support for limited |
| 142 | clipping. |
| 143 | |
| 144 | |
| 145 | |
| 146 | Out Of Scope |
| 147 | ------------ |
| 148 | |
| 149 | The following items are out of scope for this standard: |
| 150 | |
| 151 | * Bidirectional output. Applications are expected to generate Tiles |
| 152 | and place them on screen where needed. The cursor response to |
| 153 | image sequences is defined as left-to-right, consistent with |
| 154 | ECMA-48 / ANSI X3.64 sequences. An independent BIDI standard is |
| 155 | free to apply whatever solution will work for ECMA-48 sequences to |
| 156 | the sequences described in this document. |
| 157 | |
| 158 | * Capabilities. This standard defines a limited number of terminal |
| 159 | reports. These are not intended to be used as a general-purpose |
| 160 | capabilities model. |
| 161 | |
| 162 | |
| 163 | |
| 164 | Definitions |
| 165 | ----------- |
| 166 | |
| 167 | Terminal - The hardware, or a program that simulates hardware, |
| 168 | comprising a keyboard, screen, and mouse. |
| 169 | |
| 170 | Application - A program that utilizes the terminal for its |
| 171 | input/output with the user. |
| 172 | |
| 173 | Multiplexer - A special case of an application that simulates one or |
| 174 | more "inner" terminals for other applications to use, |
| 175 | and composes these inner terminals into a combined |
| 176 | screen to emit to one or more "outer" terminals that |
| 177 | obtain input/output from the user. Multiplexers are |
| 178 | thus both applications and terminals. |
| 179 | |
| 180 | X - The column coordinate of a cell. This standard is 1-based (like |
| 181 | ECMA-48): the left-most column of the screen is numbered 1. |
| 182 | |
| 183 | Y - The row coordinate of a cell. This standard is 1-based (like |
| 184 | ECMA-48): the top-most row of the screen is numbered 1. |
| 185 | |
| 186 | Z - The layer that text or multimedia is placed on. This proposal |
| 187 | uses a right-hand coordinate system with (X, Y, Z) = (1, 1, 1) |
| 188 | defined as the top-left corner on the default layer: positive Z |
| 189 | projects "away" from the user and "into" or "behind" the screen. |
| 190 | Rendering the Cells on the screen must produce the same result as |
| 191 | the painter's algorithm (see Rendering section below). |
| 192 | |
| 193 | Cell - A fixed-width-and-height rectangle on the screen. The cells of |
| 194 | the screen are arranged in a grid of X columns and Y rows. A |
| 195 | Cell has dimensions of cellWidth and cellHeight, which can be |
| 196 | measured in either pixels or points. Every Cell has a |
| 197 | coordinate of (X, Y, Z). |
| 198 | |
| 199 | Tile - One or more contiguous Cells with data to be displayed. The |
| 200 | data can be text or image data, but not both. A Tile has a width |
| 201 | of 1, 2, or more Cells, and a coordinate of (X, Y, Z) that is the |
| 202 | same as its left-most (first) Cell's (X, Y, Z). In practice, |
| 203 | Tiles are typically one Cell wide for ASCII and Latin language |
| 204 | glyphs, and two Cells wide for "fullwidth" glyphs as used in |
| 205 | Asian languages, emojis, and symbols. This standard does not |
| 206 | preclude Tiles from encompassing entire grapheme clusters. |
| 207 | |
| 208 | Layer - A screen-sized grid of Cells that have the same Z coordinate. |
| 209 | Layers are drawn to the screen in descending Z order. Layers |
| 210 | may have optional additional attributes such as transparency. |
| 211 | |
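
The Cell and Tile definitions above can be illustrated with a small
data model. This is only a sketch, not part of the proposal: the class
name, the field names, and the "kind" tag are hypothetical, chosen
here for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Tile:
    """One or more contiguous Cells holding text or image data, never both."""
    x: int        # 1-based column of the left-most (first) Cell
    y: int        # 1-based row
    z: int        # layer; larger Z is farther "behind" the screen
    width: int    # number of Cells covered: 1, 2, or more
    kind: str     # hypothetical tag: "text" or "image"
    data: object  # glyph string or opaque image payload

    def __post_init__(self) -> None:
        if self.x < 1 or self.y < 1:
            raise ValueError("X and Y are 1-based")
        if self.width < 1:
            raise ValueError("a Tile covers at least one Cell")
        if self.kind not in ("text", "image"):
            raise ValueError("a Tile holds text or image data, but not both")

    def cells(self) -> List[Tuple[int, int, int]]:
        """Return the (X, Y, Z) coordinate of every Cell the Tile covers."""
        return [(self.x + i, self.y, self.z) for i in range(self.width)]
```

For example, a fullwidth glyph at the top-left corner of the default
layer would be Tile(1, 1, 1, 2, "text", glyph), and its cells() are
(1, 1, 1) and (2, 1, 1). The first of these is the Tile's own
coordinate, as the definition requires.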
| 212 | |
| 213 | Rendering |
| 214 | --------- |
| 215 | |
| 216 | A terminal will display its Cells such that the screen looks as if |
| 217 | it were rendered according to the following pseudo-code: |
| 218 | |
| 219 | ``` |
| 220 | for each layer Z, in descending order from maxZ to minZ: |
| 221 |     for each row Y, in ascending order from minY to maxY: |
| 222 |         for each column X, in ascending order from minX to maxX: |
| 223 |             draw tile at (X, Y, Z) |
| 224 |             advance X by tile width |
| 225 |         next column |
| 226 |         advance Y by 1 |
| 227 |     next row |
| 228 |     decrease Z by 1 |
| 229 | next layer |
| 230 | ``` |
| 231 | |
| 232 | A terminal is free to optimize its rendering as it sees fit, so long |
| 233 | as the final screen output looks equivalent to the above method. |
| 234 | |
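
The rendering order above can be sketched as a runnable example. This
is a minimal illustration under stated assumptions, not a reference
implementation: the dictionary-based screen model and the render()
function are hypothetical, empty Cells are treated as fully
transparent, and Tiles are clipped at the right edge of the screen.
None of those choices are specified by this document.

```python
from typing import Dict, Tuple

# Hypothetical model: maps the (X, Y, Z) of a Tile's left-most Cell to
# (width_in_cells, payload). The payload is a one-character label here.
Tiles = Dict[Tuple[int, int, int], Tuple[int, str]]


def render(tiles: Tiles, columns: int, rows: int) -> list:
    """Composite all layers into one grid of labels, back to front."""
    screen = [[" "] * columns for _ in range(rows)]
    layers = sorted({z for (_, _, z) in tiles}, reverse=True)
    for z in layers:                      # descending Z: farthest layer first
        for y in range(1, rows + 1):      # ascending rows
            x = 1
            while x <= columns:           # ascending columns
                tile = tiles.get((x, y, z))
                if tile is None:
                    x += 1                # assumption: empty Cell is transparent
                    continue
                width, payload = tile
                # Draw over whatever deeper layers left behind; clipping
                # at the right edge is an assumption of this sketch.
                for cx in range(x, min(x + width, columns + 1)):
                    screen[y - 1][cx - 1] = payload
                x += width                # advance X by tile width
    return ["".join(row) for row in screen]
```

With a three-Cell-wide Tile "b" on layer 2 and a one-Cell Tile "f" at
column 2 of layer 1, the nearer Tile overwrites only the Cell it
covers, so the first row comes out as "bfb".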
| 235 | |
| 236 | |
| 237 | Terminal State |
| 238 | -------------- |
| 239 | |
| 240 | |
| 241 | |
| 242 | Terminal Reports |
| 243 | ---------------- |
| 244 | |
| 245 | |
| 246 | |
| 247 | Error Handling |
| 248 | -------------- |
| 249 | |
| 250 | |
| 251 | |
| 252 | Cursor Position |
| 253 | --------------- |
| 254 | |
| 255 | |
| 256 | |
| 257 | |
| 258 | Wire Formats |
| 259 | ------------ |
| 260 | |
| 261 | |
| 262 | |
| 263 | |
| 264 | Optimizations |
| 265 | ------------- |
| 266 | |
| 267 | |
| 268 | |
| 269 | Examples |
| 270 | -------- |
| 271 | |
| 272 | |