Next-Gen Multimedia Standard - Proposed Design Document
=======================================================


Purpose
-------

TODO: Crib text from the first message of
https://gitlab.freedesktop.org/terminal-wg/specifications/issues/12 as
to why people want images in their terminals.

The same mechanism that can put raster-based images on the screen is
easily generalizable to other media types such as vector-based images,
animations, and embedded GUI widgets. This document is thus a
"multimedia" proposal, not just "simple images".


Acknowledgements
----------------

This proposal has been informed by the following prior work:

* DEC VT300 series sixel graphics standard:
  https://vt100.net/docs/vt3xx-gp/chapter14.html

* iTerm2 image protocol:
  https://iterm2.com/documentation-images.html

* Kitty image protocol:
  https://sw.kovidgoyal.net/kitty/graphics-protocol.html

* Jexer Terminal User Interface:
  https://gitlab.com/klamonte/jexer


Design Goals - Core
-------------------

The core ("must-have") design goals are:

* Be easy to implement in existing terminals and applications:

  - Sacrifice "10%" of potential function to eliminate "90%" of
    implementation pain. "Less is more."

  - Be a strict superset of the existing iTerm2 and DEC sixel image
    solutions. One should be able to take an existing terminal or
    application that emits/consumes iTerm2 or sixel sequences, and
    only change the control sequence introducer/termination to achieve
    the same effect as a terminal/application that conforms to this
    standard.

* Have no ambiguity. If two terminal or application developers can
  read this document and reach different conclusions on what should be
  on the screen, then an error exists in this document that must be
  fixed.

  - Every feature must be straightforward to validate via automated
    unit testing.

  - Every conformant terminal must produce the same output (pixels on
    screen) given the same input (terminal font, terminal sequences).

  - Every option must have a defined default value.

  - Erroneous sequences must have defined expected results.

  - Every operation must act atomically: either everything worked
    (image is on screen, cursor has moved, etc.) or nothing did.

* Be straightforward to implement in non-"physical" terminals,
  including:

  - Future versions of terminal control libraries such as ncurses and
    termbox.

  - Terminal multiplexers that support "headless" terminals (no
    physical screen) and "multi-head" terminals (many different
    physical screens).

* Be platform-agnostic, and easy to implement on (at the least)
  POSIX, Windows, and web:

  - All features must be available even if the only means of
    communication between the application and terminal is control
    sequences (e.g. no shared disk, no shared memory, no shared DOM,
    etc.).

* Support graceful fallback:

  - Terminal emulators and physical terminals that do not support this
    standard should remain usable with no undefined screen artifacts,
    even when the application blindly emits these sequences to those
    terminals.

  - This standard must be able to be versioned for future enhancements.

  - An application must be able to detect that its terminal supports
    this standard, and at what version.

* Support secure programming practices:

  - Applications must not be able to obtain unauthorized data from
    terminal memory, such as images emitted by other applications
    that are still present in the terminal's scrollback buffer, or
    terminal or system memory limits.

  - Applications must not be able to compromise the terminal through
    denial-of-service such as excessive memory usage or unterminated
    control sequences. Similarly, terminals must not be able to
    compromise applications through their responses to application
    queries.

  - Applications must not be able to manipulate the terminal into
    performing an insecure operation such as: reading arbitrary shared
    memory regions, reading arbitrary files on disk, deleting
    arbitrary files on disk, etc. Similarly, terminals must not be
    able to manipulate applications into performing insecure
    operations.

  - This standard must be implementable when the terminal has a fixed
    maximum memory, such as a kernel-level device driver.
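The atomicity and fixed-memory goals above can be illustrated with a
small sketch. This is not part of the proposal: the class name
`BoundedCollector`, its `begin`/`feed`/`end` methods, and the 64 KiB
default budget are all hypothetical, and because this document does
not yet define a wire format, the sketch says nothing about how a
sequence is introduced or terminated. It only shows one way a terminal
might accept a payload either completely or not at all while never
holding more than a fixed number of bytes.

```python
from typing import Optional


class BoundedCollector:
    """Accumulate one sequence payload within a fixed byte budget.

    Hypothetical helper: names and limits are illustrative assumptions.
    """

    def __init__(self, max_bytes: int = 65536) -> None:
        self.max_bytes = max_bytes
        self._buf = bytearray()
        self._active = False
        self._overflowed = False

    def begin(self) -> None:
        """Start a new sequence, discarding any unterminated one."""
        self._buf.clear()
        self._active = True
        self._overflowed = False

    def feed(self, data: bytes) -> None:
        """Append payload bytes; once over budget, drop the whole payload."""
        if not self._active or self._overflowed:
            return
        if len(self._buf) + len(data) > self.max_bytes:
            # Reject atomically: keep nothing rather than a partial payload.
            self._buf.clear()
            self._overflowed = True
            return
        self._buf += data

    def end(self) -> Optional[bytes]:
        """Finish the sequence: return the payload, or None if rejected."""
        ok = self._active and not self._overflowed
        payload = bytes(self._buf) if ok else None
        self._buf.clear()
        self._active = False
        self._overflowed = False
        return payload
```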



Design Goals - Secondary
------------------------

The secondary ("nice-to-have") design goals are listed below. These
might not all be possible, but will be kept in mind:

* Minimal redundant network traffic for on-screen data that is
  repeated: either on screen in multiple places, or in the same place
  but refreshed multiple times.

* Asynchronous notification from terminal to application that the
  screen has been changed by outside or user action. Examples: font
  change, session detach/attach, user changed image preferences.

* The ability for a multiplexer to "pass-thru" the image drawing
  sequence to its "outer" terminal, with some support for limited
  clipping.



Out Of Scope
------------

The following items are out of scope for this standard:

* Bidirectional output. Applications are expected to generate Tiles
  and place them on screen where needed. The cursor response to
  image sequences is defined as left-to-right, consistent with
  ECMA-48 / ANSI X3.64 sequences. An independent BIDI standard is
  free to apply whatever solution will work for ECMA-48 sequences to
  the sequences described in this document.

* Capabilities. This standard defines a limited number of terminal
  reports. These are not intended to be used as a general-purpose
  capabilities model.



Definitions
-----------

Terminal - The hardware, or a program that simulates hardware,
           comprising a keyboard, screen, and mouse.

Application - A program that utilizes the terminal for its
              input/output with the user.

Multiplexer - A special case of an application that simulates one or
              more "inner" terminals for other applications to use,
              and composes these inner terminals into a combined
              screen to emit to one or more "outer" terminals that
              obtain input/output from the user. Multiplexers are
              thus both applications and terminals.

X - The column coordinate of a cell. This standard is 1-based (like
    ECMA-48): the left-most column of the screen is numbered 1.

Y - The row coordinate of a cell. This standard is 1-based (like
    ECMA-48): the top-most row of the screen is numbered 1.

Z - The layer that text or multimedia is placed on. This proposal
    uses a right-hand coordinate system with (X, Y, Z) = (1, 1, 1)
    defined as the top-left corner on the default layer: positive Z
    projects "away" from the user and "into" or "behind" the screen.
    Rendering the Cells on the screen must produce the same result as
    the painter's algorithm (see Rendering section below).

Cell - A fixed-width-and-height rectangle on the screen. The cells of
       the screen are arranged in a grid of X columns and Y rows. A
       Cell has dimensions of cellWidth and cellHeight, which can be
       measured in either pixels or points. Every Cell has a
       coordinate of (X, Y, Z).

Tile - One or more contiguous Cells with data to be displayed. The
       data can be text or image data, but not both. A Tile has width
       of 1, 2, or more, and a coordinate of (X, Y, Z) that is the
       same as its left-most (first) Cell's (X, Y, Z). In practice,
       Tiles are typically one Cell wide for ASCII and Latin language
       glyphs, and two Cells wide for "fullwidth" glyphs as used in
       Asian languages, emojis, and symbols. This standard does not
       preclude Tiles from encompassing entire grapheme clusters.

Layer - A screen-sized grid of Cells that have the same Z coordinate.
        Layers are drawn to the screen in descending Z order. Layers
        may have optional additional attributes such as transparency.
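The Cell, Tile, and Layer definitions above can be pictured as a small
data model. The sketch below is only one possible reading: the names
`Tile`, `Layer`, `tile_width`, and `put` are hypothetical, using
`unicodedata.east_asian_width` to guess which glyphs are "fullwidth"
is an assumption (this document does not prescribe how Tile width is
determined), and raising `ValueError` for a Tile that does not fit is
a placeholder because error handling is not yet specified here.

```python
import unicodedata
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


def tile_width(glyph: str) -> int:
    """Guess a Tile width in Cells: 2 for wide/fullwidth glyphs, else 1."""
    return 2 if unicodedata.east_asian_width(glyph[0]) in ("W", "F") else 1


@dataclass(frozen=True)
class Tile:
    x: int      # 1-based column of the left-most (first) Cell
    y: int      # 1-based row
    z: int      # layer; larger Z is farther "behind" the screen
    width: int  # number of contiguous Cells covered (1, 2, or more)
    data: str   # text only in this sketch; image data is not modeled

    def cells(self) -> List[Tuple[int, int, int]]:
        """Return the (X, Y, Z) coordinate of every Cell this Tile covers."""
        return [(self.x + i, self.y, self.z) for i in range(self.width)]


@dataclass
class Layer:
    z: int
    columns: int
    rows: int
    # Tiles keyed by the (X, Y) of their first Cell.
    tiles: Dict[Tuple[int, int], Tile] = field(default_factory=dict)

    def put(self, x: int, y: int, glyph: str) -> Tile:
        """Place a text Tile whose first Cell is at (x, y) on this layer."""
        width = tile_width(glyph)
        fits = 1 <= x and x + width - 1 <= self.columns and 1 <= y <= self.rows
        if not fits:
            # Assumption: the real standard must define this outcome.
            raise ValueError("tile does not fit on the layer")
        tile = Tile(x, y, self.z, width, glyph)
        self.tiles[(x, y)] = tile
        return tile
```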


Rendering
---------

A terminal will display its Cells such that the screen will look as if
it were rendered in the following pseudo-code manner:

```
for each layer Z, in descending order from maxZ to minZ:
    for each row Y, in ascending order from minY to maxY:
        for each column X, in ascending order from minX to maxX:
            draw tile at (X, Y, Z)
            advance X by tile width
        next column
        advance Y by 1
    next row
    decrease Z by 1
next layer
```

A terminal is free to optimize its rendering as it sees fit, so long
as the final screen output looks equivalent to the above method.
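For readers who prefer something executable, the loop described above
can be sketched in Python. The tile store (a dictionary keyed by the
first Cell's (X, Y, Z)), the `render` function, and the use of string
labels in place of real drawing are all assumptions made for
illustration, as is clipping a Tile at the right edge of the screen.
The result records which Tile is visible in each Cell after painting
from the farthest layer to the nearest.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical tile store: (X, Y, Z) of the first Cell -> (width, label).
Tiles = Dict[Tuple[int, int, int], Tuple[int, str]]


def render(tiles: Tiles, columns: int, rows: int) -> List[List[Optional[str]]]:
    """Paint tiles back to front; return the visible label for each Cell."""
    # screen[row][column] uses 0-based indexes; coordinates are 1-based.
    screen: List[List[Optional[str]]] = [[None] * columns for _ in range(rows)]
    for z in sorted({key[2] for key in tiles}, reverse=True):  # descending Z
        for y in range(1, rows + 1):                           # ascending Y
            x = 1
            while x <= columns:                                # ascending X
                tile = tiles.get((x, y, z))
                if tile is None:
                    x += 1
                    continue
                width, label = tile
                # Nearer layers overwrite whatever was painted before.
                for cx in range(x, min(x + width, columns + 1)):
                    screen[y - 1][cx - 1] = label
                x += width                                     # skip covered Cells
    return screen
```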



Terminal State
--------------



Terminal Reports
----------------



Error Handling
--------------



Cursor Position
---------------




Wire Formats
------------




Optimizations
-------------



Examples
--------