Commit | Line | Data |
---|---|---|
5a16a542 KL |
1 | Next-Gen Multimedia Standard - Proposed Design Document |
2 | ======================================================= | |
3 | ||
4 | ||
5 | Purpose | |
6 | ------- | |
7 | ||
8 | TODO: Crib text from the first message of | |
9 | https://gitlab.freedesktop.org/terminal-wg/specifications/issues/12 as | |
10 | to why people want images in their terminals. | |
11 | ||
12 | The same mechanism that can put raster-based images on the screen is | |
13 | easily generalizable to other media types such as vector-based images, | |
14 | animations, and embedded GUI widgets. This document is thus a | |
15 | "multimedia" proposal, not just "simple images". | |
16 | ||
17 | ||
18 | Acknowledgements | |
19 | ---------------- | |
20 | ||
21 | This proposal has been informed by the following prior work: | |
22 | ||
23 | * DEC VT300 series sixel graphics standard: | |
24 | https://vt100.net/docs/vt3xx-gp/chapter14.html | |
25 | ||
26 | * iTerm2 image protocol: | |
27 | https://iterm2.com/documentation-images.html | |
28 | ||
29 | * Kitty image protocol: | |
5172fe03 | 30 | https://sw.kovidgoyal.net/kitty/graphics-protocol.html |
5a16a542 KL |
31 | |
32 | * Jexer Terminal User Interface: | |
33 | https://gitlab.com/klamonte/jexer | |
34 | ||
35 | ||
36 | Design Goals - Core | |
37 | ------------------- | |
38 | ||
39 | The core ("must-have") design goals are: | |
40 | ||
41 | * Be easy to implement in existing terminals and applications: | |
42 | ||
43 | - Sacrifice "10%" of potential function to eliminate "90%" of | |
44 | implementation pain. "Less is more." | |
45 | ||
46 | - Be a strict superset of the existing iTerm2 and DEC sixel image | |
47 | solutions. One should be able to take an existing terminal or | |
48 | application that emits/consumes iTerm2 or sixel sequences, and | |
49 | only change the control sequence introducer/termination to achieve | |
50 | the same effect as a terminal/application that conforms with this | |
51 | standard. | |
52 | ||
53 | * Have no ambiguity. If two terminal or application developers can | |
54 | read this document and reach different conclusions on what should be | |
55 | on the screen, then an error exists in this document that must be | |
56 | fixed. | |
57 | ||
58 | - Every feature should be straightforward to validate via automated | |
59 | unit testing. | |
60 | ||
61 | - Every conformant terminal should produce the same output (pixels | |
62 | on screen) given the same input (terminal font, terminal | |
63 | sequences). | |
64 | ||
65 | - Every option must have a defined default value. | |
66 | ||
67 | - Erroneous sequences must have defined expected results. | |
68 | ||
69 | - Every operation must act atomically: either everything worked | |
70 | (image is on screen, cursor has moved, etc.) or nothing did. | |
71 | ||
72 | * Be straightforward to implement in non-"physical" terminals, | |
73 | including: | |
74 | ||
75 | - Future versions of terminal control libraries such as ncurses and | |
76 | termbox. | |
77 | ||
78 | - Terminal multiplexers that support "headless" terminals (no | |
79 | physical screen) and "multi-head" terminals (many different | |
80 | physical screens). | |
81 | ||
82 | * Be platform-agnostic, and easy to implement on (at the least): | |
83 | POSIX, Windows, and web. | |
84 | ||
85 | - All features must be available even if the only means of | |
86 | communication between the application and terminal is control | |
87 | sequences (e.g. no shared disk, no shared memory, no shared DOM, | |
88 | etc.). | |
89 | ||
90 | * Support graceful fallback: | |
91 | ||
92 | - Terminal emulators and physical terminals that do not support this | |
93 | standard should remain usable with no undefined screen artifacts, | |
94 | even when the application blindly emits these sequences to those | |
95 | terminals. | |
96 | ||
97 | - This standard must be able to be versioned for future enhancements. | |
98 | ||
99 | - An application must be able to detect that its terminal supports | |
100 | this standard, and at what version. | |
101 | ||
102 | * Support secure programming practices: | |
103 | ||
104 | - Applications must not be able to obtain unauthorized data from | |
105 | terminal memory, such as: images emitted by other applications | |
106 | still present in the terminal's scrollback buffer, terminal or | |
107 | system memory limits. | |
108 | ||
109 | - Applications must not be able to compromise the terminal through | |
110 | denial-of-service such as: excessive memory usage, unterminated | |
111 | control sequences. Similarly, terminals must not be able to | |
112 | compromise applications through their responses to application | |
113 | queries. | |
114 | ||
115 | - Applications must not be able to manipulate the terminal into | |
116 | performing an insecure operation such as: reading arbitrary shared | |
117 | memory regions, reading arbitrary files on disk, deleting | |
118 | arbitrary files on disk, etc. Similarly, terminals must not be | |
119 | able to manipulate applications into performing insecure | |
120 | operations. | |
121 | ||
122 | - This standard must be implementable when the terminal has a fixed | |
123 | maximum memory, such as a kernel-level device driver. | |
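The denial-of-service and fixed-memory goals above can be illustrated with a bounded accumulator for incoming control sequences. This is only a sketch under stated assumptions: this document does not yet define a wire format, so the use of the ECMA-48 String Terminator (ESC \) as the end marker, the class name `BoundedSequenceBuffer`, and the `max_bytes` parameter are all hypothetical. An oversized sequence is dropped whole, in keeping with the "everything or nothing" goal.

```python
ST = b"\x1b\\"  # ECMA-48 String Terminator; assumed here, not defined by this document


class BoundedSequenceBuffer:
    """Hypothetical accumulator that never buffers more than max_bytes."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes   # cap on buffered bytes, including a pending ESC
        self._buf = bytearray()
        self._tail = b""             # last two bytes seen, used to spot ST
        self._overflowed = False

    def feed(self, chunk):
        """Consume bytes; return the payloads of sequences completed in this chunk."""
        complete = []
        for b in chunk:
            self._tail = (self._tail + bytes([b]))[-2:]
            if self._tail == ST:
                if not self._overflowed:
                    # The buffer ends with the ESC of ST; strip it from the payload.
                    complete.append(bytes(self._buf[:-1]))
                # Reset for the next sequence, whether this one was kept or dropped.
                self._buf.clear()
                self._tail = b""
                self._overflowed = False
            elif not self._overflowed:
                if len(self._buf) >= self.max_bytes:
                    # Too large: discard everything and skip ahead to the terminator.
                    self._buf.clear()
                    self._overflowed = True
                else:
                    self._buf.append(b)
        return complete
```

With this shape, an unterminated sequence can occupy at most `max_bytes` of memory no matter how much data the application sends, and the terminal resynchronizes at the next terminator.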
124 | ||
125 | ||
126 | ||
127 | Design Goals - Secondary | |
128 | ------------------------ | |
129 | ||
5172fe03 KL |
130 | The secondary ("nice-to-have") design goals are listed below. These |
131 | might not all be possible, but will be kept in mind: | |
5a16a542 KL |
132 | |
133 | * Minimal redundant network traffic for on-screen data that is | |
134 | repeated: either on screen in multiple places, or in the same place | |
135 | but refreshed multiple times. | |
136 | ||
137 | * Asynchronous notification from terminal to application that the | |
138 | screen has been changed by outside or user action. Examples: font | |
139 | change, session detach/attach, user changed image preferences. | |
140 | ||
5172fe03 KL |
141 | * The ability for a multiplexer to "pass-thru" the image drawing |
142 | sequence to its "outer" terminal, with some support for limited | |
143 | clipping. | |
144 | ||
145 | ||
5a16a542 KL |
146 | |
147 | Out Of Scope | |
148 | ------------ | |
149 | ||
150 | The following items are out of scope for this standard: | |
151 | ||
152 | * Bidirectional output. Applications are expected to generate Tiles | |
153 | and place them on screen where they are needed. The cursor response to | |
154 | image sequences is defined as left-to-right, consistent with | |
155 | ECMA-48 / ANSI X3.64 sequences. An independent BIDI standard is | |
156 | free to apply whatever solution will work for ECMA-48 sequences to | |
157 | the sequences described in this document. | |
158 | ||
159 | * Capabilities. This standard defines a limited number of terminal | |
160 | reports. These are not intended to be used as a general-purpose | |
161 | capabilities model. | |
162 | ||
163 | ||
164 | ||
165 | Definitions | |
166 | ----------- | |
167 | ||
168 | Terminal - The hardware, or a program that simulates hardware, | |
169 | comprising a keyboard, screen, and mouse. | |
170 | ||
171 | Application - A program that utilizes the terminal for its | |
172 | input/output with the user. | |
173 | ||
174 | Multiplexer - A special case of an application that simulates one or | |
175 | more "inner" terminals for other applications to use, | |
176 | and composes these inner terminals into a combined | |
177 | screen to emit to one or more "outer" terminals that | |
178 | obtain input/output from the user. Multiplexers are | |
179 | thus both applications and terminals. | |
180 | ||
5172fe03 KL |
181 | X - The column coordinate of a cell. This standard is 1-based (like |
182 | ECMA-48): the left-most column of the screen is numbered 1. | |
5a16a542 | 183 | |
5172fe03 KL |
184 | Y - The row coordinate of a cell. This standard is 1-based (like |
185 | ECMA-48): the top-most row of the screen is numbered 1. | |
5a16a542 KL |
186 | |
187 | Z - The layer that text or multimedia is placed on. This proposal | |
5172fe03 | 188 | uses a right-hand coordinate system with (X, Y, Z) = (1, 1, 1) |
5a16a542 KL |
189 | defined as the top-left corner on the default layer: positive Z |
190 | projects "away" from the user and "into" or "behind" the screen. | |
191 | Rendering the Cells on the screen must produce the same result as | |
192 | the painter's algorithm (see the Rendering section below). | |
193 | ||
194 | Cell - A fixed-width-and-height rectangle on the screen. The cells of | |
195 | the screen are arranged in a grid of X columns and Y rows. A | |
196 | Cell has dimensions of cellWidth and cellHeight, which can be | |
197 | measured in either pixels or points. Every Cell has a | |
198 | coordinate of (X, Y, Z). | |
199 | ||
200 | Tile - One or more contiguous Cells with data to be displayed. The | |
201 | data can be text or image data, but not both. A Tile has width | |
202 | of 1, 2, or more, and a coordinate of (X, Y, Z) that is the | |
203 | same as its left-most (first) Cell's (X, Y, Z). In practice, | |
204 | Tiles are typically one Cell wide for ASCII and Latin language | |
205 | glyphs, and two Cells wide for "fullwidth" glyphs as used in | |
206 | Asian languages, emojis, and symbols. This standard does not | |
207 | preclude Tiles from encompassing entire grapheme clusters. | |
208 | ||
209 | Layer - A screen-sized grid of Cells that have the same Z coordinate. | |
210 | Layers are drawn to the screen in descending Z order. Layers | |
211 | may have optional additional attributes such as transparency. | |
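The Cell and Tile definitions above could be modelled roughly as follows. This is a minimal sketch, not part of the proposal: the class name `Tile`, its field names, and the choice of `str` for text and `bytes` for image data are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Tile:
    """One or more contiguous Cells holding text or image data, never both."""
    x: int                    # 1-based column of the left-most Cell
    y: int                    # 1-based row
    z: int                    # layer; larger Z is farther "behind" the screen
    width: int                # number of Cells covered (1, 2, or more)
    data: Union[str, bytes]   # assumption: str for text, bytes for image data

    def __post_init__(self):
        # X and Y are 1-based, like ECMA-48.
        if self.x < 1 or self.y < 1:
            raise ValueError("X and Y are 1-based")
        if self.width < 1:
            raise ValueError("a Tile covers at least one Cell")

    def cells(self) -> List[Tuple[int, int, int]]:
        """Return the (X, Y, Z) coordinate of every Cell the Tile covers."""
        return [(self.x + i, self.y, self.z) for i in range(self.width)]
```

For example, a two-Cell-wide fullwidth glyph placed at column 3 of the top row on the default layer covers the Cells (3, 1, 1) and (4, 1, 1), and the Tile's own coordinate is that of its first Cell.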
212 | ||
213 | ||
214 | Rendering | |
215 | --------- | |
216 | ||
217 | A terminal will display its Cells such that the screen will look as if | |
218 | it were rendered by the following pseudocode: | |
219 | ||
220 | ``` | |
221 | for each layer Z, in descending order from maxZ to minZ: | |
222 | for each row Y, in ascending order from minY to maxY: | |
223 | for each column X, in ascending order from minX to maxX: | |
224 | draw tile at (X, Y, Z) | |
225 | advance X by tile width | |
226 | next column | |
227 | advance Y by 1 | |
228 | next row | |
229 | decrease Z by 1 | |
230 | next layer | |
231 | ``` | |
232 | ||
233 | A terminal is free to optimize its rendering as it sees fit, so long | |
234 | as the final screen output looks equivalent to the above method. | |
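The rendering order can also be expressed as a small runnable sketch. The function name `render`, the representation of a tile as an `(x, y, z, width, data)` tuple, and the clipping of Cells that fall outside the screen are assumptions for illustration, not requirements of this proposal.

```python
def render(tiles, columns, rows):
    """Return a rows-by-columns grid of the data visible in each Cell.

    tiles is an iterable of (x, y, z, width, data) tuples using the
    1-based X and Y coordinates defined in this document.
    """
    screen = [[None] * columns for _ in range(rows)]
    # Painter's algorithm: farthest layer (largest Z) first, then rows
    # top to bottom, then columns left to right.
    ordered = sorted(tiles, key=lambda t: (-t[2], t[1], t[0]))
    for x, y, z, width, data in ordered:
        for cx in range(x, x + width):
            # Assumption: Cells outside the screen are silently clipped.
            if 1 <= cx <= columns and 1 <= y <= rows:
                screen[y - 1][cx - 1] = data  # nearer layers overwrite farther ones
    return screen
```

Drawing a two-Cell image tile on layer 2 and a one-Cell text tile on layer 1 at the same position leaves the text visible in the first Cell and the image visible in the second, because the nearer layer is painted last.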
235 | ||
236 | ||
237 | ||
238 | Terminal State | |
239 | -------------- | |
240 | ||
241 | ||
242 | ||
243 | Terminal Reports | |
244 | ---------------- | |
245 | ||
246 | ||
247 | ||
248 | Error Handling | |
249 | -------------- | |
250 | ||
251 | ||
252 | ||
253 | Cursor Position | |
254 | --------------- | |
255 | ||
256 | ||
257 | ||
258 | ||
259 | Wire Formats | |
260 | ------------ | |
261 | ||
262 | ||
263 | ||
264 | ||
265 | Optimizations | |
266 | ------------- | |
267 | ||
268 | ||
269 | ||
270 | Examples | |
271 | -------- | |
272 | ||
273 |