Next-Gen Multimedia Standard - Proposed Design Document
=======================================================


Purpose
-------

TODO: Crib text from the first message of
https://gitlab.freedesktop.org/terminal-wg/specifications/issues/12 as
to why people want images in their terminals.

The same mechanism that can put raster-based images on the screen is
easily generalizable to other media types such as vector-based images,
animations, and embedded GUI widgets. This document is thus a
"multimedia" proposal, not just "simple images".


Acknowledgements
----------------

This proposal has been informed by the following prior work:

* DEC VT300 series sixel graphics standard:
  https://vt100.net/docs/vt3xx-gp/chapter14.html

* iTerm2 image protocol:
  https://iterm2.com/documentation-images.html

* Kitty image protocol:
  https://sw.kovidgoyal.net/kitty/graphics-protocol.html

* Jexer Terminal User Interface:
  https://gitlab.com/klamonte/jexer


Design Goals - Core
-------------------

The core ("must-have") design goals are:

* Be easy to implement in existing terminals and applications:

  - Sacrifice "10%" of potential function to eliminate "90%" of
    implementation pain. "Less is more."

  - Be a strict superset of the existing iTerm2 and DEC sixel image
    solutions. One should be able to take an existing terminal or
    application that emits/consumes iTerm2 or sixel sequences, and
    only change the control sequence introducer/termination to achieve
    the same effect as a terminal/application that conforms with this
    standard.

* Have no ambiguity. If two terminal or application developers can
  read this document and reach different conclusions on what should be
  on the screen, then an error exists in this document that must be
  fixed.

  - Every feature must be straightforward to validate via automated
    unit testing.

  - Every conformant terminal must produce the same output (pixels on
    screen) given the same input (terminal font, terminal sequences).

  - Every option must have a defined default value.

  - Erroneous sequences must have defined expected results.

  - Every operation must act atomically: either everything worked
    (image is on screen, cursor has moved, etc.) or nothing did.

* Be straightforward to implement in non-"physical" terminals,
  including:

  - Future versions of terminal control libraries such as ncurses and
    termbox.

  - Terminal multiplexers that support "headless" terminals (no
    physical screen) and "multi-head" terminals (many different
    physical screens).

* Be platform-agnostic, and easy to implement on (at the least):
  POSIX, Windows, and web.

  - All features must be available even if the only means of
    communication between the application and terminal is control
    sequences (e.g. no shared disk, no shared memory, no shared DOM,
    etc.).

* Support graceful fallback:

  - Terminal emulators and physical terminals that do not support this
    standard should remain usable with no undefined screen artifacts,
    even when the application blindly emits these sequences to those
    terminals.

  - This standard must be able to be versioned for future enhancements.

  - An application must be able to detect that its terminal supports
    this standard, and at what version.

* Support secure programming practices:

  - Applications must not be able to obtain unauthorized data from
    terminal memory, such as: images emitted by other applications
    still present in the terminal's scrollback buffer, terminal or
    system memory limits.

  - Applications must not be able to compromise the terminal through
    denial-of-service such as: excessive memory usage, unterminated
    control sequences. Similarly, terminals must not be able to
    compromise applications through their responses to application
    queries.

  - Applications must not be able to manipulate the terminal into
    performing an insecure operation such as: reading arbitrary shared
    memory regions, reading arbitrary files on disk, deleting
    arbitrary files on disk, etc. Similarly, terminals must not be
    able to manipulate applications into performing insecure
    operations.

  - This standard must be implementable when the terminal has a fixed
    maximum memory, such as a kernel-level device driver.
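
Taken together, the goals of fixed maximum memory, tolerance of
unterminated control sequences, and atomic operations suggest a reader
that buffers a sequence up to a hard limit and hands it on only once it
is complete. The Python sketch below is one possible illustration and
not part of this proposal. The wire format is not yet defined in this
document, so the introducer (`ESC ]`), the terminator (`ESC \`), the
64-byte limit, and the `BoundedSequenceReader` name are all assumptions
made for the example.

```python
ESC = 0x1B
OPEN = ord("]")        # assumed introducer byte after ESC
CLOSE = ord("\\")      # assumed terminator byte after ESC
MAX_PAYLOAD = 64       # assumed fixed limit; a real terminal picks its own


class BoundedSequenceReader:
    """Collects ESC ] ... ESC backslash sequences in bounded memory."""

    def __init__(self, max_payload=MAX_PAYLOAD):
        self.max_payload = max_payload
        self.state = "ground"
        self.buffer = bytearray()

    def feed(self, data):
        """Consume bytes; return the payloads completed by this call."""
        completed = []
        for byte in data:
            if self.state in ("payload_esc", "discard_esc"):
                if byte == CLOSE:
                    # Only a complete, in-limit sequence is handed on.
                    if self.state == "payload_esc":
                        completed.append(bytes(self.buffer))
                    self.buffer.clear()
                    self.state = "ground"
                    continue
                # Any other byte aborts the pending sequence; treat this
                # byte as if it followed a fresh ESC.
                self.buffer.clear()
                self.state = "esc"
            if self.state == "ground":
                if byte == ESC:
                    self.state = "esc"
            elif self.state == "esc":
                if byte == OPEN:
                    self.state = "payload"
                elif byte != ESC:
                    self.state = "ground"
            elif self.state == "payload":
                if byte == ESC:
                    self.state = "payload_esc"
                elif len(self.buffer) >= self.max_payload:
                    # Too large: drop everything and skip to the terminator.
                    self.buffer.clear()
                    self.state = "discard"
                else:
                    self.buffer.append(byte)
            elif self.state == "discard":
                if byte == ESC:
                    self.state = "discard_esc"
        return completed


reader = BoundedSequenceReader(max_payload=4)
print(reader.feed(b"\x1b]abcdefgh\x1b\\"))  # [] (oversized, dropped whole)
print(reader.feed(b"\x1b]ok\x1b\\"))        # [b'ok']
```

In this sketch an oversized or aborted sequence leaves no partial
result behind, and the buffer never grows past `max_payload` bytes even
if the terminator never arrives.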



Design Goals - Secondary
------------------------

The secondary ("nice-to-have") design goals are listed below. These
might not all be possible, but will be kept in mind:

* Minimal redundant network traffic for on-screen data that is
  repeated: either on screen in multiple places, or in the same place
  but refreshed multiple times.

* Asynchronous notification from terminal to application that the
  screen has been changed by outside or user action. Examples: font
  change, session detach/attach, user changed image preferences.

* The ability for a multiplexer to "pass-thru" the image drawing
  sequence to its "outer" terminal, with some support for limited
  clipping.



Out Of Scope
------------

The following items are out of scope for this standard:

* Bidirectional output. Applications are expected to generate Tiles
  and place them on screen where needed. The cursor response to
  image sequences is defined as left-to-right, consistent with
  ECMA-48 / ANSI X3.64 sequences. An independent BIDI standard is
  free to apply whatever solution will work for ECMA-48 sequences to
  the sequences described in this document.

* Capabilities. This standard defines a limited number of terminal
  reports. These are not intended to be used as a general-purpose
  capabilities model.



Definitions
-----------

Terminal - The hardware, or a program that simulates hardware,
           comprising a keyboard, screen, and mouse.

Application - A program that utilizes the terminal for its
              input/output with the user.

Multiplexer - A special case of an application that simulates one or
              more "inner" terminals for other applications to use,
              and composes these inner terminals into a combined
              screen to emit to one or more "outer" terminals that
              obtain input/output from the user. Multiplexers are
              thus both applications and terminals.

X - The column coordinate of a Cell. This standard is 1-based (like
    ECMA-48): the left-most column of the screen is numbered 1.

Y - The row coordinate of a Cell. This standard is 1-based (like
    ECMA-48): the top-most row of the screen is numbered 1.

Z - The layer that text or multimedia is placed on. This proposal
    uses a right-handed coordinate system with (X, Y, Z) = (1, 1, 1)
    defined as the top-left corner on the default layer: positive Z
    projects "away" from the user and "into" or "behind" the screen.
    Rendering the Cells on the screen must produce the same result as
    the painter's algorithm (see Rendering section below).

Cell - A fixed-width-and-height rectangle on the screen. The cells of
       the screen are arranged in a grid of X columns and Y rows. A
       Cell has dimensions of cellWidth and cellHeight, which can be
       measured in either pixels or points. Every Cell has a
       coordinate of (X, Y, Z).

Tile - One or more contiguous Cells with data to be displayed. The
       data can be text or image data, but not both. A Tile has width
       of 1, 2, or more, and a coordinate of (X, Y, Z) that is the
       same as its left-most (first) Cell's (X, Y, Z). In practice,
       Tiles are typically one Cell wide for ASCII and Latin language
       glyphs, and two Cells wide for "fullwidth" glyphs as used in
       Asian languages, emojis, and symbols. This standard does not
       preclude Tiles from encompassing entire grapheme clusters.

Layer - A screen-sized grid of Cells that have the same Z coordinate.
        Layers are drawn to the screen in descending Z order. Layers
        may have optional additional attributes such as transparency.
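
The definitions above can be modelled with a small data structure. The
Python sketch below is illustrative only: the `Tile` class, its field
names, and the `cells_covered` helper are hypothetical names introduced
here, not part of this proposal. It shows the 1-based coordinates, the
"text or image, but not both" rule, and how a Tile takes the (X, Y, Z)
of its left-most Cell.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    """One or more contiguous Cells holding either text or image data."""
    x: int              # 1-based column of the left-most Cell
    y: int              # 1-based row
    z: int              # layer; larger Z lies further "behind" the screen
    width: int = 1      # number of Cells spanned (2 for fullwidth glyphs)
    text: str = ""      # text payload, or ...
    image: bytes = b""  # ... image payload, but never both

    def __post_init__(self):
        if self.x < 1 or self.y < 1:
            raise ValueError("X and Y are 1-based")
        if self.width < 1:
            raise ValueError("a Tile spans at least one Cell")
        if self.text and self.image:
            raise ValueError("a Tile holds text or image data, not both")

    def cells_covered(self):
        """Return the (X, Y, Z) coordinate of every Cell in this Tile."""
        return [(self.x + i, self.y, self.z) for i in range(self.width)]


wide = Tile(x=1, y=1, z=1, width=2, text="界")
print(wide.cells_covered())  # [(1, 1, 1), (2, 1, 1)]
```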


Rendering
---------

A terminal will display its Cells such that the screen will look as if
it was rendered in the following pseudo-code manner:

```
for each layer Z, in descending order from maxZ to minZ:
    for each row Y, in ascending order from minY to maxY:
        for each column X, in ascending order from minX to maxX:
            draw tile at (X, Y, Z)
            advance X by tile width
        next column
        advance Y by 1
    next row
    decrease Z by 1
next layer
```

A terminal is free to optimize its rendering as it sees fit, so long
as the final screen output looks equivalent to the above method.
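
As a reference point for automated testing, the pseudo-code can be
written as a small runnable function. The Python sketch below is an
illustration under stated assumptions rather than a normative
algorithm: the `render` function, the dictionary layout of `tiles`, the
use of one-character labels in place of pixels, fully opaque tiles, and
clipping at the right edge are all choices made for this example.

```python
def render(tiles, columns, rows):
    """Composite tiles into a rows x columns grid using painter's order.

    `tiles` maps the (X, Y, Z) of a tile's left-most cell to a
    (width, label) pair. Coordinates are 1-based.
    """
    screen = [[" "] * columns for _ in range(rows)]
    # Visiting only layers that hold tiles gives the same result as
    # walking every Z from maxZ down to minZ.
    layers = sorted({z for (_, _, z) in tiles}, reverse=True)
    for z in layers:                      # furthest layer first
        for y in range(1, rows + 1):      # top row to bottom row
            x = 1
            while x <= columns:           # left column to right column
                tile = tiles.get((x, y, z))
                if tile is None:
                    x += 1
                    continue
                width, label = tile
                for i in range(width):
                    if x + i <= columns:  # assumed: clip at the right edge
                        screen[y - 1][x + i - 1] = label
                x += width                # advance X by tile width
    return screen


tiles = {
    (1, 1, 2): (3, "b"),  # three-cell background tile on layer 2
    (2, 1, 1): (1, "F"),  # one-cell foreground tile on layer 1
}
print(render(tiles, columns=3, rows=1))  # [['b', 'F', 'b']]
```

Because layer 2 is drawn before layer 1, the foreground tile overwrites
the middle cell of the background tile, which is the behaviour the
painter's algorithm requires.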



Terminal State
--------------



Terminal Reports
----------------



Error Handling
--------------



Cursor Position
---------------




Wire Formats
------------




Optimizations
-------------



Examples
--------

