Next-Gen Multimedia Standard - Proposed Design Document
=======================================================


Purpose
-------

TODO: Crib text from the first message of
https://gitlab.freedesktop.org/terminal-wg/specifications/issues/12 as
to why people want images in their terminals.

The same mechanism that can put raster-based images on the screen is
easily generalizable to other media types such as vector-based images,
animations, and embedded GUI widgets. This document is thus a
"multimedia" proposal, not just "simple images".


Acknowledgements
----------------

This proposal has been informed by the following prior work:

* DEC VT300 series sixel graphics standard:
  https://vt100.net/docs/vt3xx-gp/chapter14.html

* iTerm2 image protocol:
  https://iterm2.com/documentation-images.html

* Kitty image protocol:
  https://sw.kovidgoyal.net/kitty/graphics-protocol.html

* Jexer Terminal User Interface:
  https://gitlab.com/klamonte/jexer


Design Goals - Core
-------------------

The core ("must-have") design goals are:

* Be easy to implement in existing terminals and applications:

  - Sacrifice "10%" of potential function to eliminate "90%" of
    implementation pain. "Less is more."

  - Be a strict superset of the existing iTerm2 and DEC sixel image
    solutions. One should be able to take an existing terminal or
    application that emits/consumes iTerm2 or sixel sequences, and
    only change the control sequence introducer/terminator to achieve
    the same effect as a terminal/application that conforms to this
    standard.

* Have no ambiguity. If two terminal or application developers can
  read this document and reach different conclusions on what should be
  on the screen, then an error exists in this document that must be
  fixed.

  - Every feature should be straightforward to validate via automated
    unit testing.

  - Every conformant terminal should produce the same output (pixels
    on screen) given the same input (terminal font, terminal
    sequences).

  - Every option must have a defined default value.

  - Erroneous sequences must have defined expected results.

  - Every operation must act atomically: either everything worked
    (image is on screen, cursor has moved, etc.) or nothing did.

* Be straightforward to implement in non-"physical" terminals,
  including:

  - Future versions of terminal control libraries such as ncurses and
    termbox.

  - Terminal multiplexers that support "headless" terminals (no
    physical screen) and "multi-head" terminals (many different
    physical screens).

* Be platform-agnostic, and easy to implement on (at the least):
  POSIX, Windows, and web.

  - All features must be available even if the only means of
    communication between the application and terminal is control
    sequences (e.g. no shared disk, no shared memory, no shared DOM,
    etc.).

* Support graceful fallback:

  - Terminal emulators and physical terminals that do not support this
    standard should remain usable with no undefined screen artifacts,
    even when the application blindly emits these sequences to those
    terminals.

  - This standard must be able to be versioned for future
    enhancements.

  - An application must be able to detect that its terminal supports
    this standard, and at what version.

* Support secure programming practices:

  - Applications must not be able to obtain unauthorized data from
    terminal memory, such as images emitted by other applications
    that are still present in the terminal's scrollback buffer, or
    terminal or system memory limits.

  - Applications must not be able to compromise the terminal through
    denial-of-service attacks such as excessive memory usage or
    unterminated control sequences. Similarly, terminals must not be
    able to compromise applications through their responses to
    application queries.

  - Applications must not be able to manipulate the terminal into
    performing an insecure operation such as: reading arbitrary shared
    memory regions, reading arbitrary files on disk, deleting
    arbitrary files on disk, etc. Similarly, terminals must not be
    able to manipulate applications into performing insecure
    operations.

  - This standard must be implementable when the terminal has a fixed
    maximum amount of memory, as in a kernel-level device driver.

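The security goals on unterminated control sequences and fixed maximum
memory can be illustrated with a small sketch. It assumes the caller
has already recognized an introducer and that the payload ends with
the ECMA-48 String Terminator (ESC \), as sixel data does. The class
name `BoundedSequenceBuffer`, the `max_bytes` cap, and the policy of
discarding an oversized payload are hypothetical choices made for
illustration; this proposal does not yet define any of them.

```python
from typing import List

ST = b"\x1b\\"  # ECMA-48 String Terminator: ESC followed by backslash


class BoundedSequenceBuffer:
    """Collects control-string payloads without unbounded memory growth.

    Hypothetical helper: the size cap and the discard-on-overflow
    policy are assumptions of this sketch, not part of the proposal.
    """

    def __init__(self, max_bytes: int = 1 << 20) -> None:
        self.max_bytes = max_bytes
        self._buf = bytearray()
        self._overflowed = False

    def feed(self, data: bytes) -> List[bytes]:
        """Consume input and return every payload completed by this call."""
        done: List[bytes] = []
        for byte in data:
            self._buf.append(byte)
            if self._buf.endswith(ST):
                # Terminator seen: deliver the payload unless it overflowed.
                if not self._overflowed:
                    done.append(bytes(self._buf[:-len(ST)]))
                self._buf.clear()
                self._overflowed = False
            elif len(self._buf) > self.max_bytes:
                # Too large: drop the data but keep the last byte so that
                # an ESC split from its backslash is still recognized.
                self._overflowed = True
                del self._buf[:-1]
        return done
```

With `max_bytes=8`, feeding `b"hello\x1b\\"` yields `[b"hello"]`, while
a long run of bytes with no terminator yields nothing and never holds
more than `max_bytes + 1` bytes in the buffer.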


Design Goals - Secondary
------------------------

The secondary ("nice-to-have") design goals are listed below. These
might not all be possible, but will be kept in mind:

* Minimal redundant network traffic for on-screen data that is
  repeated: either on screen in multiple places, or in the same place
  but refreshed multiple times.

* Asynchronous notification from terminal to application that the
  screen has been changed by outside or user action. Examples: font
  change, session detach/attach, user changed image preferences.

* The ability for a multiplexer to "pass-thru" the image drawing
  sequence to its "outer" terminal, with some support for limited
  clipping.

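As a rough illustration of the clipping mentioned in the last goal, a
multiplexer could intersect the cell rectangle an image would occupy
with the rectangle of the pane it belongs to, both expressed in the
outer terminal's 1-based coordinates. The names `CellRect` and
`clip_to_pane` are hypothetical, and this sketch works in whole cells
only; how clipping would actually be expressed on the wire is not
defined by this document.

```python
from typing import NamedTuple, Optional


class CellRect(NamedTuple):
    """A rectangle of cells; x and y are 1-based, as in this standard."""
    x: int
    y: int
    width: int
    height: int


def clip_to_pane(image: CellRect, pane: CellRect) -> Optional[CellRect]:
    """Return the part of `image` that lies inside `pane`, or None."""
    left = max(image.x, pane.x)
    top = max(image.y, pane.y)
    # Right and bottom edges are exclusive (one past the last cell).
    right = min(image.x + image.width, pane.x + pane.width)
    bottom = min(image.y + image.height, pane.y + pane.height)
    if right <= left or bottom <= top:
        return None  # nothing of the image is visible in this pane
    return CellRect(left, top, right - left, bottom - top)
```

For a pane at `CellRect(1, 1, 10, 5)`, an image at `CellRect(8, 3, 6, 4)`
clips to `CellRect(8, 3, 3, 3)`, and an image that starts at column 20
is not visible at all.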


Out Of Scope
------------

The following items are out of scope for this standard:

* Bidirectional output. Applications are expected to generate Tiles
  and place them on screen where they are needed. The cursor response
  to image sequences is defined as left-to-right, consistent with
  ECMA-48 / ANSI X3.64 sequences. An independent BIDI standard is
  free to apply whatever solution will work for ECMA-48 sequences to
  the sequences described in this document.

* Capabilities. This standard defines a limited number of terminal
  reports. These are not intended to be used as a general-purpose
  capabilities model.



Definitions
-----------

Terminal - The hardware, or a program that simulates hardware,
           comprising a keyboard, screen, and mouse.

Application - A program that utilizes the terminal for its
              input/output with the user.

Multiplexer - A special case of an application that simulates one or
              more "inner" terminals for other applications to use,
              and composes these inner terminals into a combined
              screen to emit to one or more "outer" terminals that
              obtain input/output from the user. Multiplexers are
              thus both applications and terminals.

X - The column coordinate of a cell. This standard is 1-based (like
    ECMA-48): the left-most column of the screen is numbered 1.

Y - The row coordinate of a cell. This standard is 1-based (like
    ECMA-48): the top-most row of the screen is numbered 1.

Z - The layer that text or multimedia is placed on. This proposal
    uses a right-hand coordinate system with (X, Y, Z) = (1, 1, 1)
    defined as the top-left corner on the default layer: positive Z
    projects "away" from the user and "into" or "behind" the screen.
    Rendering the Cells on the screen must produce the same result as
    the painter's algorithm (see the Rendering section below).

Cell - A fixed-width-and-height rectangle on the screen. The cells of
       the screen are arranged in a grid of X columns and Y rows. A
       Cell has dimensions of cellWidth and cellHeight, which can be
       measured in either pixels or points. Every Cell has a
       coordinate of (X, Y, Z).

Tile - One or more contiguous Cells with data to be displayed. The
       data can be text or image data, but not both. A Tile has width
       of 1, 2, or more, and a coordinate of (X, Y, Z) that is the
       same as its left-most (first) Cell's (X, Y, Z). In practice,
       Tiles are typically one Cell wide for ASCII and Latin language
       glyphs, and two Cells wide for "fullwidth" glyphs as used in
       Asian languages, emojis, and symbols. This standard does not
       preclude Tiles from encompassing entire grapheme clusters.

Layer - A screen-sized grid of Cells that have the same Z coordinate.
        Layers are drawn to the screen in descending Z order. Layers
        may have optional additional attributes such as transparency.

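The Tile definition above can be sketched as a small data type. This
is only one possible in-memory model, under the assumption that a Tile
occupies a single row; the class name, field names, and validation
messages are hypothetical and not part of this proposal.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Tile:
    """One or more contiguous Cells holding text or image data, not both."""
    x: int                         # 1-based column of the left-most Cell
    y: int                         # 1-based row
    z: int                         # layer; larger Z is farther behind the screen
    width: int                     # number of Cells covered
    text: Optional[str] = None     # text payload, if any
    image: Optional[bytes] = None  # image payload, if any

    def __post_init__(self) -> None:
        if self.x < 1 or self.y < 1:
            raise ValueError("X and Y are 1-based")
        if self.width < 1:
            raise ValueError("a Tile covers at least one Cell")
        # Exactly one of text/image must be set.
        if (self.text is None) == (self.image is None):
            raise ValueError("a Tile holds text or image data, not both")

    def cells(self) -> List[Tuple[int, int, int]]:
        """Return the (X, Y, Z) coordinate of every Cell this Tile covers."""
        return [(self.x + i, self.y, self.z) for i in range(self.width)]
```

A two-Cell-wide "fullwidth" glyph at the top-left corner of the default
layer would be `Tile(x=1, y=1, z=1, width=2, text="\u5b57")`, and its
`cells()` are `(1, 1, 1)` and `(2, 1, 1)`.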

Rendering
---------

A terminal will display its Cells such that the screen will look as if
it were rendered by the following pseudo-code:

```
for each layer Z, in descending order from maxZ to minZ:
    for each row Y, in ascending order from minY to maxY:
        for each column X, in ascending order from minX to maxX:
            draw tile at (X, Y, Z)
            advance X by tile width
        next column
        advance Y by 1
    next row
    decrease Z by 1
next layer
```

A terminal is free to optimize its rendering as it sees fit, so long
as the final screen output looks equivalent to the above method.

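The pseudo-code above can be turned into a small runnable sketch for
use in tests. It records which Tile payload ends up visible in each
Cell instead of drawing pixels. The `TileMap` layout and the `render`
function are hypothetical, and the sketch assumes that a Cell with no
Tile on a nearer layer is fully transparent, which this document does
not yet specify.

```python
from typing import Dict, List, Optional, Tuple

# (X, Y, Z) of a Tile's left-most Cell -> (width in Cells, payload label).
TileMap = Dict[Tuple[int, int, int], Tuple[int, str]]


def render(tiles: TileMap, columns: int, rows: int) -> List[List[Optional[str]]]:
    """Return, for every Cell, the payload that ends up visible in it."""
    screen: List[List[Optional[str]]] = [[None] * columns for _ in range(rows)]
    # Larger Z is farther from the user, so it is painted first.
    for z in sorted({key[2] for key in tiles}, reverse=True):
        for y in range(1, rows + 1):
            x = 1
            while x <= columns:
                tile = tiles.get((x, y, z))
                if tile is None:
                    x += 1  # assumption: an empty Cell is transparent
                    continue
                width, payload = tile
                # Paint every Cell the Tile covers, clipped at the right edge.
                for cx in range(x, min(x + width, columns + 1)):
                    screen[y - 1][cx - 1] = payload  # 1-based -> 0-based
                x += width  # advance X by tile width
    return screen
```

For example, a three-Cell Tile `"far"` at (1, 1, 2) and a one-Cell Tile
`"near"` at (2, 1, 1) on a 4x1 screen give
`[["far", "near", "far", None]]`: the nearer layer overwrites only the
Cell it covers.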


Terminal State
--------------



Terminal Reports
----------------



Error Handling
--------------



Cursor Position
---------------




Wire Formats
------------




Optimizations
-------------



Examples
--------
