test/expected/LWN/0000763252

   1              LWN.NET WEEKLY EDITION FOR AUGUST 30, 2018
   2
   3
   4
   5   o Reference: 0000763252
   6   o News link: https://lwn.net/Articles/763252/
   7   o Source link:
   8
   9
  10     [1]Welcome  to  the  LWN.net Weekly Edition for August 30, 2018
  11     This edition contains the following feature content:
  12
  13     [2]An  introduction  to the Julia language, part 1 : Julia is a
  14     language  designed  for  intensive numerical calculations; this
  15     article gives an overview of its core features.
  16
  17     [3]C  considered  dangerous  :  a Linux Security Summit talk on
  18     what is being done to make the use of C in the kernel safer.
  19
  20     [4]The  second  half  of  the  4.19  merge  window  : the final
  21     features  merged (or not merged) before the merge window closed
  22     for this cycle.
  23
  24     [5]Measuring  (and fixing) I/O-controller throughput loss : the
  25     kernel's   I/O   controllers   can   provide  useful  bandwidth
  26     guarantees, but at a significant cost in throughput.
  27
  28     [6]KDE's  onboarding initiative, one year later : what has gone
  29     right  in  KDE's  effort  to make it easier for contributors to
  30     join the project, and what remains to be done.
  31
  32     [7]Sharing  and  archiving  data  sets with Dat : an innovative
  33     approach to addressing and sharing data on the net.
  34
  35     This week's edition also includes these inner pages:
  36
  37     [8]Brief   items   :  Brief  news  items  from  throughout  the
  38     community.
  39
  40     [9]Announcements  : Newsletters, conferences, security updates,
  41     patches, and more.
  42
  43     Please  enjoy  this  week's  edition, and, as always, thank you
  44     for supporting LWN.net.
  45
  46     [10]Comments (none posted)
  47
  48     [11]An introduction to the Julia language, part 1
  49
  50     August 28, 2018
  51
  52     This article was contributed by Lee Phillips
  53
  54     [12]Julia  is  a  young  computer language aimed at serving the
  55     needs  of  scientists,  engineers,  and  other practitioners of
  56     numerically   intensive  programming.  It  was  first  publicly
  57     released   in   2012.  After  an  intense  period  of  language
  58     development,  version 1.0 was [13]released on August 8. The 1.0
  59     release  promises  years  of  language  stability; users can be
  60     confident  that  developments  in the 1.x series will not break
  61     their  code.  This  is  the  first  part  of a two-part article
  62     introducing  the  world  of  Julia.  This  part  will introduce
  63     enough  of  the  language syntax and constructs to allow you to
  64     begin  to write simple programs. The following installment will
  65     acquaint  you  with the additional pieces needed to create real
  66     projects, and to make use of Julia's ecosystem.
  67
  68     Goals and history
  69
  70     The  Julia  project  has ambitious goals. It wants the language
  71     to  perform  about  as  well  as  Fortran  or  C  when  running
  72     numerical  algorithms,  while  remaining as pleasant to program
  73     in  as Python. I believe the project has met these goals and is
  74     poised  to  see  increasing  adoption by numerical researchers,
  75     especially now that an official, stable release is available.
  76
  77     The  Julia  project  maintains  a [14]micro-benchmark page that
  78     compares  its  numerical  performance  against  both statically
  79     compiled   languages   (C,   Fortran)   and  dynamically  typed
  80     languages  (R,  Python). While it's certainly possible to argue
  81     about  the relevance and fairness of particular benchmarks, the
  82     data  overall  supports  the Julia team's contention that Julia
  83     has   generally   achieved  parity  with  Fortran  and  C;  the
  84     benchmark source code is available.
  85
  86     Julia  began  as  research  in  computer  science  at  MIT; its
  87     creators  are  Alan  Edelman,  Stefan Karpinski, Jeff Bezanson,
  88     and  Viral  Shah.  These  four  remain active developers of the
  89     language.  They, along with Keno Fischer, co-founder and CTO of
  90     [15]Julia  Computing , were kind enough to share their thoughts
  91     with  us  about the language. I'll be drawing on their comments
  92     later  on;  for now, let's get a taste of what Julia code looks
  93     like.
  94
  95     Getting started
  96
  97     To   explore   Julia   initially,   start   up   its   standard
  98     [16]read-eval-print   loop   (REPL)  by  typing  julia  at  the
  99     terminal,  assuming  that  you have installed it. You will then
 100     be  able  to  interact with what will seem to be an interpreted
 101     language  —  but,  behind  the scenes, those commands are being
 102     compiled  by  a  just-in-time  (JIT)  compiler  that  uses  the
 103     [17]LLVM   compiler   framework  .  This  allows  Julia  to  be
 104     interactive,  while  turning the code into fast, native machine
 105     instructions.   However,  the  JIT  compiler  passes  sometimes
 106     introduce  noticeable delays at the REPL, especially when using
 107     a function for the first time.
 108
 109     To  run  a  Julia  program non-interactively, execute a command
 110     like: $ julia script.jl
 111
 112     Julia  has  all  the  usual data structures: numbers of various
 113     types     (including    complex    and    rational    numbers),
 114     multidimensional    arrays,    dictionaries,    strings,    and
 115     characters.  Functions  are  first-class: they can be passed as
 116     arguments  to other functions, can be members of arrays, and so
 117     on.
 118
 119     Julia  embraces  Unicode. Strings, which are enclosed in double
 120     quotes,  are  arrays  of Unicode characters, which are enclosed
 121     in  single  quotes.  The  " * " operator is used for string and
 122     character  concatenation.  Thus 'a' and 'β' are characters, and
 123     'aβ'  is  a syntax error. "a" and "β" are strings, as are "aβ",
 124     'a' * 'β', and "a" * "β" — all evaluate to the same string.
 125
 126     Variable  and  function names can contain non-ASCII characters.
 127     This,   along  with  Julia's  clever  syntax  that  understands
 128     numbers  prepended  to variables to mean multiplication, goes a
 129     long  way  to  allowing  the  numerical scientist to write code
 130     that  more  closely resembles the compact mathematical notation
  131     of the equations that usually lie behind it.  julia> ε₁ = 0.01
  132
  133     0.01
  134
  135     julia> ε₂ = 0.02
  136
  137     0.02
  138
  139     julia> 2ε₁ + 3ε₂
 140
 141     0.08
 142
  143     And  where  does  Julia come down on the age-old debate of what
  144     to do about 1/2? In Fortran and Python 2, this will get you 0,
 145     since  1  and 2 are integers, and the result is rounded down to
 146     the  integer  0. This was deemed inconsistent, and confusing to
 147     some,  so  it  was changed in Python 3 to return 0.5 — which is
 148     what you get in Julia, too.
 149
 150     While  we're  on  the  subject  of  fractions, Julia can handle
 151     rational  numbers,  with  a special syntax: 3//5 + 2//3 returns
 152     19//15  ,  while  3/5  + 2/3 gets you the floating-point answer
 153     1.2666666666666666.  Internally,  Julia  thinks  of  a rational
 154     number  in  its  reduced  form,  so the expression 6//8 == 3//4
 155     returns true , and numerator(6//8) returns 3 .
 156
 157     Arrays
 158
 159     Arrays  are  enclosed  in  square  brackets and indexed with an
  160     iterator  that  can  contain a step value:  julia> a = [1, 2, 3,
 161     4, 5, 6]
 162
 163     6-element Array{Int64,1}:
 164
 165     1
 166
 167     2
 168
 169     3
 170
 171     4
 172
 173     5
 174
 175     6
 176
  177     julia> a[1:2:end]
 178
 179     3-element Array{Int64,1}:
 180
 181     1
 182
 183     3
 184
 185     5
 186
 187     As  you  can  see,  indexing  starts at one, and the useful end
 188     index  means  the  obvious thing. When you define a variable in
 189     the  REPL,  Julia  replies  with  the  type  and  value  of the
 190     assigned  data;  you  can  suppress  this output by ending your
 191     input line with a semicolon.
 192
 193     Since  arrays  are  such a vital part of numerical computation,
 194     and  Julia makes them easy to work with, we'll spend a bit more
 195     time with them than the other data structures.
 196
 197     To  illustrate  the  syntax,  we  can start with a couple of 2D
  198     arrays, defined at the REPL:  julia> a = [1 2 3; 4 5 6]
 199
 200     2×3 Array{Int64,2}:
 201
 202     1 2 3
 203
 204     4 5 6
 205
  206     julia> z = [-1 -2 -3; -4 -5 -6];
  207
  208     Indexing is as expected:  julia> a[1, 2]
  209
  210     2
  211
  212     You can glue arrays together horizontally:  julia> [a z]
 213
 214     2×6 Array{Int64,2}:
 215
 216     1 2 3 -1 -2 -3
 217
 218     4 5 6 -4 -5 -6
 219
  220     And vertically:  julia> [a; z]
 221
 222     4×3 Array{Int64,2}:
 223
 224     1  2  3
 225
 226     4  5  6
 227
 228     -1 -2 -3
 229
 230     -4 -5 -6
 231
 232     Julia  has  all  the  usual  operators for handling arrays, and
 233     [18]linear  algebra  functions  that  work  with  matrices  (2D
 234     arrays).  The  linear  algebra  functions  are  part of Julia's
 235     standard  library,  but need to be imported with a command like
 236     "  using  LinearAlgebra  ",  which is a detail omitted from the
 237     current  documentation.  The  functions  include such things as
 238     determinants,  matrix  inverses,  eigenvalues and eigenvectors,
 239     many  kinds  of  matrix  factorizations,  etc.  Julia  has  not
 240     reinvented  the  wheel  here,  but  wisely  uses the [19]LAPACK
 241     Fortran library of battle-tested linear algebra routines.
 242
 243     The  extension  of  arithmetic  operators  to arrays is usually
  244     intuitive:  julia> a + z
 245
 246     2×3 Array{Int64,2}:
 247
 248     0 0 0
 249
 250     0 0 0
 251
 252     And  the  numerical  prepending  syntax works with arrays, too:
  253     julia> 3a + 4z
 254
 255     2×3 Array{Int64,2}:
 256
 257     -1 -2 -3
 258
 259     -4 -5 -6
 260
 261     Putting  a  multiplication  operator  between two matrices gets
  262     you matrix multiplication:  julia> a * transpose(a)
 263
 264     2×2 Array{Int64,2}:
 265
 266     14 32
 267
 268     32 77
 269
 270     You  can  "broadcast"  numbers  to cover all the elements in an
 271     array  by prepending the usual arithmetic operators with a dot:
  272     julia> 1 .+ a
 273
 274     2×3 Array{Int64,2}:
 275
 276     2 3 4
 277
 278     5 6 7
 279
 280     Note  that the language only actually requires the dot for some
 281     operators,  but  not  for  others,  such  as  "*"  and "/". The
 282     reasons  for this are arcane, and it probably makes sense to be
 283     consistent  and  use  the dot whenever you intend broadcasting.
 284     Note   also   that   the   current   version  of  the  official
 285     documentation  is  incorrect  in claiming that you may omit the
 286     dot from "+" and "-"; in fact, this now gives an error.
 287
 288     You  can  use  the  dot  notation to turn any function into one
  289     that   operates   on   each   element   of  an  array:   julia>
 290     round.(sin.([0, π/2, π, 3π/2, 2π]))
 291
 292     5-element Array{Float64,1}:
 293
 294     0.0
 295
 296     1.0
 297
 298     0.0
 299
 300     -1.0
 301
 302     -0.0
 303
 304     The  example  above  illustrates  chaining two dotted functions
 305     together.  The  Julia compiler turns expressions like this into
 306     "fused"  operations:  instead of applying each function in turn
 307     to  create a new array that is passed to the next function, the
 308     compiler   combines   the  functions  into  a  single  compound
 309     function  that  is  applied  once  over  the  array, creating a
 310     significant optimization.
 311
 312     You  can  use  this  dot  notation with any function, including
 313     your  own, to turn it into a version that operates element-wise
 314     over arrays.
 315
 316     Dictionaries  (associative  arrays) can be defined with several
  317     syntaxes. Here's one:  julia> d1 = Dict("A"=>1, "B"=>2)
  318
  319     Dict{String,Int64} with 2 entries:
  320
  321     "B" => 2
  322
  323     "A" => 1
 324
 325     You  may  have  noticed  that the code snippets so far have not
 326     included  any  type  declarations.  Every  value in Julia has a
 327     type,  but  the  compiler  will  infer  types  if  they are not
 328     specified.  It  is generally not necessary to declare types for
 329     performance,   but  type  declarations  sometimes  serve  other
  330     purposes  that  we'll  return  to  later.  Julia has a deep and
 331     sophisticated  type  system,  including  user-defined types and
 332     C-like  structs. Types can have behaviors associated with them,
 333     and  can  inherit  behaviors  from  other types. The best thing
 334     about  Julia's  type system is that you can ignore it entirely,
 335     use  just  a  few  pieces  of  it,  or spend weeks studying its
 336     design.
 337
 338     Control flow
 339
 340     Julia  code  is organized in blocks, which can indicate control
 341     flow,  function  definitions,  and other code units. Blocks are
 342     terminated  with  the  end  keyword,  and  indentation  is  not
 343     significant.  Statements  are separated either with newlines or
 344     semicolons.
 345
 346     Julia  has the typical control flow constructs; here is a while
  347     block:  julia> i = 1;
  348
  349     julia> while i < 5
 350
 351     print(i)
 352
 353     global i = i + 1
 354
 355     end
 356
 357     1234
 358
 359     Notice  the  global  keyword.  Most blocks in Julia introduce a
 360     local  scope for variables; without this keyword here, we would
 361     get an error about an undefined variable.
 362
 363     Julia  has  the  usual if statements and for loops that use the
 364     same  iterators that we introduced above for array indexing. We
  365     can  also  iterate  over collections:  julia> for i ∈ ['a', 'b',
 366     'c']
 367
 368     println(i)
 369
 370     end
 371
 372     a
 373
 374     b
 375
 376     c
 377
 378     In  place of the fancy math symbol in this for loop, we can use
 379     "  =  "  or " in ". If you want to use the math symbol but have
 380     no  convenient  way  to type it, the REPL will help you: type "
 381     \in  "  and  the  TAB key, and the symbol appears; you can type
 382     many [20]LaTeX expressions into the REPL in this way.
 383
 384     Development of Julia
 385
 386     The   language   is   developed   on   GitHub,  with  over  700
 387     contributors.  The  Julia  team  mentioned in their email to us
 388     that  the decision to use GitHub has been particularly good for
 389     Julia,  as  it  streamlined  the  process  for  many  of  their
 390     contributors,  who  are scientists or domain experts in various
 391     fields, rather than professional software developers.
 392
 393     The  creators  of  Julia  have  [21]published  [PDF] a detailed
 394     “mission  statement”  for  the  language, describing their aims
 395     and  motivations.  A  key issue that they wanted their language
 396     to  solve  is what they called the "two-language problem." This
 397     situation  is familiar to anyone who has used Python or another
 398     dynamic  language on a demanding numerical problem. To get good
 399     performance,   you  will  wind  up  rewriting  the  numerically
 400     intensive  parts  of  the program in C or Fortran, dealing with
 401     the  interface  between  the  two  languages,  and may still be
 402     disappointed  in  the overhead presented by calling the foreign
 403     routines from your original code.
 404
 405     For  Python,  [22]NumPy and SciPy wrap many numerical routines,
 406     written  in Fortran or C, for efficient use from that language,
 407     but  you  can  only  take advantage of this if your calculation
 408     fits  the  pattern  of  an  available  routine; in more general
 409     cases,  where you will have to write a loop over your data, you
 410     are  stuck with Python's native performance, which is orders of
 411     magnitude  slower.  If  you  switch  to  an alternative, faster
 412     implementation  of  Python,  such  as  [23]PyPy , the numerical
 413     libraries  may  not  be  compatible; NumPy became available for
 414     PyPy only within about the past year.
 415
 416     Julia  solves  the  two-language problem by being as expressive
 417     and  simple  to  program  in  as  a dynamic scripting language,
 418     while  having  the  native  performance  of  a static, compiled
 419     language.  There  is  no need to write numerical libraries in a
 420     second  language,  but  C  or  Fortran  library routines can be
 421     called   using  a  facility  that  Julia  has  built-in.  Other
 422     languages,  such as [24]Python or [25]R , can also interoperate
 423     easily with Julia using external packages.
 424
 425     Documentation
 426
  427     There  are  many  resources to turn to when learning the language.
 428     There   is  an  extensive  and  detailed  [26]manual  at  Julia
 429     headquarters,  and  this may be a good place to start. However,
 430     although  the first few chapters provide a gentle introduction,
 431     the  material soon becomes dense and, at times, hard to follow,
 432     with  references to concepts that are not explained until later
 433     chapters.  Fortunately,  there  is a [27]"learning" link at the
 434     top  of  the Julia home page, which takes you to a long list of
 435     videos,  tutorials,  books,  articles,  and  classes both about
  436     Julia  and that use Julia in teaching subjects such as numerical
 437     analysis.  There  is also a fairly good [28]cheat-sheet [PDF] ,
 438     which was just updated for v. 1.0.
 439
 440     If  you're  coming  from  Python,  [29]this  list of noteworthy
 441     differences  between  Python  and Julia syntax will probably be
 442     useful.
 443
 444     Some  of  the  linked  tutorials are in the form of [30]Jupyter
 445     notebooks  — indeed, the name "Jupyter" is formed from "Julia",
 446     "Python",  and  "R",  which  are  the  three original languages
 447     supported  by  the  interface. The [31]Julia kernel for Jupyter
 448     was  recently upgraded to support v. 1.0. Judicious sampling of
 449     a  variety  of  documentation  sources,  combined  with liberal
 450     experimentation,  may be the best way of learning the language.
 451     Jupyter  makes this experimentation more inviting for those who
 452     enjoy  the  web-based  interface,  but the REPL that comes with
 453     Julia  helps  a  great  deal  in  this regard by providing, for
 454     instance,  TAB  completion and an extensive help system invoked
 455     by simply pressing the "?" key.
 456
 457     Stay tuned
 458
 459     The  [32]next  installment in this two-part series will explain
 460     how   Julia  is  organized  around  the  concept  of  "multiple
 461     dispatch".  You  will  learn  how  to create functions and make
 462     elementary  use  of  Julia's  type  system.  We'll  see  how to
 463     install  packages  and  use  modules,  and  how to make graphs.
 464     Finally,  Part  2  will  briefly survey the important topics of
 465     macros and distributed computing.
 466
 467     [33]Comments (80 posted)
 468
 469     [34]C considered dangerous
 470
 471     By Jake Edge
 472
 473     August 29, 2018
 474
 475     [35]LSS NA
 476
 477     At  the  North  America  edition of the [36]2018 Linux Security
 478     Summit  (LSS  NA),  which was held in late August in Vancouver,
 479     Canada,  Kees  Cook  gave a presentation on some of the dangers
 480     that  come  with  programs  written  in  C.  In  particular, of
 481     course,  the  Linux  kernel is mostly written in C, which means
 482     that  the security of our systems rests on a somewhat dangerous
 483     foundation.  But there are things that can be done to help firm
 484     things  up  by  " Making C Less Dangerous " as the title of his
 485     talk suggested.
 486
 487     He  began  with  a brief summary of the work that he and others
 488     are  doing  as  part  of the [37]Kernel Self Protection Project
 489     (KSPP).  The  goal  of the project is to get kernel protections
 490     merged  into  the  mainline. These protections are not targeted
 491     at  protecting user-space processes from other (possibly rogue)
 492     processes,  but  are, instead, focused on protecting the kernel
 493     from  user-space  code.  There  are around 12 organizations and
 494     ten  individuals  working  on roughly 20 different technologies
 495     as  part  of the KSPP, he said. The progress has been "slow and
  496     steady", he said, which is how he thinks it should go.
 497
 498     One  of  the  main  problems is that C is treated mostly like a
 499     fancy  assembler.  The  kernel  developers do this because they
 500     want  the  kernel to be as fast and as small as possible. There
 501     are   other   reasons,   too,   such   as   the   need   to  do
 502     architecture-specific  tasks that lack a C API (e.g. setting up
 503     page tables, switching to 64-bit mode).
 504
 505     But   there   is   lots   of  undefined  behavior  in  C.  This
 506     "operational   baggage"   can  lead  to  various  problems.  In
 507     addition,  C  has a weak standard library with multiple utility
 508     functions  that  have  various  pitfalls.  In C, the content of
 509     uninitialized  automatic  variables  is  undefined,  but in the
 510     machine  code that it gets translated to, the value is whatever
 511     happened  to  be  in  that  memory  location  before.  In  C, a
 512     function  pointer can be called even if the type of the pointer
 513     does  not  match the type of the function being called—assembly
 514     doesn't care, it just jumps to a location, he said.
 515
 516     The  APIs  in  the standard library are also bad in many cases.
 517     He  asked:  why is there no argument to memcpy() to specify the
 518     maximum  destination  length?  He  noted a recent [39]blog post
 519     from  Raph  Levien  entitled "With Undefined Behavior, Anything
 520     is  Possible".  That  obviously  resonated  with  Cook,  as  he
 521     pointed  out  his  T-shirt—with  the title and artwork from the
 522     post.
 523
 524     Less danger
 525
 526     He  then  moved on to some things that kernel developers can do
 527     (and  are  doing) to get away from some of the dangers of C. He
 528     began  with variable-length arrays (VLAs), which can be used to
 529     overflow  the  stack to access data outside of its region. Even
 530     if  the  stack  has a guard page, VLAs can be used to jump past
 531     it  to  write into other memory, which can then be used by some
 532     other  kind  of  attack. The C language is "perfectly fine with
 533     this".  It  is  easy  to find uses of VLAs with the -Wvla flag,
 534     however.
 535
 536     But  it  turns  out  that  VLAs  are  [40]not  just  bad from a
 537     security   perspective   ,   they   are   also   slow.   In   a
 538     micro-benchmark  associated with a [41]patch removing a VLA , a
 539     13%  performance  boost  came from using a fixed-size array. He
 540     dug  in  a  bit  further and found that much more code is being
 541     generated  to  handle a VLA, which explains the speed increase.
 542     Since  Linus  Torvalds  has  [42]declared  that  VLAs should be
 543     removed  from  the  kernel because they cause security problems
  544     and also slow the kernel down, Cook said "don't use VLAs".
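
         As a rough illustration of the kind of conversion involved (a
         sketch, not code from the kernel; MAX_CHUNK and checksum_fixed()
         are names made up for this example), a function that would
         otherwise size a stack buffer from its argument can use a fixed
         upper bound and refuse larger requests:

```c
#include <stddef.h>
#include <string.h>

#define MAX_CHUNK 64   /* hypothetical upper bound chosen for this sketch */

/*
 * A VLA version would declare "unsigned char buf[len];", letting the
 * caller decide how much stack is consumed; -Wvla flags such a line.
 * Here the buffer has a fixed size and oversized requests are refused.
 */
int checksum_fixed(const unsigned char *src, size_t len)
{
    unsigned char buf[MAX_CHUNK];
    int sum = 0;

    if (len > sizeof(buf))
        return -1;              /* too large for the fixed buffer */

    memcpy(buf, src, len);      /* len is now known to fit */
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}
```

         The stack usage of checksum_fixed() is the same for every call,
         which is the property a VLA gives up.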
 545
 546     Another  problem area is switch statements, in particular where
 547     there  is  no  break  for  a  case  .  That could mean that the
 548     programmer  expects  and wants to fall through to the next case
 549     or  it could be that the break was simply forgotten. There is a
 550     way  to  get a warning from the compiler for fall-throughs, but
 551     there  needs  to be a way to mark those that are truly meant to
 552     be  that way. A special fall-through "statement" in the form of
 553     a   comment   is   what   has   been   agreed   on  within  the
 554     static-analysis  community.  He  and  others  have  been  going
 555     through  each  of  the  places  where  there is no break to add
 556     these  comments  (or  a break ); they have "found a lot of bugs
 557     this way", he said.
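
         A minimal sketch of such an annotation follows; the function and
         the flag values are invented for illustration. GCC's
         -Wimplicit-fallthrough option warns about unmarked fall-throughs
         and accepts a comment like the ones below as the marker.

```c
/* Map a hypothetical privilege level to a cumulative set of flag bits. */
int privilege_flags(int level)
{
    int flags = 0;

    switch (level) {
    case 2:
        flags |= 0x4;
        /* fall through */
    case 1:
        flags |= 0x2;
        /* fall through */
    case 0:
        flags |= 0x1;
        break;
    default:
        flags = -1;     /* unknown level */
        break;
    }
    return flags;
}
```

         Each higher level deliberately picks up the bits of the levels
         below it, so the missing break statements are intentional and
         the comments record that intent.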
 558
 559     Uninitialized  local variables will generate a warning, but not
 560     if  the  variable is passed in by reference. There are some GCC
 561     plugins  that  will  automatically  initialize these variables,
 562     but  there are also patches for both GCC and Clang to provide a
 563     compiler  option  to  do  so. Neither of those is upstream yet,
 564     but  Torvalds has praised the effort so the kernel would likely
 565     use  the  option.  An  interesting  side effect that came about
 566     while   investigating   this   was   a  warning  he  got  about
 567     unreachable  code  when  he  enabled  the  auto-initialization.
 568     There  were  two  variables  declared  just after a switch (and
 569     outside of any case ), where they would never be reached.
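
         The by-reference case can be sketched as follows (hypothetical
         function names). The helper writes its output on only one path,
         and because the caller hands over a pointer, the variable is not
         reported as uninitialized. Initializing it explicitly, which is
         what an auto-initialization option would do for every such
         variable, makes the result well defined.

```c
/* Writes *out only when cond is nonzero. */
void maybe_fill(int *out, int cond)
{
    if (cond)
        *out = 42;
}

int read_value(int cond)
{
    /* With a bare "int v;" the cond == 0 path would return an
     * indeterminate value. */
    int v = 0;

    maybe_fill(&v, cond);
    return v;
}
```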
 570
 571     Arithmetic  overflow  is  another  undefined behavior in C that
 572     can  cause various problems. GCC can check for signed overflow,
 573     which  performs  well  (the overhead is in the noise, he said),
 574     but  adding warning messages for it does grow the kernel by 6%;
 575     making  the  overflow abort, instead, only adds 0.1%. Clang can
 576     check  for  both  signed and unsigned overflow; signed overflow
 577     is  undefined,  while  unsigned  overflow is defined, but often
 578     unexpected.  Marking places where unsigned overflow is expected
 579     is  needed;  it would be nice to get those annotations put into
 580     the kernel, Cook said.
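
         One way to make an overflow check explicit in the source, rather
         than relying on compiler instrumentation, is the
         __builtin_mul_overflow() builtin provided by GCC (5 and later)
         and Clang. The sketch below uses it for a size calculation; the
         helper name array_size() is hypothetical and this is not the
         kernel's own code.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Compute n * elem_size for an allocation.  Unsigned multiplication
 * wraps silently in C; the builtin reports the wrap instead.
 * Returns true and stores the product on success, false on overflow.
 */
bool array_size(size_t n, size_t elem_size, size_t *bytes)
{
    return !__builtin_mul_overflow(n, elem_size, bytes);
}
```

         A caller would refuse the allocation when array_size() returns
         false instead of passing a wrapped, too-small size to the
         allocator.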
 581
 582     Explicit   bounds   checking   is   expensive.   Doing  it  for
 583     copy_{to,from}_user()  is  a  less than 1% performance hit, but
  584     adding  it  to the strcpy() and memcpy() families is around a
 585     2%  hit. Pre-Meltdown that would have been a totally impossible
 586     performance  regression  for  security, he said; post-Meltdown,
 587     since  it  is less than 5%, maybe there is a chance to add this
 588     checking.
 589
 590     Better  APIs would help as well. He pointed to the evolution of
  591     strcpy(),  through  strncpy()  and  strlcpy()  (each with
  592     their  own  bounds  flaws)  to strscpy(), which seems to be "OK
 593     so  far".  He  also mentioned memcpy() again as a poor API with
 594     respect to bounds checking.
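
         What a bounds-aware copy interface can look like is sketched
         below: the caller passes the destination size, the result is
         always NUL-terminated, and truncation is reported rather than
         silent. bounded_copy() is a hypothetical name and this is not
         the kernel's strscpy() implementation.

```c
#include <stddef.h>

/*
 * Copy src into dst, which holds size bytes.
 * Returns the number of characters copied (excluding the NUL),
 * or -1 if size is zero or src had to be truncated.
 */
long bounded_copy(char *dst, const char *src, size_t size)
{
    size_t i;

    if (size == 0)
        return -1;

    for (i = 0; i < size - 1 && src[i] != '\0'; i++)
        dst[i] = src[i];
    dst[i] = '\0';              /* always terminate the destination */

    return src[i] == '\0' ? (long)i : -1;
}
```

         Unlike memcpy(), the function cannot write past the end of the
         destination, because the destination length is part of the
         call.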
 595
 596     Hardware  support  for  bounds  checking  is  available  in the
 597     application  data  integrity  (ADI)  feature  for  SPARC and is
 598     coming  for  Arm; it may also be available for Intel processors
 599     at  some point. These all use a form of "memory tagging", where
 600     allocations  get a tag that is stored in the high-order byte of
 601     the  address.  An offset from the address can be checked by the
 602     hardware  to  see if it still falls within the allocated region
 603     based on the tag.
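
         The tagging idea can be modeled in software as a toy, which is
         all the following sketch is: it does not reflect the actual tag
         widths or granule sizes of ADI or the Arm feature, and every
         name and constant in it is an assumption made for illustration.
         A tag is kept in the top byte of a 64-bit address, a table
         records the tag of each block of memory, and an access is
         allowed only when the two agree.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TAG_SHIFT  56   /* tag occupies the high-order byte of the address */
#define GRANULE    16   /* bytes covered by one stored tag (assumed) */
#define N_GRANULES 8    /* size of the toy "memory", in granules */

static uint8_t granule_tags[N_GRANULES];  /* stand-in for hardware tag storage */

/* Tag the region [offset, offset + len) and produce a tagged address. */
bool tag_region(uint64_t offset, size_t len, uint8_t tag, uint64_t *tagged)
{
    if (len == 0 || offset >= N_GRANULES * GRANULE ||
        len > N_GRANULES * GRANULE - offset)
        return false;

    for (uint64_t g = offset / GRANULE; g <= (offset + len - 1) / GRANULE; g++)
        granule_tags[g] = tag;

    *tagged = offset | ((uint64_t)tag << TAG_SHIFT);
    return true;
}

/* The check the hardware would perform on an access at tagged + delta. */
bool access_ok(uint64_t tagged, uint64_t delta)
{
    uint8_t tag = (uint8_t)(tagged >> TAG_SHIFT);
    uint64_t addr = (tagged & ((1ULL << TAG_SHIFT) - 1)) + delta;

    if (addr >= N_GRANULES * GRANULE)
        return false;
    return granule_tags[addr / GRANULE] == tag;
}
```

         In this model an offset that runs from one allocation into a
         neighbor with a different tag is rejected, while an overrun that
         stays inside the same granule goes unnoticed.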
 604
 605     Control-flow  integrity  (CFI)  has  become  more  of  an issue
 606     lately  because much of what attackers had used in the past has
 607     been  marked  as  "no  execute"  so  they  are turning to using
 608     existing  code  "gadgets"  already  present  in  the  kernel by
 609     hijacking  existing indirect function calls. In C, you can just
 610     call  pointers  without  regard  to  the type as it just treats
 611     them  as  an  address  to  jump  to.  Clang  has a CFI-sanitize
 612     feature  that  enforces  the function prototype to restrict the
 613     calls  that  can  be  made.  It  is  done at runtime and is not
 614     perfect,  in  part  because  there are lots of functions in the
 615     kernel  that  take  one  unsigned  long parameter and return an
 616     unsigned long.
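
         The underlying C behavior can be sketched as follows (all names
         are hypothetical). An indirect call simply jumps to whatever
         address the pointer holds; the language lets a cast paper over a
         prototype mismatch, which is the kind of call that Clang's CFI
         sanitizer (the -fsanitize=cfi family of options) is designed to
         reject at run time.

```c
typedef unsigned long (*handler_t)(unsigned long);

unsigned long twice(unsigned long x)
{
    return 2 * x;
}

/* A function with an unrelated prototype. */
int first_char(const char *s)
{
    return s[0];
}

/* Indirect call: the compiler emits a jump to whatever address h holds. */
unsigned long dispatch(handler_t h, unsigned long arg)
{
    return h(arg);
}

/*
 * Plain C accepts the following cast, although calling through it is
 * undefined behavior; a prototype-checking scheme is meant to stop it:
 *
 *     dispatch((handler_t)first_char, 1UL);
 */
```

         The handler_t prototype used here, one unsigned long in and one
         out, is the case the talk singled out: a prototype check cannot
         tell apart the many functions that share it.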
 617
 618     Attacks  on  CFI  have both a "forward edge", which is what CFI
 619     sanitize  tries  to  handle,  and  a "backward edge" that comes
 620     from  manipulating  the  stack  values,  the  return address in
 621     particular.  Clang  has  two  methods  available to prevent the
 622     stack  manipulation.  The first is the "safe stack", which puts
 623     various   important  items  (e.g.  "safe"  variables,  register
 624     spills,   and   the   return  address)  on  a  separate  stack.
 625     Alternatively,  the  "shadow  stack" feature creates a separate
 626     stack just for return addresses.
 627
 628     One  problem  with  these  other  stacks is that they are still
 629     writable,  so  if an attacker can find them in memory, they can
 630     still  perform  their attacks. Hardware-based protections, like
 631     Intel's     Control-Flow    Enforcement    Technology    (CET),
  632     [43]provide    a   read-only   shadow  call  stack  for  return
 633     addresses.   Another   hardware   protection   is   [44]pointer
 634     authentication  for  Arm, which adds a kind of encrypted tag to
 635     the return address that can be verified before it is used.
 636
 637     Status and challenges
 638
 639     Cook  then  went  through  the current status of handling these
 640     different  problems  in  the kernel. VLAs are almost completely
 641     gone,  he  said,  just a few remain in the crypto subsystem; he
 642     hopes  those  VLAs will be gone by 4.20 (or whatever the number
 643     of  the  next  kernel  release  turns  out  to  be).  Once that
 644     happens,  he  plans  to  turn  on -Wvla for the kernel build so
 645     that none creep back in.
 646
 647     There  has  been  steady  progress made on marking fall-through
 648     cases  in  switch  statements. Only 745 remain to be handled of
 649     the  2311  that  existed  when  this  work  started;  each  one
 650     requires  scrutiny  to  determine  what the author's intent is.
 651     Auto-initialized  local  variables  can  be done using compiler
 652     plugins,  but  that  is "not quite what we want", he said. More
 653     compiler   support  would  be  helpful  there.  For  arithmetic
 654     overflow,  it  would  be  nice  to  see GCC get support for the
 655     unsigned  case,  but  memory allocations are now doing explicit
 656     overflow checking at this point.
 657
 658     Bounds  checking has seen some "crying about performance hits",
 659     so  we  are  waiting impatiently for hardware support, he said.
 660     CFI  forward-edge  protection  needs [45]link-time optimization
 661     (LTO)  support  for  Clang  in  the kernel, but it is currently
 662     working  on  Android.  For  backward-edge mitigation, the Clang
 663     shadow   call   stack   is  working  on  Android,  but  we  are
 664     impatiently waiting for hardware support for that too.
 665
 666     There  are a number of challenges in doing security development
 667     for  the  kernel,  Cook said. There are cultural boundaries due
 668     to  conservatism  within  the  kernel  community; that requires
 669     patiently  working  and reworking features in order to get them
 670     upstream.  There  are,  of course, technical challenges because
 671     of  the complexity of security changes; those kinds of problems
 672     can  be solved. There are also resource limitations in terms of
 673     developers,  testers,  reviewers, and so on. KSPP and the other
 674     kernel  security  developers  are  still  making that "slow but
 675     steady" progress.
 676
 677     Cook's  [46]slides  [PDF] are available for interested readers;
 678     before  long,  there should be a video available of the talk as
 679     well.
 680
 681     [I  would  like  to  thank  LWN's  travel  sponsor,  the  Linux
 682     Foundation,  for travel assistance to attend the Linux Security
 683     Summit in Vancouver.]
 684
 685     [47]Comments (70 posted)
 686
 687     [48]The second half of the 4.19 merge window
 688
 689     By Jonathan Corbet
 690
 691     August  26,  2018    By  the  time  Linus Torvalds [49]released
 692     4.19-rc1  and  closed  the  merge  window  for this development
 693     cycle,  12,317  non-merge  changesets  had found their way into
 694     the  mainline;  about  4,800  of  those  landed  after [50]last
 695     week's  summary  was  written.  As tends to be the case late in
 696     the  merge  window,  many  of  those changes were fixes for the
 697     bigger  patches  that  went  in  early,  but  there were also a
 698     number  of  new  features  added.  Some of the more significant
 699     changes include:
 700
 701     Core kernel
 702
 703     The  full  set of patches adding [51]control-group awareness to
 704     the  out-of-memory  killer  has  not been merged due to ongoing
 705     disagreements,  but  one  piece  of  it  has:  there  is  a new
 706     memory.oom.group  control  knob  that  will cause all processes
 707     within  a  control  group  to  be  killed  in  an out-of-memory
 708     situation.
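
         As  a rough illustration of how such a knob might be set, the
         sketch  below  writes  to  a memory.oom.group file. It assumes
         that  the  knob  appears  as  a  file in the control group's
         directory;  the  helper  name  set_oom_group  and the example
         path are hypothetical and do not come from the patches.

```python
from pathlib import Path


def set_oom_group(cgroup_dir, enabled=True):
    """Write 1 or 0 to the memory.oom.group file of one control group.

    cgroup_dir is assumed to be the group's directory in a mounted
    control-group hierarchy (for example /sys/fs/cgroup/mygroup, which
    is a hypothetical path used only for illustration).
    """
    knob = Path(cgroup_dir) / "memory.oom.group"
    knob.write_text("1\n" if enabled else "0\n")
    # Read the value back so the caller can confirm what was stored.
    return knob.read_text().strip()
```

         Calling  set_oom_group()  on  a real group would need suitable
         privileges;  pointing it at a scratch directory is enough to
         try the logic out.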
 709
 710     A  new set of protections has been added to prevent an attacker
 711     from  fooling  a  program  into  writing to an existing file or
 712     FIFO.  An  open  with  the  O_CREAT flag to a file or FIFO in a
 713     world-writable,  sticky  directory (e.g. /tmp) will fail if the
 714     owner  of  the  opening  process is not the owner of either the
 715     target   file  or  the  containing  directory.  This  behavior,
 716     disabled    by    default,    is    controlled   by   the   new
 717     protected_regular and protected_fifos sysctl knobs.
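
         A  quick  way  to  see  whether  these  protections are on is
         to  read  the  two  knobs.  The  sketch below assumes they are
         exposed  as  files  under /proc/sys/fs, as filesystem sysctls
         normally  are;  the  function  name  is  made up for this
         example.

```python
from pathlib import Path


def read_protection_knobs(sysctl_dir="/proc/sys/fs"):
    """Return the value of each knob, or None where it is unavailable."""
    values = {}
    for name in ("protected_regular", "protected_fifos"):
        path = Path(sysctl_dir) / name
        try:
            values[name] = int(path.read_text().strip())
        except (FileNotFoundError, PermissionError):
            # Older kernels do not have the knob at all.
            values[name] = None
    return values
```

         A  value  of  zero  means  the  protection is disabled, which
         is  the  default  described above; None means the kernel does
         not provide the knob.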
 718
 719     Filesystems and block layer
 720
 721     The  dm-integrity  device-mapper  target can now use a separate
 722     device for metadata storage.
 723
 724     EROFS,  the  "enhanced read-only filesystem", has been added to
 725     the  staging  tree.  It is "a lightweight read-only file system
 726     with    modern   designs   (eg.   page-sized   blocks,   inline
 727     xattrs/data,  etc.)  for  scenarios which need high-performance
 728     read-only  requirements,  eg.  firmwares  in  mobile  phone  or
 729     LIVECDs".
 730
 731     The  new  "metadata  copy-up"  feature  in overlayfs will avoid
 732     copying   a   file's   contents   to   the  upper  layer  on  a
 733     metadata-only change. See [52]this commit for details.
 734
 735     Hardware support
 736
 737     Graphics: Qualcomm Adreno A6xx GPUs.
 738
 739     Industrial    I/O:    Spreadtrum    SC27xx    series    PMIC
 740     analog-to-digital    converters,    Analog    Devices    AD5758
 741     digital-to-analog  converters, Intersil ISL29501 time-of-flight
 742     sensors,  Silicon  Labs  SI1133  UV  index/ambient light sensor
 743     chips, and Bosch Sensortec BME680 sensors.
 744
 745     Miscellaneous:  generic  ADC-based  resistive  touchscreens,
 746     generic  ASIC  devices  via  the  Google [53]Gasket framework,
 747     Analog  Devices  ADGS1408/ADGS1409  multiplexers,  Actions Semi
 748     Owl  SoCs  DMA  controllers,  MEN  16Z069 watchdog timers, Rohm
 749     BU21029   touchscreen   controllers,   Cirrus   Logic  CS47L35,
 750     CS47L85,  CS47L90,  and  CS47L91  codecs,  Cougar  500k  gaming
 751     keyboards,   Qualcomm   GENI-based   I2C  controllers,  Actions
 752     Semiconductor  Owl  I2C  controllers,  ChromeOS  EC-based USBPD
 753     chargers, and Analog Devices ADP5061 battery chargers.
 754
 755     USB:  Nuvoton  NPCM7XX  on-chip EHCI USB controllers, Broadcom
 756     Stingray PCIe PHYs, and Renesas R-Car generation 3 PCIe PHYs.
 757
 758     There  is  also  a  new  subsystem  for the abstraction of GNSS
 759     (global  navigation  satellite  systems  —  GPS,  for  example)
 760     receivers  in  the  kernel.  To  date,  such  devices have been
 761     handled  with  an  abundance of user-space drivers; the hope is
 762     to  bring  some  order  in  this  area.  Support for u-blox and
 763     SiRFstar receivers has been added as well.
 764
 765     Kernel internal
 766
 767     The  __deprecated  marker,  used to mark interfaces that should
 768     no  longer  be  used,  has been deprecated and removed from the
 769     kernel  entirely.  [54]Torvalds  said:  "They are not useful.
 770     They  annoy  everybody,  and  nobody  ever  does anything about
 771     them,  because  it's  always 'somebody elses problem'. And when
 772     people  start  thinking  that  warnings  are  normal, they stop
 773     looking  at  them, and the real warnings that mean something go
 774     unnoticed."
 775
 776     The  minimum  version  of  GCC  required by the kernel has been
 777     moved up to 4.6.
 778
 779     There  are  a  couple of significant changes that failed to get
 780     in  this  time around, including the [55]XArray data structure.
 781     The  patches are thought to be ready, but they had the bad luck
 782     to  be  based  on  a  tree  that  failed to be merged for other
 783     reasons,  so  Torvalds  [56]didn't  even look at them. That, in
 784     turn,   blocks  another  set  of  patches  intended  to  enable
 785     migration of slab-allocated objects.
 786
 787     The  other  big  deferral  is  the  [57]new system-call API for
 788     filesystem  mounting.  Despite ongoing [58]concerns about what
 789     happens  when  the  same  low-level  device is mounted multiple
 790     times  with  conflicting  options,  Al  Viro  sent  [59]a  pull
 791     request  to  send  this  work  upstream. The ensuing discussion
 792     made  it  clear  that  there  is  still not a consensus in this
 793     area,  though,  so  it  seems  that  this  work has to wait for
 794     another cycle.
 795
 796     Assuming  all  goes  well,  the  kernel will stabilize over the
 797     coming  weeks  and  the  final  4.19  release  will  happen  in
 798     mid-October.
 799
 800     [60]Comments (1 posted)
 801
 802     [61]Measuring (and fixing) I/O-controller throughput loss
 803
 804     August 29, 2018
 805
 806     This article was contributed by Paolo Valente
 807
 808     Many  services,  from  web hosting and video streaming to cloud
 809     storage,  need  to  move  data  to  and from storage. They also
 810     often  require  that  each  per-client I/O flow be guaranteed a
 811     non-zero   amount  of  bandwidth  and  a  bounded  latency.  An
 812     expensive  way to provide these guarantees is to over-provision
 813     storage  resources,  keeping  each  resource underutilized, and
 814     thus  have  plenty of bandwidth available for the few I/O flows
 815     dispatched  to  each  medium.  Alternatively one can use an I/O
 816     controller.  Linux provides two mechanisms designed to throttle
 817     some  I/O  streams  to allow others to meet their bandwidth and
 818     latency  requirements.  These mechanisms work, but they come at
 819     a  cost:  a  loss  of  as  much  as  80% of total available I/O
 820     bandwidth.  I  have run some tests to demonstrate this problem;
 821     some   upcoming  improvements  to  the  [62]bfq  I/O  scheduler
 822     promise to improve the situation considerably.
 823
 824     Throttling  does  guarantee control, even on drives that happen
 825     to  be highly utilized but, as will be seen, it has a hard time
 826     actually  ensuring  that  drives are highly utilized. Even with
 827     greedy  I/O  flows,  throttling  easily  ends  up  utilizing as
 828     little  as  20%  of the available speed of a flash-based drive.
 829     Such   a  speed  loss  may  be  particularly  problematic  with
 830     lower-end   storage.   On   the   opposite   end,  it  is  also
 831     disappointing  with  high-end  hardware, as the Linux block I/O
 832     stack  itself  has  been  [63]redesigned  from the ground up to
 833     fully  utilize  the  high  speed  of  modern,  fast storage. In
 834     addition,   throttling   fails   to   guarantee   the  expected
 835     bandwidths  if  I/O  contains  both  reads  and  writes,  or is
 836     sporadic in nature.
 837
 838     On  the  bright  side,  there  now  seems  to  be  an effective
 839     alternative  for controlling I/O: the proportional-share policy
 840     provided  by  the  bfq  I/O  scheduler.  It enables nearly 100%
 841     storage  bandwidth  utilization,  at  least  with  some  of the
 842     workloads  that  are  problematic  for  throttling. An upcoming
 843     version  of  bfq may be able to achieve this result with almost
 844     all  workloads.  Finally,  bfq  guarantees  bandwidths with all
 845     workloads.  The current limitation of bfq is that its execution
 846     overhead  becomes  significant  at  speeds  above  400,000  I/O
 847     operations per second on commodity CPUs.
 848
 849     Using  the  bfq  I/O  scheduler,  Linux  can  now guarantee low
 850     latency  to  lightweight  flows containing sporadic, short I/O.
 851     No  throughput  issues arise, and no configuration is required.
 852     This  capability benefits important, time-sensitive tasks, such
 853     as  video  or audio streaming, as well as executing commands or
 854     starting  applications.  Although  benchmarks are not available
 855     yet,  these  guarantees  might  also  be  provided by the newly
 856     proposed  [64]I/O  latency controller. It allows administrators
 857     to  set target latencies for I/O requests originating from each
 858     group  of  processes,  and  favors  the  groups with the lowest
 859     target latency.
 860
 861     The testbed
 862
 863     I  ran  the  tests with an ext4 filesystem mounted on a PLEXTOR
 864     PX-256M5S  SSD,  which  features  a  peak rate of ~160MB/s with
 865     random  I/O,  and  of  ~500MB/s  with  sequential  I/O.  I used
 866     blk-mq,  in  Linux  4.18. The system was equipped with a 2.4GHz
 867     Intel  Core  i7-2760QM  CPU  and  1.3GHz  DDR3  DRAM. In such a
 868     system,  a  single  thread  doing  synchronous  reads reaches a
 869     throughput of 23MB/s.
 870
 871     For  the purposes of these tests, each process is considered to
 872     be  in  one of two groups, termed "target" and "interferers". A
 873     target  is  a  single-process, I/O-bound group whose I/O is the
 874     focus  of  the  measurements.  In particular, I measure the I/O
 875     throughput  enjoyed by this group to get the minimum bandwidth
 876     delivered to the group. An interferer is a single-process group
 877     whose  role  is to generate additional I/O that interferes with
 878     the I/O of the target. The tested workloads contain one target
 879     and multiple interferers.
 880
 881     The  single  process  in  each  group  either  reads or writes,
 882     through  asynchronous  (buffered)  operations,  to  one  file —
 883     different  from the file read or written by any other process —
 884     after  invalidating  the  buffer cache for the file. I define a
 885     reader  or  writer  process as either "random" or "sequential",
 886     depending  on  whether  it  reads  or writes its file at random
 887     positions  or  sequentially.  Finally, an interferer is defined
 888     as  being either "active" or "inactive" depending on whether it
 889     performs  I/O during the test. When an interferer is mentioned,
 890     it is assumed that the interferer is active.
 891
 892     Workloads  are  defined  so as to try to cover the combinations
 893     that,  I believe, most influence the performance of the storage
 894     device  and of the I/O policies. For brevity, in this article I
 895     show results for only two groups of workloads:
 896
 897     Static  sequential:  four  synchronous  sequential  readers or
 898     four   asynchronous  sequential  writers,  plus  five  inactive
 899     interferers.
 900
 901     Static  random:  four  synchronous  random readers, all with a
 902     block size equal to 4k, plus five inactive interferers.
 903
 904     To  create  each  workload,  I  considered,  for  each  mix  of
 905     interferers  in the group, two possibilities for the target: it
 906     could  be  either  a random or a sequential synchronous reader.
 907     In  [65]a  longer  version of this article [PDF], you will also
 908     find   results  for  workloads  with  varying  degrees  of  I/O
 909     randomness,  and for dynamic workloads (containing sporadic I/O
 910     sources).  These extra results confirm the losses of throughput
 911     and I/O control for throttling that are shown here.
 912
 913     I/O policies
 914
 915     Linux  provides  two I/O-control mechanisms for guaranteeing (a
 916     minimum)  bandwidth, or at least fairness, to long-lived flows:
 917     the   throttling  and  proportional-share  I/O  policies.  With
 918     throttling,  one  can  set  a  maximum  bandwidth  limit — "max
 919     limit"  for brevity — for the I/O of each group. Max limits can
 920     be  used,  in an indirect way, to provide the service guarantee
 921     at  the  focus  of  this  article.  For example, a group can be
 922     guaranteed   a   minimum   bandwidth  by  limiting  the  maximum
 923     bandwidth of all the other groups, which leaves the rest of the
 924     device's bandwidth to that group.
 925
 926     Unfortunately,  max  limits  have  two  drawbacks  in  terms of
 927     throughput.  First,  if  some groups do not use their allocated
 928     bandwidth,  that  bandwidth cannot be reclaimed by other active
 929     groups.  Second,  limits  must comply with the worst-case speed
 930     of  the  device,  namely, its random-I/O peak rate. Such limits
 931     will  clearly  leave  a lot of throughput unused with workloads
 932     that  otherwise  would  drive  the  device to higher throughput
 933     levels.  Maximizing  throughput  is  simply  not  a goal of max
 934     limits.  So,  for brevity, test results with max limits are not
 935     shown  here.  You  can find these results, plus a more detailed
 936     description  of  the  above  drawbacks,  in the long version of
 937     this article.
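
         The  second  drawback  can  be  made concrete with a little
         arithmetic.  The  sketch  below  is only an illustration: it
         assumes  that  the bandwidth left over after the guarantee is
         split  equally  among  the other groups, and it borrows the
         testbed's  peak  rates  together  with a 10MB/s guarantee and
         nine  other  groups.  The  function  name and the equal split
         are assumptions, not the configuration used in the tests.

```python
def equal_max_limit(worst_case_rate, target_min, n_others):
    """Max limit per other group that still leaves target_min free.

    The limits are sized against the worst-case (random-I/O) rate of
    the device, and the leftover is split equally (an assumption).
    """
    if not 0 <= target_min <= worst_case_rate:
        raise ValueError("guarantee must fit within the worst-case rate")
    return (worst_case_rate - target_min) / n_others


RANDOM_PEAK = 160       # MB/s, worst-case rate of the test drive
SEQUENTIAL_PEAK = 500   # MB/s, best-case rate of the test drive

per_group = equal_max_limit(RANDOM_PEAK, target_min=10, n_others=9)
others_cap = 9 * per_group                    # MB/s the others may use
sequential_use = others_cap / SEQUENTIAL_PEAK # share of the best case

print(f"max limit per group: {per_group:.1f} MB/s")
print(f"others capped at:    {others_cap:.0f} MB/s")
print(f"share of sequential peak: {sequential_use:.0%}")
```

         Under  these  assumptions the other groups can never exceed
         150MB/s  in  total,  even when their I/O is sequential and the
         drive could deliver about 500MB/s.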
 938
 939     Because  of  these  drawbacks,  a  new, still experimental, low
 940     limit  has  been  added to the throttling policy. If a group is
 941     assigned  a low limit, then the throttling policy automatically
 942     limits  the  I/O  of  the  other  groups so as to guarantee
 943     to  the  group  a  minimum  bandwidth equal to its assigned low
 944     limit.  This  new  throttling  mechanism  throttles no group as
 945     long  as  every  group is getting at least its assigned minimum
 946     bandwidth.  I  tested  this mechanism, but did not consider the
 947     interesting  problem  of guaranteeing minimum bandwidths while,
 948     at the same time, enforcing maximum bandwidths.
 949
 950     The  other  I/O  policy available in Linux, proportional share,
 951     provides  weighted  fairness.  Each group is assigned a weight,
 952     and   should   receive   a  portion  of  the  total  throughput
 953     proportional  to  its  weight.  This  scheme guarantees minimum
 954     bandwidths  in  the  same way that low limits do in throttling.
 955     In  particular,  it  guarantees to each group a minimum share of
 956     the  total  throughput equal to the ratio between the weight of
 957     the group and the sum of the weights of all the groups that may
 958     be active at the same time.
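
         The  ratio  described  above  is  easy  to  compute. The sketch
         below  uses  the  weights  of the prop-bfq configuration tested
         in  this  article  (300  for  the target, 100 for each of four
         active  interferers,  200  for  each  of  five  inactive ones);
         the  function  and  the  group  names  are  invented for the
         example.

```python
def guaranteed_share(weight, all_weights):
    """Fraction of total throughput owed to a group with this weight."""
    return weight / sum(all_weights)


weights = {"target": 300}
weights.update({f"active-{i}": 100 for i in range(4)})
weights.update({f"inactive-{i}": 200 for i in range(5)})

# Worst case: every group that may be active is actually active.
worst_case = guaranteed_share(weights["target"], weights.values())

# While the five inactive interferers stay idle, only active weights count.
active = [w for name, w in weights.items() if not name.startswith("inactive")]
current = guaranteed_share(weights["target"], active)

print(f"worst-case share: {worst_case:.1%}")
print(f"share with idle interferers: {current:.1%}")
```

         The  worst-case  share is 300/1700; with the inactive groups
         idle, the target's share rises to 300/700.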
 959
 960     The  actual implementation of the proportional-share policy, on
 961     a  given drive, depends on what flavor of the block layer is in
 962     use  for  that  drive.  If  the drive is using the legacy block
 963     interface,  the policy is implemented by the cfq I/O scheduler.
 964     Unfortunately,   cfq   fails   to   control   bandwidths   with
 965     flash-based  storage,  especially  on  drives featuring command
 966     queueing.  This  case  is  not  considered in these tests. With
 967     drives  using  the  multiqueue interface, proportional share is
 968     implemented  by  bfq. This is the combination considered in the
 969     tests.
 970
 971     To  benchmark  both  throttling  (low  limits) and proportional
 972     share,  I  tested,  for  each workload, the combinations of I/O
 973     policies  and  I/O  schedulers  reported in the table below. In
 974     the  end,  there  are  three  test  cases for each workload. In
 975     addition,  for some workloads, I considered two versions of bfq
 976     for the proportional-share policy.
 977
 978     Name                          low-none         prop-bfq
 979     ----------------------------  ---------------  ---------------
 980     I/O policy                    Throttling with  Proportional
 981                                   low limits       share
 982     Scheduler                     none             bfq
 983     Parameter for target          10MB/s           300
 984     Parameter for each of the     10MB/s           100
 985     four active interferers       (tot: 40)        (tot: 400)
 986     Parameter for each of the     20MB/s           200
 987     five inactive interferers     (tot: 100)       (tot: 1000)
 988     Sum of parameters             150MB/s          1700
1019
1020     For  low  limits,  I  report  results with only none as the I/O
1021     scheduler,  because  the  results  are  the same with kyber and
1022     mq-deadline.
1023
1024     The  capabilities of the storage medium and of low limits drove
1025     the policy configurations. In particular:
1026
1027     The  configuration  of the target and of the active interferers
1028     for  low-none  is  the one for which low-none provides its best
1029     possible  minimum-bandwidth  guarantee  to  the target: 10MB/s,
1030     guaranteed  if  all interferers are readers. Results remain the
1031     same  regardless of the values used for target latency and idle
1032     time;  I  set them to 100µs and 1000µs, respectively, for every
1033     group.
1034
1035     Low  limits  for  inactive  interferers  are  set  to twice the
1036     limits  for active interferers, to pose greater difficulties to
1037     the policy.
1038
1039     I  chose weights for prop-bfq so as to guarantee about the same
1040     minimum  bandwidth  as  low-none  to  the  target,  in the same
1041     only-reader  worst  case  as  for  low-none  and  to  preserve,
1042     between  the  weights  of  active and inactive interferers, the
1043     same  ratio  as  between  the low limits of active and inactive
1044     interferers.
1045
1046     Full  details  on  configurations  can  be  found  in  the long
1047     version of this article.
1048
1049     Each  workload  was  run  ten  times  for each policy, plus ten
1050     times   without  any  I/O  control,  i.e.,  with  none  as  I/O
1051     scheduler  and  no  I/O policy in use. For each run, I measured
1052     the  I/O  throughput of the target (which reveals the bandwidth
1053     provided  to  the target), the cumulative I/O throughput of the
1054     interferers,  and  the  total  I/O throughput. These quantities
1055     fluctuated  very  little  during  each  run,  as well as across
1056     different  runs. Thus in the graphs I report only averages over
1057     per-run  average throughputs. In particular, for the case of no
1058     I/O  control,  I  report only the total I/O throughput, to give
1059     an  idea of the throughput that can be reached without imposing
1060     any control.
1061
1062     Results
1063
1064     This  plot  shows  throughput results for the simplest group of
1065     workloads: the static-sequential set.
1066
1067     With  a  random reader as the target against sequential readers
1068     as  interferers,  low-none  does  guarantee  the configured low
1069     limit   to  the  target.  Yet  it  reaches  only  a  low  total
1070     throughput.  The  throughput  of  the  random  reader evidently
1071     oscillates  around 10MB/s during the test. This implies that it
1072     is  at least slightly below 10MB/s for a significant percentage
1073     of  the  time.  But  when this happens, the low-limit mechanism
1074     limits  the  maximum bandwidth of every active group to the low
1075     limit  set  for the group, i.e., to just 10MB/s. The end result
1076     is  a total throughput lower than 10% of the throughput reached
1077     without I/O control.
1078
1079     That  said, the high throughput achieved without I/O control is
1080     obtained  by  choking  the random I/O of the target in favor of
1081     the  sequential  I/O  of  the interferers. Thus, it is probably
1082     more  interesting  to  compare  low-none  throughput  with  the
1083     throughput  reachable while actually guaranteeing 10MB/s to the
1084     target.  The  target  is  a single, synchronous, random reader,
1085     which  reaches  23MB/s while active. So, to guarantee 10MB/s to
1086     the  target,  it  is  enough  to serve it for about half of the
1087     time,  and the interferers for the other half. Since the device
1088     reaches  ~500MB/s  with  the sequential I/O of the interferers,
1089     the  resulting  throughput  with  this  service scheme would be
1090     (500+23)/2,  or  about 260MB/s. low-none thus reaches less than
1091     20%  of  the total throughput that could be reached while still
1092     preserving the target bandwidth.
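
         The  estimate  above  can  be  reproduced  in  a few lines. The
         sketch  uses  the rates quoted in this article; the function
         name  is  made  up,  and the "tighter" variant, which serves
         the  target  for  exactly  10/23  of the time instead of half,
         is an added assumption rather than part of the original
         estimate.

```python
def time_shared_throughput(target_rate, interferer_rate, target_fraction):
    """Average throughput when the device alternates between the two."""
    return (target_fraction * target_rate
            + (1 - target_fraction) * interferer_rate)


TARGET_RATE = 23        # MB/s, single synchronous random reader
INTERFERER_RATE = 500   # MB/s, sequential I/O of the interferers
GUARANTEE = 10          # MB/s to be delivered to the target

# The article's estimate: serve the target for about half of the time.
estimate = time_shared_throughput(TARGET_RATE, INTERFERER_RATE, 0.5)

# Assumed variant: serve the target only as long as 10MB/s requires.
tighter = time_shared_throughput(
    TARGET_RATE, INTERFERER_RATE, GUARANTEE / TARGET_RATE)

print(f"half-time estimate: {estimate:.1f} MB/s")
print(f"20% of that:        {0.2 * estimate:.1f} MB/s")
print(f"tighter estimate:   {tighter:.1f} MB/s")
```

         The  half-time  figure  is  261.5MB/s, which the text rounds to
         260MB/s;  the  tighter  variant  comes out somewhat higher, so
         the 20% comparison is, if anything, generous to low-none.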
1093
1094     prop-bfq  provides the target with a slightly higher throughput
1095     than  low-none.  This  makes  it harder for prop-bfq to reach a
1096     high  total throughput, because prop-bfq serves more random I/O
1097     (from  the target) than low-none. Nevertheless, prop-bfq gets a
1098     much  higher  total  throughput than low-none. According to the
1099     above  estimate,  this  throughput  is about 90% of the maximum
1100     throughput  that  could  be reached, for this workload, without
1101     violating  service  guarantees. The reason for this good result
1102     is  that  bfq  provides  an  effective  implementation  of  the
1103     proportional-share  service  policy.  At  any time, each active
1104     group  is  granted  a fraction of the current total throughput,
1105     and  the  sum  of  these  fractions  is  equal to one; so group
1106     bandwidths  naturally  saturate  the available total throughput
1107     at all times.
1108
1109     Things  change  with  the  second  workload:  a  random  reader
1110     against  sequential writers. Now low-none reaches a much higher
1111     total  throughput  than  prop-bfq.  low-none  serves  much more
1112     sequential  (write)  I/O  than  prop-bfq because writes somehow
1113     break  the  low-limit  mechanisms and prevail over the reads of
1114     the  target.  Conceivably,  this  happens because writes tend
1115     both  to  starve reads in the OS (mainly by eating all available
1116     I/O  tags)  and to cheat on their completion time in the drive.
1117     In  contrast,  bfq  is  intentionally  configured  to privilege
1118     reads, to counter these issues.
1119
1120     In  particular, low-none gets an even higher throughput than no
1121     I/O  control  at all because it penalizes the random I/O of the
1122     target even more than the no-controller configuration.
1123
1124     Finally,  with  the  last  two workloads, prop-bfq reaches even
1125     higher  total  throughput  than  with the first two. It happens
1126     because  the  target  also  does  sequential  I/O,  and serving
1127     sequential  I/O  is  much  more  beneficial for throughput than
1128     serving  random  I/O.  With  these  two  workloads,  the  total
1129     throughput  is, respectively, close to or much higher than that
1130     reached  without  I/O control. For the last workload, the total
1131     throughput  is  much  higher  because,  unlike  none,  bfq
1132     privileges  reads  over  asynchronous writes, and reads yield a
1133     higher  throughput  than  writes.  In  contrast, low-none still
1134     gets  lower  or much lower throughput than prop-bfq, because of
1135     the  same issues that hinder low-none throughput with the first
1136     two workloads.
1137
1138     As  for  bandwidth  guarantees,  with  readers  as  interferers
1139     (third  workload),  prop-bfq,  as  expected, gives the target a
1140     fraction  of  the  total throughput proportional to its weight.
1141     bfq    approximates    perfect   proportional-share   bandwidth
1142     distribution  among groups doing I/O of the same type (reads or
1143     writes)  and  with  the  same  locality (sequential or random).
1144     With  the last workload, prop-bfq gives much more throughput to
1145     the  reader  than  to  all the interferers, because interferers
1146     are asynchronous writers, and bfq privileges reads.
1147
1148     The  second  group  of  workloads  (static random) is the one,
1149     among   all   the  workloads  considered,  for  which  prop-bfq
1150     performs worst. Results are shown below:
1151
1152     This  chart reports results not only for mainline bfq, but also
1153     for  an improved version of bfq which is currently under public
1154     testing.  As  can  be  seen, with only random readers, prop-bfq
1155     reaches  a  much  lower  total  throughput  than low-none. This
1156     happens  because of the Achilles heel of the bfq I/O scheduler.
1157     If  the  process  in  service  does  synchronous  I/O and has a
1158     higher  weight  than  some  other process, then, to give strong
1159     bandwidth   guarantees   to   that   process,   bfq  plugs  I/O
1160     dispatching  every  time  the process temporarily stops issuing
1161     I/O   requests.   In  this  respect,  processes  actually  have
1162     differentiated  weights and do synchronous I/O in the workloads
1163     tested.  So  bfq systematically performs I/O plugging for them.
1164     Unfortunately,  this  plugging  empties  the internal queues of
1165     the  drive, which kills throughput with random I/O. And the I/O
1166     of all processes in these workloads is also random.
1167
1168     The  situation  reverses  with  a  sequential reader as target.
1169     Yet,  the most interesting results come from the new version of
1170     bfq,  containing  small  changes  to  counter exactly the above
1171     weakness.  This  version  recovers  most of the throughput loss
1172     with  the  workload made of only random I/O; moreover, with the
1173     second  workload,  where  the target is a sequential reader, it
1174     reaches about 3.7 times the total throughput of low-none.
1175
1176     When  the main concern is the latency of flows containing short
1177     I/O,  Linux  now seems to perform rather well, thanks to the bfq
1178     I/O  scheduler  and  the  I/O  latency  controller.  But if the
1179     requirement  is  to  provide  explicit bandwidth guarantees (or
1180     just  fairness) to I/O flows, then one must be ready to give up
1181     much  or most of the speed of the storage media. bfq helps with
1182     some   workloads,   but  loses  most  of  the  throughput  with
1183     workloads  consisting  of mostly random I/O. Fortunately, there
1184     is  apparently  hope  for  much  better  performance  since  an
1185     improvement,  still  under  development, seems to enable bfq to
1186     reach a high throughput with all workloads tested so far.
1187
1188     [I  wish  to  thank  Vivek  Goyal for enabling me to make this
1189     article much more fair and sound.]
1190
1191     [66]Comments (4 posted)
1192
1193     [67]KDE's onboarding initiative, one year later
1194
1195     August 24, 2018
1196
1197     This article was contributed by Marta Rybczyńska
1198
1199     [68]Akademy
1200
1201     In  2017,  the  KDE  community  decided  on  [69]three goals to
1202     concentrate  on  for  the  next  few  years.  One  of  them was
1203     [70]streamlining   the  onboarding  of  new  contributors  (the
1204     others  were  [71]improving  usability and [72]privacy). During
1205     [73]Akademy,  the  yearly  KDE  conference  that  was  held in
1206     Vienna  in  August,  Neofytos Kolokotronis shared the status of
1207     the  onboarding  goal,  the work done during the last year, and
1208     further  plans.  While it is a complicated process in a project
1209     as  big  and  diverse  as  KDE,  numerous  improvements  have
1210     already been made.
1211
1212     Two  of the three KDE community goals were proposed by relative
1213     newcomers.  Kolokotronis  was  one  of those, having joined the
1214     [74]KDE  Promo  team  not  long  before  proposing the focus on
1215     onboarding.  He  had  previously  been involved with [75]Chakra
1216     Linux,  a  distribution  based  on KDE software. The fact that
1217     new  members of the community proposed strategic goals was also
1218     noted in the [76]Sunday keynote by Claudia Garad.
1219
1220     Proper  onboarding  adds excitement to the contribution process
1221     and  increases retention, he explained. When we look at [77]the
1222     definition  of  onboarding,  it  is a process in which the new
1223     contributors  acquire  knowledge, skills, and behaviors so that
1224     they  can  contribute effectively. Kolokotronis proposed to see
1225     it  also  as  socialization:  integration  into  the  project's
1226     relationships, culture, structure, and procedures.
1227
1228     The  gains  from  proper  onboarding  are many. The project can
1229     grow   by  attracting  new  blood  with  new  perspectives  and
1230     solutions.   The  community  maintains  its  health  and  stays
1231     vibrant.  Another  important  advantage of efficient onboarding
1232     is  that  replacing  current  contributors  becomes easier when
1233     they  change interests, jobs, or leave the project for whatever
1234     reason.  Finally,  successful  onboarding adds new advocates to
1235     the project.
1236
1237     Achievements so far and future plans
1238
1239     The  team  started  with  ideas  for  a  centralized onboarding
1240     process  for the whole of KDE. They found out quickly that this
1241     would  not  work  because KDE is "very decentralized", so it is
1242     hard  to  provide  tools  and procedures that are going to work
1243     for   the  whole  project.  According  to  Kolokotronis,  other
1244     characteristics   of   KDE  that  impact  onboarding  are  high
1245     diversity,   remote   and   online   teams,   and  hundreds  of
1246     contributors  in dozens of projects and teams. In addition, new
1247     contributors  already know in which area they want to take part
1248     and  they  prefer  specific  information  that will be directly
1249     useful for them.
1250
1251     So  the  team  changed its approach; several changes have since
1252     been  proposed  and  implemented.  The  [78]Get  Involved page,
1253     which  is  expected to be one of the resources new contributors
1254     read  first, has been rewritten. For the [79]Junior Jobs page ,
1255     the  team  is  [80] [81]discussing what the generic content for
1256     KDE  as  a whole should be. The team simplified [82]Phabricator
1257     registration  ,  which  resulted  in  documenting  the  process
1258     better. Another part of the work concerns the [83]KDE Bugzilla
1259     ; it includes, for example, initiatives to limit the number of
1260     states of a ticket or to remove obsolete products.
1261
1262     The   [84]Plasma   Mobile  team  is  heavily  involved  in  the
1263     onboarding  goal.  The Plasma Mobile developers have simplified
1264     their    development   environment   setup   and   created   an
1265     [85]interactive  "Get  Involved"  page. In addition, the Plasma
1266     team  changed  the  way task descriptions are written; they now
1267     contain  more detail, so that it is easier to get involved. The
1268     basic  description  should  be  short  and clear, and it should
1269     include  details  of  the  problem  and possible solutions. The
1270     developers  try  to  share  the  list  of  skills  necessary to
1271     fulfill  the  tasks  and  include  clear links to the technical
1272     resources needed.
1273
1274     Kolokotronis  and  team  also identified a new potential source
1275     of  contributors  for  KDE:  distributions using KDE. They have
1276     the  advantage  of  already knowing and using the software. The
1277     next  idea  the team is working on is to make sure that setting
1278     up  a  development  environment is easy. The team plans to work
1279     on this during a dedicated sprint this autumn.
1280
1281     Searching for new contributors
1282
1283     Kolokotronis  plans  to  search  for  new  contributors  at the
1284     periphery  of  the  project,  among  the "skilled enthusiasts":
1285     loyal  users  who  actually  care  about the project. They "can
1286     make wonders", he said. Those individuals may also be less
1287     confident or shy, have trouble taking the first step, and
1288     need  guidance.  The  project  leaders  should  take  that into
1289     account.
1290
1291     In   addition,   newcomers   are  all  different.  Kolokotronis
1292     provided  a  long  list  of  how contributors differ, including
1293     skills  and  knowledge,  motives  and  interests,  and time and
1294     dedication.  His  advice  is to "try to find their superpower",
1295     the  skills  they  have  that  are  missing  in the team. Those
1296     "superpowers" can then be used for the benefit of the project.
1297
1298     If  a project does nothing else, he said, it can start with its
1299     documentation.   However,   this   does   not  only  mean  code
1300     documentation.  Writing  down  the  procedures  or  information
1301     about  the internal work of the project, like who is working on
1302     what,  is  an  important  part of a project's documentation and
1303     helps newcomers. There should also be guidelines on how to
1304     start, especially setting up the development environment.
1305
1306     The  first  thing  the  project leaders should do, according to
1307     Kolokotronis,  is to spend time on introducing newcomers to the
1308     project.  Ideally  every  new  contributor  should  be assigned
1309     mentors  —  more  experienced  members  who  can help them when
1310     needed.  The mentors and project leaders should find tasks that
1311     are   interesting   for  each  person.  Answering  an  audience
1312     question   on   suggestions   for   shy  new  contributors,  he
1313     recommended  even  more  mentoring.  It is also very helpful to
1314     make  sure  that  newcomers  have  enough  to  read, but "avoid
1315     RTFM",  he  highlighted.  It is also easy for a new contributor
1316     "to  fly  away",  he  said.  The solution is to keep requesting
1317     things and be proactive.
1318
1319     What can the project do?
1320
1321     Kolokotronis  suggested  a number of actions for a project when
1322     it   wants  to  improve  its  onboarding.  The  first  step  is
1323     preparation:  the  project  leaders  should know the team's and
1324     the  project's  needs. Long-term planning is important, too. It
1325     is  not  enough  to wait for contributors to come — the project
1326     should  be  proactive,  which means reaching out to candidates,
1327     suggesting   appropriate  tasks  and,  finally,  making  people
1328     available for the newcomers if they need help.
1329
1330     This leads to the next step: to be a mentor. Kolokotronis suggests
1331     being  a  "great  host",  but  also  trying  to  phase  out the
1332     dependency   on   the   mentor   rapidly.  "We  have  been  all
1333     newcomers",  he  said.  It  can  be  intimidating  to  join  an
1334     existing  group. Onboarding creates a sense of belonging which,
1335     in turn, increases retention.
1336
1337     The  last  step  proposed  was  to  be strategic. This includes
1338     thinking  about  the  emotions  you  want  newcomers  to  feel.
1339     Kolokotronis  explained the strategic part with an example. The
1340     overall goal is (surprise!) to improve onboarding of new
1341     contributors.  An  intermediate  objective might be to keep the
1342     newcomers  after  they  have  made  their first commit. If your
1343     strategy  is  to  keep  them  confident  and proud, you can use
1344     different  tactics  like  praise and acknowledgment of the work
1345     in  public.  Another  useful  tactic  may  be  assigning simple
1346     tasks, according to the skill of the contributor.
1347
1348     To   summarize,   the   most   important  thing,  according  to
1349     Kolokotronis,  is  to  respond  quickly and spend time with new
1350     contributors.  This  time should be used to explain procedures,
1351     and  to  introduce the people and culture. It is also essential
1352     to guide first contributions and praise contributors' skill
1353     and  effort. Increase the difficulty of tasks over time to keep
1354     contributors  motivated  and  challenged. And finally, he said,
1355     "turn them into mentors".
1356
1357     Kolokotronis  acknowledges  that  onboarding  "takes  time" and
1358     "everyone  complains"  about  it. However, he is convinced that
1359     it  is  beneficial  in  the  long  term  and  that it decreases
1360     developer turnover.
1361
1362     Advice to newcomers
1363
1364     Kolokotronis  concluded  with some suggestions for newcomers to
1365     a  project.  They  should  try  to be persistent and to not get
1366     discouraged  when  something  goes  wrong. Building connections
1367     from   the  very  beginning  is  helpful.  He  suggests  asking
1368     questions  as  if you were already a member "and things will be
1369     fine". However, accept criticism if it happens.
1370
1371     One  of  the  next  actions  of  the onboarding team will be to
1372     collect  feedback  from  newcomers and experienced contributors
1373     to  see  if they agree on the ideas and processes introduced so
1374     far.
1375
1376     [86]Comments (none posted)
1377
1378     [87]Sharing and archiving data sets with Dat
1379
1380     August 27, 2018
1381
1382     This article was contributed by Antoine Beaupré
1383
1384     [88]Dat  is  a  new peer-to-peer protocol that uses some of the
1385     concepts  of  [89]BitTorrent  and  Git.  Dat  primarily targets
1386     researchers  and  open-data activists as it is a great tool for
1387     sharing,  archiving, and cataloging large data sets. But it can
1388     also  be  used to implement decentralized web applications in a
1389     novel way.
1390
1391     Dat quick primer
1392
1393     Dat  is  written in JavaScript, so it can be installed with npm
1394     ,  but there are [90]standalone binary builds and a [91]desktop
1395     application  (as an AppImage). An [92]online viewer can be used
1396     to  inspect data for those who do not want to install arbitrary
1397     binaries on their computers.
1398
1399     The  command-line  application  allows  basic  operations  like
1400     downloading existing data sets and sharing your own. Dat
1401     addresses each data set with a 32-byte [93]ed25519 public key ,
1402     written as a string of 64 hexadecimal characters, which is
1403     used to discover and find content on the net. For example,
1404     this will download some sample data:
1405
1406         $ dat clone \
1407           dat://778f8d955175c92e4ced5e4f5563f69bfec0c86cc6f670352c457943666fe639 \
1408           ~/Downloads/dat-demo
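
         A short Python sketch of what such an address encodes. The
         parse_dat_address() helper is hypothetical and not part of
         the Dat tooling; it only assumes that an address is an
         optional dat:// prefix followed by 64 hexadecimal
         characters, as in the sample above.

```python
import re

# Assumption: an address is "dat://" (optional) plus 64 hex characters.
DAT_ADDRESS = re.compile(r"^(?:dat://)?([0-9a-fA-F]{64})/?$")


def parse_dat_address(address: str) -> bytes:
    """Return the raw 32-byte public key encoded in a Dat address."""
    match = DAT_ADDRESS.match(address.strip())
    if match is None:
        raise ValueError(f"not a Dat address: {address!r}")
    return bytes.fromhex(match.group(1))


key = parse_dat_address(
    "dat://778f8d955175c92e4ced5e4f5563f69bfec0c86cc6f670352c457943666fe639"
)
print(len(key))  # 32
```

         The 64 characters are only a hexadecimal spelling of the
         32-byte key, which is why the two sizes appear side by side
         in descriptions of Dat.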
1409
1410     Similarly,  the  share  command  is  used  to share content. It
1411     indexes  the  files  in  a  given  directory  and creates a new
1412     unique  address  like the one above. The share command starts a
1413     server  that uses multiple discovery mechanisms (currently, the
1414     [94]Mainline  Distributed  Hash  Table  (DHT), a [95]custom DNS
1415     server  ,  and  multicast  DNS)  to announce the content to its
1416     peers.  This  is  how another user, armed with that public key,
1417     can  download  that  content with dat clone or mirror the files
1418     continuously with dat sync .
1419
1420     So  far,  this  looks  a  lot  like BitTorrent [96]magnet links
1421     updated  with 21st century cryptography. But Dat adds revisions
1422     on  top  of  that,  so  modifications  are automatically shared
1423     through  the  swarm.  That is important for public data sets as
1424     those  are  often  dynamic  in  nature.  Revisions also make it
1425     possible  to  use [97]Dat as a backup system by saving the data
1426     incrementally using an [98]archiver .
1427
1428     While  Dat  is designed to work on larger data sets, processing
1429     them  for  sharing  may  take a while. For example, sharing the
1430     Linux  kernel  source  code  required about five minutes as Dat
1431     worked  on indexing all of the files. This is comparable to the
1432     performance  offered by [99]IPFS and BitTorrent. Data sets with
1433     more or larger files may take quite a bit more time.
1434
1435     One  advantage  that  Dat  has  over  IPFS  is  that it doesn't
1436     duplicate  the  data. When IPFS imports new data, it duplicates
1437     the  files  into  ~/.ipfs . For collections of small files like
1438     the  kernel,  this  is not a huge problem, but for larger files
1439     like  videos  or  music,  it's  a  significant limitation. IPFS
1440     eventually  implemented  a solution to this [100]problem in the
1441     form  of the experimental [101]filestore feature , but it's not
1442     enabled  by  default.  Even  with that feature enabled, though,
1443     changes   to  data  sets  are  not  automatically  tracked.  In
1444     comparison,  Dat  operation on dynamic data feels much lighter.
1445     The downside is that each set needs its own dat share process.
1446
1447     Like  any  peer-to-peer  system, Dat needs at least one peer to
1448     stay  online  to  offer  the  content, which is impractical for
1449     mobile  devices. Hosting providers like [102]Hashbase (which is
1450     a  [103]pinning  service  in  Dat  jargon)  can help users keep
1451     content  online  without  running  their  own [104]server . The
1452     closest   parallel  in  the  traditional  web  ecosystem  would
1453     probably   be  content  distribution  networks  (CDN)  although
1454     pinning    services    are   not   necessarily   geographically
1455     distributed  and  a  CDN does not necessarily retain a complete
1456     copy of a website.  [105]
1457
1458     A  web  browser called [106]Beaker , based on the [107]Electron
1459     framework,  can  access  Dat  content  natively  without  going
1460     through  a pinning service. Furthermore, Beaker is essential to
1461     get   any   of  the  [108]Dat  applications  working,  as  they
1462     fundamentally  rely  on  dat://  URLs  to  do their magic. This
1463     means  that  Dat  applications won't work for most users unless
1464     they  install that special web browser. There is a [109]Firefox
1465     extension  called " [110]dat-fox " for people who don't want to
1466     install  yet  another  browser,  but  it  requires installing a
1467     [111]helper  program  .  The  extension  will  be  able to load
1468     dat://  URLs  but  many  applications  will still not work. For
1469     example,  the  [112]photo  gallery application completely fails
1470     with dat-fox.
1471
1472     Dat-based  applications  look promising from a privacy point of
1473     view.  Because of its peer-to-peer nature, users regain control
1474     over where their data is stored: on their own computer, on
1475     an online server, or with a trusted third party. But considering
1476     the  protocol  is not well established in current web browsers,
1477     I  foresee  difficulties  in adoption of that aspect of the Dat
1478     ecosystem.  Beyond  that,  it  is rather disappointing that Dat
1479     applications  cannot  run  natively in a web browser given that
1480     JavaScript is designed exactly for that.
1481
1482     Dat privacy
1483
1484     An  advantage  Dat  has  over other peer-to-peer protocols like
1485     BitTorrent   is   end-to-end   encryption.   I  was  originally
1486     concerned   by   the   encryption   design   when  reading  the
1487     [113]academic paper [PDF] :
1488
1489     It  is  up  to  client programs to make design decisions around
1490     which  discovery  networks  they  trust.  For  example if a Dat
1491     client  decides  to  use  the BitTorrent DHT to discover peers,
1492     and  they  are  searching for a publicly shared Dat key (e.g. a
1493     key  cited publicly in a published scientific paper) with known
1494     contents,  then because of the privacy design of the BitTorrent
1495     DHT  it  becomes  public  knowledge  what  key  that  client is
1496     searching for.
1497
1498     So  in  other  words, to share a secret file with another user,
1499     the  public key is transmitted over a secure side-channel, only
1500     to  then  leak  during  the discovery process. Fortunately, the
1501     public  Dat  key is not directly used during discovery as it is
1502     [114]hashed  with  BLAKE2B  .  Still, the security model of Dat
1503     assumes   the   public  key  is  private,  which  is  a  rather
1504     counterintuitive  concept  that  might upset cryptographers and
1505     confuse  users  who  are  frequently  encouraged  to  type such
1506     strings  in  address bars and search engines as part of the Dat
1507     experience.  There  is a [115]security & privacy FAQ in the Dat
1508     documentation warning about this problem:
1509
1510     One  of  the key elements of Dat privacy is that the public key
1511     is  never  used  in  any  discovery  network. The public key is
1512     hashed,  creating  the discovery key. Whenever peers attempt to
1513     connect to each other, they use the discovery key.
1514
1515     Data  is  encrypted  using  the  public key, so it is important
1516     that this key stays secure.
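
         The hashing step can be sketched in Python. This is an
         illustration, not the project's code: the discovery_key()
         name is hypothetical, and the exact construction (a keyed
         32-byte BLAKE2b digest of the constant "hypercore", with
         the public key as the hash key) is an assumption that
         should be checked against the linked proposal.

```python
import hashlib


def discovery_key(public_key: bytes) -> bytes:
    """Derive a value that can be announced without revealing the key.

    Assumption: keyed BLAKE2b with a 32-byte digest over the constant
    b"hypercore"; verify against the Dat peer-discovery proposal.
    """
    if len(public_key) != 32:
        raise ValueError("expected a 32-byte public key")
    return hashlib.blake2b(b"hypercore", digest_size=32, key=public_key).digest()


sample = bytes.fromhex(
    "778f8d955175c92e4ced5e4f5563f69bfec0c86cc6f670352c457943666fe639"
)
dk = discovery_key(sample)
print(dk.hex())
```

         Because the hash is one-way, someone who observes the
         discovery key on the network cannot recover the public key
         from it, but anyone who already holds the public key can
         compute the same discovery key and join the swarm.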
1517
1518     There  are  other  privacy  issues outlined in the document; it
1519     states that " Dat faces similar privacy risks as BitTorrent ":
1520
1521     When  you download a dataset, your IP address is exposed to the
1522     users  sharing  that dataset. This may lead to honeypot servers
1523     collecting  IP addresses, as we've seen in Bittorrent. However,
1524     with  dataset  sharing we can create a web of trust model where
1525     specific  institutions  are  trusted  as  primary  sources  for
1526     datasets, diminishing the sharing of IP addresses.
1527
1528     A  Dat  blog  post  refers to this issue as [116]reader privacy
1529     and  it is, indeed, a sensitive issue in peer-to-peer networks.
1530     It  is  how  BitTorrent  users  are discovered and served scary
1531     verbiage  from  lawyers, after all. But Dat makes this a little
1532     better  because,  to  join  a swarm, you must know what you are
1533     looking  for  already,  which means peers who can look at swarm
1534     activity  only  include  users  who know the secret public key.
1535     This  works  well  for  secret  content, but for larger, public
1536     data  sets, it is a real problem; it is why the Dat project has
1537     [117]avoided creating a Wikipedia mirror so far.
1538
1539     I  found  another  privacy  issue that is not documented in the
1540     security  FAQ  during  my  review of the protocol. As mentioned
1541     earlier,  the [118]Dat discovery protocol routinely phones home
1542     to  DNS  servers operated by the Dat project. This implies that
1543     the  default  discovery  servers (and an attacker watching over
1544     their  traffic)  know  who is publishing or seeking content, in
1545     essence  discovering  the  "social  network"  behind  Dat. This
1546     discovery  mechanism  can be disabled in clients, but a similar
1547     privacy  issue  applies  to  the  DHT as well, although that is
1548     distributed  so  it  doesn't  require  trust of the Dat project
1549     itself.
1550
1551     Considering  those  aspects  of the protocol, privacy-conscious
1552     users  will  probably  want  to  use Tor or other anonymization
1553     techniques to work around those concerns.
1554
1555     The future of Dat
1556
1557     [119]Dat  2.0  was  released  in  June  2017  with  performance
1558     improvements   and   protocol   changes.  [120]Dat  Enhancement
1559     Proposals  (DEPs)  guide the project's future development; most
1560     work  is  currently  geared  toward  implementing  the  draft "
1561     [121]multi-writer   proposal   "   in  [122]HyperDB  .  Without
1562     multi-writer  support, only the original publisher of a Dat can
1563     modify  it.  According  to  Joe  Hand, co-executive-director of
1564     [123]Code  for  Science & Society (CSS) and Dat core developer,
1565     in  an  IRC  chat, "supporting multiwriter is a big requirement
1566     for  lots  of  folks". For example, while Dat might allow Alice
1567     to  share  her  research  results with Bob, he cannot modify or
1568     contribute  back  to  those results. The multi-writer extension
1569     allows  for  Alice  to assign trust to Bob so he can have write
1570     access to the data.
1571
1572     Unfortunately,  the  current  proposal doesn't solve the " hard
1573     problems  " of " conflict merges and secure key distribution ".
1574     The  former  will  be worked out through user interface tweaks,
1575     but  the  latter  is  a  classic problem that security projects
1576     typically have trouble finding solutions for; Dat is no
1577     exception.  How  will Alice securely trust Bob? The OpenPGP web
1578     of  trust?  Hexadecimal  fingerprints  read over the phone? Dat
1579     doesn't provide a magic solution to this problem.
1580
1581     Another  thing limiting adoption is that Dat is not packaged in
1582     any  distribution  that I could find (although I [124]requested
1583     it  in  Debian  )  and,  considering the speed of change of the
1584     JavaScript  ecosystem,  this  is  unlikely  to  change any time
1585     soon.  A  [125]Rust  implementation  of  the  Dat  protocol has
1586     started,  however,  which  might  be easier to package than the
1587     multitude  of  [126]Node.js  modules. In terms of mobile device
1588     support,  there is an experimental Android web browser with Dat
1589     support  called  [127]Bunsen  , which somehow doesn't run on my
1590     phone.  Some  adventurous  users  have  successfully run Dat in
1591     [128]Termux  .  I  haven't  found an app running on iOS at this
1592     point.
1593
1594     Even  beyond  platform  support, distributed protocols like Dat
1595     have  a  tough  slope  to climb against the virtual monopoly of
1596     more  centralized  protocols,  so  it  remains  to  be seen how
1597     popular  those  tools  will  be.  Hand says Dat is supported by
1598     multiple  non-profit  organizations. Beyond CSS, [129]Blue Link
1599     Labs  is working on the Beaker Browser as a self-funded startup
1600     and  a  grass-roots  organization, [130]Digital Democracy , has
1601     contributed  to  the  project.  The  [131]Internet  Archive has
1602     [132]announced  a  collaboration  between  itself, CSS, and the
1603     California  Digital  Library to launch a pilot project to see "
1604     how   members  of  a  cooperative,  decentralized  network  can
1605     leverage  shared  services  to  ensure  data preservation while
1606     reducing storage costs and increasing replication counts ".
1607
1608     Hand  said  adoption in academia has been "slow but steady" and
1609     that  the [133]Dat in the Lab project has helped identify areas
1610     that  could  help researchers adopt the project. Unfortunately,
1611     as  is  the case with many free-software projects, he said that
1612     "our  team is definitely a bit limited on bandwidth to push for
1613     bigger  adoption".  Hand said that the project received a grant
1614     from   [134]Mozilla   Open   Source   Support  to  improve  its
1615     documentation, which will be a big help.
1616
1617     Ultimately,   Dat   suffers   from  a  problem  common  to  all
1618     peer-to-peer  applications,  which is naming. Dat addresses are
1619     not  exactly  intuitive:  humans  do not remember strings of 64
1620     hexadecimal  characters well. For this, Dat took a [135]similar
1621     approach  to IPFS by using DNS TXT records and /.well-known URL
1622     paths   to  bridge  existing,  human-readable  names  with  Dat
1623     hashes.  So  this sacrifices a part of the decentralized nature
1624     of the project in favor of usability.
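
         A rough Python sketch of the two lookups, without any
         network access. The helper names are hypothetical, and the
         formats are assumptions: a TXT record of the form
         datkey=<64 hex characters>, and a /.well-known/dat file
         whose first line is a dat:// URL. The linked proposal is
         the authority on the exact syntax.

```python
import re
from typing import Iterable, Optional

HEX_KEY = r"[0-9a-f]{64}"


def key_from_txt_records(records: Iterable[str]) -> Optional[str]:
    """Return the key from the first TXT record shaped like datkey=<key>."""
    for record in records:
        match = re.fullmatch(rf"datkey=({HEX_KEY})", record.strip(), re.IGNORECASE)
        if match:
            return match.group(1).lower()
    return None


def key_from_well_known(body: str) -> Optional[str]:
    """Return the key from a /.well-known/dat body (assumed: URL on line one)."""
    lines = body.splitlines()
    first = lines[0].strip() if lines else ""
    match = re.fullmatch(rf"dat://({HEX_KEY})/?", first, re.IGNORECASE)
    return match.group(1).lower() if match else None


example_key = "778f8d955175c92e4ced5e4f5563f69bfec0c86cc6f670352c457943666fe639"
print(key_from_txt_records(["v=spf1 -all", f"datkey={example_key}"]))
print(key_from_well_known(f"dat://{example_key}\nTTL=3600\n"))
```

         In both cases the human-readable name is only as
         trustworthy as the DNS or HTTPS infrastructure that serves
         it, which is the loss of decentralization noted above.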
1625
1626     I  have  tested  a lot of distributed protocols like Dat in the
1627     past  and I am not sure Dat is a clear winner. It certainly has
1628     advantages  over IPFS in terms of usability and resource usage,
1629     but  the  lack  of packages on most platforms is a big limit to
1630     adoption  for  most  people. This means it will be difficult to
1631     share  content  with  my  friends  and  family with Dat anytime
1632     soon,  which  would  probably  be  my  primary use case for the
1633     project.  Until  the  protocol  reaches the wider adoption that
1634     BitTorrent  has  seen  in  terms  of  platform  support, I will
1635     probably   wait   before  switching  everything  over  to  this
1636     promising project.
1637
1638     [136]Comments (11 posted)
1639
1640     Page editor : Jonathan Corbet
1641
1642     Inside this week's LWN.net Weekly Edition
1643
1644     [137]Briefs  :  OpenSSH  7.8;  4.19-rc1;  Which stable?; Netdev
1645     0x12; Bison 3.1; Quotes; ...
1646
1647     [138]Announcements  :  Newsletters;  events;  security updates;
1648     kernel patches; ...  Next page : [139]Brief items>>
1649
1650
1651
1652     [1] https://lwn.net/Articles/763743/
1653
1654     [2] https://lwn.net/Articles/763626/
1655
1656     [3] https://lwn.net/Articles/763641/
1657
1658     [4] https://lwn.net/Articles/763106/
1659
1660     [5] https://lwn.net/Articles/763603/
1661
1662     [6] https://lwn.net/Articles/763175/
1663
1664     [7] https://lwn.net/Articles/763492/
1665
1666     [8] https://lwn.net/Articles/763254/
1667
1668     [9] https://lwn.net/Articles/763255/
1669
1670     [10] https://lwn.net/Articles/763743/#Comments
1671
1672     [11] https://lwn.net/Articles/763626/
1673
1674     [12] http://julialang.org/
1675
1676     [13] https://julialang.org/blog/2018/08/one-point-zero
1677
1678     [14] https://julialang.org/benchmarks/
1679
1680     [15] https://juliacomputing.com/
1681
1682     [16] https://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop
1684
1685     [17] http://llvm.org/
1686
1687     [18] http://www.3blue1brown.com/essence-of-linear-algebra-page/
1688
1689     [19] http://www.netlib.org/lapack/
1690
1691     [20] https://lwn.net/Articles/657157/
1692
1693     [21] https://julialang.org/publications/julia-fresh-approach-BEKS.pdf
1695
1696     [22] https://lwn.net/Articles/738915/
1697
1698     [23] https://pypy.org/
1699
1700     [24] https://github.com/JuliaPy/PyCall.jl
1701
1702     [25] https://github.com/JuliaInterop/RCall.jl
1703
1704     [26] https://docs.julialang.org/en/stable/
1705
1706     [27] https://julialang.org/learning/
1707
1708     [28] http://bogumilkaminski.pl/files/julia_express.pdf
1709
1710     [29] https://docs.julialang.org/en/stable/manual/noteworthy-differences/#Noteworthy-differences-from-Python-1
1712
1713     [30] https://lwn.net/Articles/746386/
1714
1715     [31] https://github.com/JuliaLang/IJulia.jl
1716
1717     [32] https://lwn.net/Articles/764001/
1718
1719     [33] https://lwn.net/Articles/763626/#Comments
1720
1721     [34] https://lwn.net/Articles/763641/
1722
1723     [35] https://lwn.net/Archives/ConferenceByYear/#2018-Linux_Security_Summit_NA
1725
1726     [36] https://events.linuxfoundation.org/events/linux-security-summit-north-america-2018/
1728
1729     [37] https://kernsec.org/wiki/index.php/Kernel_Self_Protection_Project
1731
1732     [38] https://lwn.net/Articles/763644/
1733
1734     [39] https://raphlinus.github.io/programming/rust/2018/08/17/undefined-behavior.html
1736
1737     [40] https://lwn.net/Articles/749064/
1738
1739     [41] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=02361bc77888
1741
1742     [42] https://lore.kernel.org/lkml/CA+55aFzCG-zNmZwX4A2FQpadafL-
1743     fEzK6CC=qPXydAacU1RqZWA@mail.gmail.com/T/#u
1744
1745     [43] https://lwn.net/Articles/758245/
1746
1747     [44] https://lwn.net/Articles/718888/
1748
1749     [45] https://lwn.net/Articles/744507/
1750
1751     [46] https://outflux.net/slides/2018/lss/danger.pdf
1752
1753     [47] https://lwn.net/Articles/763641/#Comments
1754
1755     [48] https://lwn.net/Articles/763106/
1756
1757     [49] https://lwn.net/Articles/763497/
1758
1759     [50] https://lwn.net/Articles/762566/
1760
1761     [51] https://lwn.net/Articles/761118/
1762
1763     [52] https://git.kernel.org/linus/d5791044d2e5749ef4de84161cec5532e2111540
1765
1766     [53] https://lwn.net/ml/linux-kernel/20180630000253.70103-1-sq-
1767     ue@chromium.org/
1768
1769     [54] https://git.kernel.org/linus/771c035372a036f83353eef46dbb829780330234
1771
1772     [55] https://lwn.net/Articles/745073/
1773
1774     [56] https://lwn.net/ml/linux-kernel/CA+55aFxFjAmrFpwQmEHCthHO-
1775     zgidCKnod+cNDEE+3Spu9o1s3w@mail.gmail.com/
1776
1777     [57] https://lwn.net/Articles/759499/
1778
1779     [58] https://lwn.net/Articles/762355/
1780
1781     [59] https://lwn.net/ml/linux-fsdevel/20180823223145.GK6515@ZenIV.linux.org.uk/
1783
1784     [60] https://lwn.net/Articles/763106/#Comments
1785
1786     [61] https://lwn.net/Articles/763603/
1787
1788     [62] https://lwn.net/Articles/601799/
1789
1790     [63] https://lwn.net/Articles/552904
1791
1792     [64] https://lwn.net/Articles/758963/
1793
1794     [65] http://algogroup.unimore.it/people/paolo/pub-docs/extended-lat-bw-throughput.pdf
1796
1797     [66] https://lwn.net/Articles/763603/#Comments
1798
1799     [67] https://lwn.net/Articles/763175/
1800
1801     [68] https://lwn.net/Archives/ConferenceByYear/#2018-Akademy
1802
1803     [69] https://dot.kde.org/2017/11/30/kdes-goals-2018-and-beyond
1804
1805     [70] https://phabricator.kde.org/T7116
1806
1807     [71] https://phabricator.kde.org/T6831
1808
1809     [72] https://phabricator.kde.org/T7050
1810
1811     [73] https://akademy.kde.org/
1812
1813     [74] https://community.kde.org/Promo
1814
1815     [75] https://www.chakralinux.org/
1816
1817     [76] https://conf.kde.org/en/Akademy2018/public/events/79
1818
1819     [77] https://en.wikipedia.org/wiki/Onboarding
1820
1821     [78] https://community.kde.org/Get_Involved
1822
1823     [79] https://community.kde.org/KDE/Junior_Jobs
1824
1825     [80] https://lwn.net/Articles/763189/
1826
1827     [81] https://phabricator.kde.org/T8686
1828
1829     [82] https://phabricator.kde.org/T7646
1830
1831     [83] https://bugs.kde.org/
1832
1833     [84] https://www.plasma-mobile.org/index.html
1834
1835     [85] https://www.plasma-mobile.org/findyourway
1836
1837     [86] https://lwn.net/Articles/763175/#Comments
1838
1839     [87] https://lwn.net/Articles/763492/
1840
1841     [88] https://datproject.org
1842
1843     [89] https://www.bittorrent.com/
1844
1845     [90] https://github.com/datproject/dat/releases
1846
1847     [91] https://docs.datproject.org/install
1848
1849     [92] https://datbase.org/
1850
1851     [93] https://ed25519.cr.yp.to/
1852
1853     [94] https://en.wikipedia.org/wiki/Mainline_DHT
1854
1855     [95] https://github.com/mafintosh/dns-discovery
1856
1857     [96] https://en.wikipedia.org/wiki/Magnet_URI_scheme
1858
1859     [97] https://blog.datproject.org/2017/10/13/using-dat-for-automatic-file-backups/
1861
1862     [98] https://github.com/mafintosh/hypercore-archiver
1863
1864     [99] https://ipfs.io/
1865
1866     [100] https://github.com/ipfs/go-ipfs/issues/875
1867
1868     [101] https://github.com/ipfs/go-ipfs/blob/master/docs/experimental-features.md#ipfs-filestore
1870
1871     [102] https://hashbase.io/
1872
1873     [103] https://github.com/datprotocol/DEPs/blob/master/proposals/0003-http-pinning-service-api.md
1875
1876     [104] https://docs.datproject.org/server
1877
1878     [105] https://lwn.net/Articles/763544/
1879
1880     [106] https://beakerbrowser.com/
1881
1882     [107] https://electronjs.org/
1883
1884     [108] https://github.com/beakerbrowser/explore
1885
1886     [109] https://addons.mozilla.org/en-US/firefox/addon/dat-p2p-protocol/
1888
1889     [110] https://github.com/sammacbeth/dat-fox
1890
1891     [111] https://github.com/sammacbeth/dat-fox-helper
1892
1893     [112] https://github.com/beakerbrowser/dat-photos-app
1894
1895     [113] https://github.com/datproject/docs/raw/master/papers/dat-paper.pdf
1897
1898     [114] https://github.com/datprotocol/DEPs/blob/653e0cf40233b5d474cddc04235577d9d55b2934/proposals/0000-peer-discovery.md#discovery-keys
1901
1902     [115] https://docs.datproject.org/security
1903
1904     [116] https://blog.datproject.org/2016/12/12/reader-privacy-on-the-p2p-web/
1906
1907     [117] https://blog.datproject.org/2017/12/10/dont-ship/
1908
1909     [118] https://github.com/datprotocol/DEPs/pull/7
1910
1911     [119] https://blog.datproject.org/2017/06/01/dat-sleep-release/
1912
1913     [120] https://github.com/datprotocol/DEPs
1914
1915     [121] https://github.com/datprotocol/DEPs/blob/master/proposals/0008-multiwriter.md
1917
1918     [122] https://github.com/mafintosh/hyperdb
1919
1920     [123] https://codeforscience.org/
1921
1922     [124] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=890565
1923
1924     [125] https://github.com/datrs
1925
1926     [126] https://nodejs.org/en/
1927
1928     [127] https://bunsenbrowser.github.io/#!index.md
1929
1930     [128] https://termux.com/
1931
1932     [129] https://bluelinklabs.com/
1933
1934     [130] https://www.digital-democracy.org/
1935
1936     [131] https://archive.org
1937
1938     [132] https://blog.archive.org/2018/06/05/internet-archive-code-for-science-and-society-and-california-digital-library-to-partner-on-a-data-sharing-and-preservation-pilot-project/
1941
1942     [133] https://github.com/codeforscience/Dat-in-the-Lab
1943
1944     [134] https://www.mozilla.org/en-US/moss/
1945
1946     [135] https://github.com/datprotocol/DEPs/blob/master/proposals/0005-dns.md
1948
1949     [136] https://lwn.net/Articles/763492/#Comments
1950
1951     [137] https://lwn.net/Articles/763254/
1952
1953     [138] https://lwn.net/Articles/763255/
1954
1955     [139] https://lwn.net/Articles/763254/
1956
1957
1958