1 LWN.NET WEEKLY EDITION FOR AUGUST 30, 2018
2
3
4
5 o News link: https://lwn.net/Articles/763252/
6 o Source link:
7
8
9 [1]Welcome to the LWN.net Weekly Edition for August 30, 2018
10 This edition contains the following feature content:
11
12 [2]An introduction to the Julia language, part 1 : Julia is a
13 language designed for intensive numerical calculations; this
14 article gives an overview of its core features.
15
16 [3]C considered dangerous : a Linux Security Summit talk on
17 what is being done to make the use of C in the kernel safer.
18
19 [4]The second half of the 4.19 merge window : the final
20 features merged (or not merged) before the merge window closed
21 for this cycle.
22
23 [5]Measuring (and fixing) I/O-controller throughput loss : the
24 kernel's I/O controllers can provide useful bandwidth
25 guarantees, but at a significant cost in throughput.
26
27 [6]KDE's onboarding initiative, one year later : what has gone
28 right in KDE's effort to make it easier for contributors to
29 join the project, and what remains to be done.
30
31 [7]Sharing and archiving data sets with Dat : an innovative
32 approach to addressing and sharing data on the net.
33
34 This week's edition also includes these inner pages:
35
36 [8]Brief items : Brief news items from throughout the
37 community.
38
39 [9]Announcements : Newsletters, conferences, security updates,
40 patches, and more.
41
42 Please enjoy this week's edition, and, as always, thank you
43 for supporting LWN.net.
44
45 [10]Comments (none posted)
46
47 [11]An introduction to the Julia language, part 1
48
49 August 28, 2018
50
51 This article was contributed by Lee Phillips
52
53 [12]Julia is a young computer language aimed at serving the
54 needs of scientists, engineers, and other practitioners of
55 numerically intensive programming. It was first publicly
56 released in 2012. After an intense period of language
57 development, version 1.0 was [13]released on August 8. The 1.0
58 release promises years of language stability; users can be
59 confident that developments in the 1.x series will not break
60 their code. This is the first part of a two-part article
61 introducing the world of Julia. This part will introduce
62 enough of the language syntax and constructs to allow you to
63 begin to write simple programs. The following installment will
64 acquaint you with the additional pieces needed to create real
65 projects, and to make use of Julia's ecosystem.
66
67 Goals and history
68
69 The Julia project has ambitious goals. It wants the language
70 to perform about as well as Fortran or C when running
71 numerical algorithms, while remaining as pleasant to program
72 in as Python. I believe the project has met these goals and is
73 poised to see increasing adoption by numerical researchers,
74 especially now that an official, stable release is available.
75
76 The Julia project maintains a [14]micro-benchmark page that
77 compares its numerical performance against both statically
78 compiled languages (C, Fortran) and dynamically typed
79 languages (R, Python). While it's certainly possible to argue
80 about the relevance and fairness of particular benchmarks, the
81 data overall supports the Julia team's contention that Julia
82 has generally achieved parity with Fortran and C; the
83 benchmark source code is available.
84
85 Julia began as research in computer science at MIT; its
86 creators are Alan Edelman, Stefan Karpinski, Jeff Bezanson,
87 and Viral Shah. These four remain active developers of the
88 language. They, along with Keno Fischer, co-founder and CTO of
89 [15]Julia Computing , were kind enough to share their thoughts
90 with us about the language. I'll be drawing on their comments
91 later on; for now, let's get a taste of what Julia code looks
92 like.
93
94 Getting started
95
96 To explore Julia initially, start up its standard
97 [16]read-eval-print loop (REPL) by typing julia at the
98 terminal, assuming that you have installed it. You will then
99 be able to interact with what will seem to be an interpreted
100 language — but, behind the scenes, those commands are being
101 compiled by a just-in-time (JIT) compiler that uses the
102 [17]LLVM compiler framework . This allows Julia to be
103 interactive, while turning the code into fast, native machine
104 instructions. However, the JIT compiler passes sometimes
105 introduce noticeable delays at the REPL, especially when using
106 a function for the first time.
107
108 To run a Julia program non-interactively, execute a command
109 like: $ julia script.jl
110
111 Julia has all the usual data structures: numbers of various
112 types (including complex and rational numbers),
113 multidimensional arrays, dictionaries, strings, and
114 characters. Functions are first-class: they can be passed as
115 arguments to other functions, can be members of arrays, and so
116 on.
117
118 Julia embraces Unicode. Strings, which are enclosed in double
119 quotes, are arrays of Unicode characters, which are enclosed
120 in single quotes. The " * " operator is used for string and
121 character concatenation. Thus 'a' and 'β' are characters, and
122 'aβ' is a syntax error. "a" and "β" are strings, as are "aβ",
123 'a' * 'β', and "a" * "β" — all evaluate to the same string.
124
125 Variable and function names can contain non-ASCII characters.
126 This, along with Julia's clever syntax that understands
127 numbers prepended to variables to mean multiplication, goes a
128 long way to allowing the numerical scientist to write code
129 that more closely resembles the compact mathematical notation
130 of the equations that usually lie behind it: julia> ε₁ = 0.01
131
132 0.01
133
134 julia> ε₂ = 0.02
135
136 0.02
137
138 julia> 2ε₁ + 3ε₂
139
140 0.08
141
142 And where does Julia come down on the age-old debate of what
143 to do about 1/2 ? In Fortran and Python 2, this will get you 0,
144 since 1 and 2 are integers, and the result is rounded down to
145 the integer 0. This was deemed inconsistent, and confusing to
146 some, so it was changed in Python 3 to return 0.5 — which is
147 what you get in Julia, too.
148
149 While we're on the subject of fractions, Julia can handle
150 rational numbers, with a special syntax: 3//5 + 2//3 returns
151 19//15 , while 3/5 + 2/3 gets you the floating-point answer
152 1.2666666666666666. Internally, Julia thinks of a rational
153 number in its reduced form, so the expression 6//8 == 3//4
154 returns true , and numerator(6//8) returns 3 .
155
156 Arrays
157
158 Arrays are enclosed in square brackets and indexed with an
159 iterator that can contain a step value: julia> a = [1, 2, 3,
160 4, 5, 6]
161
162 6-element Array{Int64,1}:
163
164 1
165
166 2
167
168 3
169
170 4
171
172 5
173
174 6
175
176 julia> a[1:2:end]
177
178 3-element Array{Int64,1}:
179
180 1
181
182 3
183
184 5
185
186 As you can see, indexing starts at one, and the useful end
187 index means the obvious thing. When you define a variable in
188 the REPL, Julia replies with the type and value of the
189 assigned data; you can suppress this output by ending your
190 input line with a semicolon.
191
192 Since arrays are such a vital part of numerical computation,
193 and Julia makes them easy to work with, we'll spend a bit more
194 time with them than the other data structures.
195
196 To illustrate the syntax, we can start with a couple of 2D
197 arrays, defined at the REPL: julia> a = [1 2 3; 4 5 6]
198
199 2×3 Array{Int64,2}:
200
201 1 2 3
202
203 4 5 6
204
205 julia> z = [-1 -2 -3; -4 -5 -6];
206
207 Indexing is as expected: julia> a[1, 2]
208
209 2
210
211 You can glue arrays together horizontally: julia> [a z]
212
213 2×6 Array{Int64,2}:
214
215 1 2 3 -1 -2 -3
216
217 4 5 6 -4 -5 -6
218
219 And vertically: julia> [a; z]
220
221 4×3 Array{Int64,2}:
222
223 1 2 3
224
225 4 5 6
226
227 -1 -2 -3
228
229 -4 -5 -6
230
231 Julia has all the usual operators for handling arrays, and
232 [18]linear algebra functions that work with matrices (2D
233 arrays). The linear algebra functions are part of Julia's
234 standard library, but need to be imported with a command like
235 " using LinearAlgebra ", which is a detail omitted from the
236 current documentation. The functions include such things as
237 determinants, matrix inverses, eigenvalues and eigenvectors,
238 many kinds of matrix factorizations, etc. Julia has not
239 reinvented the wheel here, but wisely uses the [19]LAPACK
240 Fortran library of battle-tested linear algebra routines.
241
242 The extension of arithmetic operators to arrays is usually
243 intuitive: julia> a + z
244
245 2×3 Array{Int64,2}:
246
247 0 0 0
248
249 0 0 0
250
251 And the numerical prepending syntax works with arrays, too:
252 julia> 3a + 4z
253
254 2×3 Array{Int64,2}:
255
256 -1 -2 -3
257
258 -4 -5 -6
259
260 Putting a multiplication operator between two matrices gets
261 you matrix multiplication: julia> a * transpose(a)
262
263 2×2 Array{Int64,2}:
264
265 14 32
266
267 32 77
268
269 You can "broadcast" numbers to cover all the elements in an
270 array by prepending the usual arithmetic operators with a dot:
271 julia> 1 .+ a
272
273 2×3 Array{Int64,2}:
274
275 2 3 4
276
277 5 6 7
278
279 Note that the language only actually requires the dot for some
280 operators, but not for others, such as "*" and "/". The
281 reasons for this are arcane, and it probably makes sense to be
282 consistent and use the dot whenever you intend broadcasting.
283 Note also that the current version of the official
284 documentation is incorrect in claiming that you may omit the
285 dot from "+" and "-"; in fact, this now gives an error.
286
287 You can use the dot notation to turn any function into one
288 that operates on each element of an array: julia>
289 round.(sin.([0, π/2, π, 3π/2, 2π]))
290
291 5-element Array{Float64,1}:
292
293 0.0
294
295 1.0
296
297 0.0
298
299 -1.0
300
301 -0.0
302
303 The example above illustrates chaining two dotted functions
304 together. The Julia compiler turns expressions like this into
305 "fused" operations: instead of applying each function in turn
306 to create a new array that is passed to the next function, the
307 compiler combines the functions into a single compound
308 function that is applied once over the array, creating a
309 significant optimization.
310
311 You can use this dot notation with any function, including
312 your own, to turn it into a version that operates element-wise
313 over arrays.
314
315 Dictionaries (associative arrays) can be defined with several
316 syntaxes. Here's one: julia> d1 = Dict("A"=>1, "B"=>2)
317
318 Dict{String,Int64} with 2 entries:
319
320 "B" => 2
321
322 "A" => 1
323
324 You may have noticed that the code snippets so far have not
325 included any type declarations. Every value in Julia has a
326 type, but the compiler will infer types if they are not
327 specified. It is generally not necessary to declare types for
328 performance, but type declarations sometimes serve other
329 purposes that we'll return to later. Julia has a deep and
330 sophisticated type system, including user-defined types and
331 C-like structs. Types can have behaviors associated with them,
332 and can inherit behaviors from other types. The best thing
333 about Julia's type system is that you can ignore it entirely,
334 use just a few pieces of it, or spend weeks studying its
335 design.
336
337 Control flow
338
339 Julia code is organized in blocks, which can indicate control
340 flow, function definitions, and other code units. Blocks are
341 terminated with the end keyword, and indentation is not
342 significant. Statements are separated either with newlines or
343 semicolons.
344
345 Julia has the typical control flow constructs; here is a while
346 block: julia> i = 1;
347
348 julia> while i < 5
349
350 print(i)
351
352 global i = i + 1
353
354 end
355
356 1234
357
358 Notice the global keyword. Most blocks in Julia introduce a
359 local scope for variables; without this keyword here, we would
360 get an error about an undefined variable.
361
362 Julia has the usual if statements and for loops that use the
363 same iterators that we introduced above for array indexing. We
364 can also iterate over collections: julia> for i ∈ ['a', 'b',
365 'c']
366
367 println(i)
368
369 end
370
371 a
372
373 b
374
375 c
376
377 In place of the fancy math symbol in this for loop, we can use
378 " = " or " in ". If you want to use the math symbol but have
379 no convenient way to type it, the REPL will help you: type "
380 \in " and the TAB key, and the symbol appears; you can type
381 many [20]LaTeX expressions into the REPL in this way.
382
383 Development of Julia
384
385 The language is developed on GitHub, with over 700
386 contributors. The Julia team mentioned in their email to us
387 that the decision to use GitHub has been particularly good for
388 Julia, as it streamlined the process for many of their
389 contributors, who are scientists or domain experts in various
390 fields, rather than professional software developers.
391
392 The creators of Julia have [21]published [PDF] a detailed
393 “mission statement” for the language, describing their aims
394 and motivations. A key issue that they wanted their language
395 to solve is what they called the "two-language problem." This
396 situation is familiar to anyone who has used Python or another
397 dynamic language on a demanding numerical problem. To get good
398 performance, you will wind up rewriting the numerically
399 intensive parts of the program in C or Fortran, dealing with
400 the interface between the two languages, and may still be
401 disappointed in the overhead presented by calling the foreign
402 routines from your original code.
403
404 For Python, [22]NumPy and SciPy wrap many numerical routines,
405 written in Fortran or C, for efficient use from that language,
406 but you can only take advantage of this if your calculation
407 fits the pattern of an available routine; in more general
408 cases, where you will have to write a loop over your data, you
409 are stuck with Python's native performance, which is orders of
410 magnitude slower. If you switch to an alternative, faster
411 implementation of Python, such as [23]PyPy , the numerical
412 libraries may not be compatible; NumPy became available for
413 PyPy only within about the past year.
414
415 Julia solves the two-language problem by being as expressive
416 and simple to program in as a dynamic scripting language,
417 while having the native performance of a static, compiled
418 language. There is no need to write numerical libraries in a
419 second language, but C or Fortran library routines can be
420 called using a facility that Julia has built-in. Other
421 languages, such as [24]Python or [25]R , can also interoperate
422 easily with Julia using external packages.
423
424 Documentation
425
426 There are many resources available for learning the language.
427 There is an extensive and detailed [26]manual at Julia
428 headquarters, and this may be a good place to start. However,
429 although the first few chapters provide a gentle introduction,
430 the material soon becomes dense and, at times, hard to follow,
431 with references to concepts that are not explained until later
432 chapters. Fortunately, there is a [27]"learning" link at the
433 top of the Julia home page, which takes you to a long list of
434 videos, tutorials, books, articles, and classes both about
435 Julia and that use Julia in teaching subjects such as numerical
436 analysis. There is also a fairly good [28]cheat-sheet [PDF] ,
437 which was just updated for v. 1.0.
438
439 If you're coming from Python, [29]this list of noteworthy
440 differences between Python and Julia syntax will probably be
441 useful.
442
443 Some of the linked tutorials are in the form of [30]Jupyter
444 notebooks — indeed, the name "Jupyter" is formed from "Julia",
445 "Python", and "R", which are the three original languages
446 supported by the interface. The [31]Julia kernel for Jupyter
447 was recently upgraded to support v. 1.0. Judicious sampling of
448 a variety of documentation sources, combined with liberal
449 experimentation, may be the best way of learning the language.
450 Jupyter makes this experimentation more inviting for those who
451 enjoy the web-based interface, but the REPL that comes with
452 Julia helps a great deal in this regard by providing, for
453 instance, TAB completion and an extensive help system invoked
454 by simply pressing the "?" key.
455
456 Stay tuned
457
458 The [32]next installment in this two-part series will explain
459 how Julia is organized around the concept of "multiple
460 dispatch". You will learn how to create functions and make
461 elementary use of Julia's type system. We'll see how to
462 install packages and use modules, and how to make graphs.
463 Finally, Part 2 will briefly survey the important topics of
464 macros and distributed computing.
465
466 [33]Comments (80 posted)
467
468 [34]C considered dangerous
469
470 By Jake Edge
471
472 August 29, 2018
473
474 [35]LSS NA
475
476 At the North America edition of the [36]2018 Linux Security
477 Summit (LSS NA), which was held in late August in Vancouver,
478 Canada, Kees Cook gave a presentation on some of the dangers
479 that come with programs written in C. In particular, of
480 course, the Linux kernel is mostly written in C, which means
481 that the security of our systems rests on a somewhat dangerous
482 foundation. But there are things that can be done to help firm
483 things up by " Making C Less Dangerous " as the title of his
484 talk suggested.
485
486 He began with a brief summary of the work that he and others
487 are doing as part of the [37]Kernel Self Protection Project
488 (KSPP). The goal of the project is to get kernel protections
489 merged into the mainline. These protections are not targeted
490 at protecting user-space processes from other (possibly rogue)
491 processes, but are, instead, focused on protecting the kernel
492 from user-space code. There are around 12 organizations and
493 ten individuals working on roughly 20 different technologies
494 as part of the KSPP, he said. The progress has been "slow and
495 steady", he said, which is how he thinks it should go. [38]
496
497 One of the main problems is that C is treated mostly like a
498 fancy assembler. The kernel developers do this because they
499 want the kernel to be as fast and as small as possible. There
500 are other reasons, too, such as the need to do
501 architecture-specific tasks that lack a C API (e.g. setting up
502 page tables, switching to 64-bit mode).
503
504 But there is lots of undefined behavior in C. This
505 "operational baggage" can lead to various problems. In
506 addition, C has a weak standard library with multiple utility
507 functions that have various pitfalls. In C, the content of
508 uninitialized automatic variables is undefined, but in the
509 machine code that it gets translated to, the value is whatever
510 happened to be in that memory location before. In C, a
511 function pointer can be called even if the type of the pointer
512 does not match the type of the function being called—assembly
513 doesn't care, it just jumps to a location, he said.
514
515 The APIs in the standard library are also bad in many cases.
516 He asked: why is there no argument to memcpy() to specify the
517 maximum destination length? He noted a recent [39]blog post
518 from Raph Levien entitled "With Undefined Behavior, Anything
519 is Possible". That obviously resonated with Cook, as he
520 pointed out his T-shirt—with the title and artwork from the
521 post.
522
523 Less danger
524
525 He then moved on to some things that kernel developers can do
526 (and are doing) to get away from some of the dangers of C. He
527 began with variable-length arrays (VLAs), which can be used to
528 overflow the stack to access data outside of its region. Even
529 if the stack has a guard page, VLAs can be used to jump past
530 it to write into other memory, which can then be used by some
531 other kind of attack. The C language is "perfectly fine with
532 this". It is easy to find uses of VLAs with the -Wvla flag,
533 however.
534
535 But it turns out that VLAs are [40]not just bad from a
536 security perspective , they are also slow. In a
537 micro-benchmark associated with a [41]patch removing a VLA , a
538 13% performance boost came from using a fixed-size array. He
539 dug in a bit further and found that much more code is being
540 generated to handle a VLA, which explains the speed increase.
541 Linus Torvalds has [42]declared that VLAs should be
542 removed from the kernel because they cause security problems
543 and also slow the kernel down, so Cook said "don't use VLAs".
544
545 Another problem area is switch statements, in particular where
546 there is no break for a case . That could mean that the
547 programmer expects and wants to fall through to the next case
548 or it could be that the break was simply forgotten. There is a
549 way to get a warning from the compiler for fall-throughs, but
550 there needs to be a way to mark those that are truly meant to
551 be that way. A special fall-through "statement" in the form of
552 a comment is what has been agreed on within the
553 static-analysis community. He and others have been going
554 through each of the places where there is no break to add
555 these comments (or a break ); they have "found a lot of bugs
556 this way", he said.
557
558 Uninitialized local variables will generate a warning, but not
559 if the variable is passed in by reference. There are some GCC
560 plugins that will automatically initialize these variables,
561 but there are also patches for both GCC and Clang to provide a
562 compiler option to do so. Neither of those is upstream yet,
563 but Torvalds has praised the effort so the kernel would likely
564 use the option. An interesting side effect that came about
565 while investigating this was a warning he got about
566 unreachable code when he enabled the auto-initialization.
567 There were two variables declared just after a switch (and
568 outside of any case ), where they would never be reached.
569
570 Arithmetic overflow is another undefined behavior in C that
571 can cause various problems. GCC can check for signed overflow,
572 which performs well (the overhead is in the noise, he said),
573 but adding warning messages for it does grow the kernel by 6%;
574 making the overflow abort, instead, only adds 0.1%. Clang can
575 check for both signed and unsigned overflow; signed overflow
576 is undefined, while unsigned overflow is defined, but often
577 unexpected. Marking places where unsigned overflow is expected
578 is needed; it would be nice to get those annotations put into
579 the kernel, Cook said.
580
581 Explicit bounds checking is expensive. Doing it for
582 copy_{to,from}_user() is a less than 1% performance hit, but
583 adding it to the strcpy() and memcpy() families is around a
584 2% hit. Pre-Meltdown that would have been a totally impossible
585 performance regression for security, he said; post-Meltdown,
586 since it is less than 5%, maybe there is a chance to add this
587 checking.
588
589 Better APIs would help as well. He pointed to the evolution of
590 strcpy() , through strncpy() and strlcpy() (each with
591 their own bounds flaws) to strscpy() , which seems to be "OK
592 so far". He also mentioned memcpy() again as a poor API with
593 respect to bounds checking.
594
595 Hardware support for bounds checking is available in the
596 application data integrity (ADI) feature for SPARC and is
597 coming for Arm; it may also be available for Intel processors
598 at some point. These all use a form of "memory tagging", where
599 allocations get a tag that is stored in the high-order byte of
600 the address. An offset from the address can be checked by the
601 hardware to see if it still falls within the allocated region
602 based on the tag.
603
604 Control-flow integrity (CFI) has become more of an issue
605 lately because much of what attackers had used in the past has
606 been marked as "no execute" so they are turning to using
607 existing code "gadgets" already present in the kernel by
608 hijacking existing indirect function calls. In C, you can just
609 call pointers without regard to the type as it just treats
610 them as an address to jump to. Clang has a CFI-sanitize
611 feature that enforces the function prototype to restrict the
612 calls that can be made. It is done at runtime and is not
613 perfect, in part because there are lots of functions in the
614 kernel that take one unsigned long parameter and return an
615 unsigned long.
616
617 Attacks on CFI have both a "forward edge", which is what CFI
618 sanitize tries to handle, and a "backward edge" that comes
619 from manipulating the stack values, the return address in
620 particular. Clang has two methods available to prevent the
621 stack manipulation. The first is the "safe stack", which puts
622 various important items (e.g. "safe" variables, register
623 spills, and the return address) on a separate stack.
624 Alternatively, the "shadow stack" feature creates a separate
625 stack just for return addresses.
626
627 One problem with these other stacks is that they are still
628 writable, so if an attacker can find them in memory, they can
629 still perform their attacks. Hardware-based protections, like
630 Intel's Control-Flow Enforcement Technology (CET),
631 [43]provide a read-only shadow call stack for return
632 addresses. Another hardware protection is [44]pointer
633 authentication for Arm, which adds a kind of encrypted tag to
634 the return address that can be verified before it is used.
635
636 Status and challenges
637
638 Cook then went through the current status of handling these
639 different problems in the kernel. VLAs are almost completely
640 gone, he said, just a few remain in the crypto subsystem; he
641 hopes those VLAs will be gone by 4.20 (or whatever the number
642 of the next kernel release turns out to be). Once that
643 happens, he plans to turn on -Wvla for the kernel build so
644 that none creep back in.
645
646 There has been steady progress made on marking fall-through
647 cases in switch statements. Only 745 remain to be handled of
648 the 2311 that existed when this work started; each one
649 requires scrutiny to determine what the author's intent is.
650 Auto-initialized local variables can be done using compiler
651 plugins, but that is "not quite what we want", he said. More
652 compiler support would be helpful there. For arithmetic
653 overflow, it would be nice to see GCC get support for the
654 unsigned case, but memory allocations are now doing explicit
655 overflow checking at this point.
656
657 Bounds checking has seen some "crying about performance hits",
658 so we are waiting impatiently for hardware support, he said.
659 CFI forward-edge protection needs [45]link-time optimization
660 (LTO) support for Clang in the kernel, but it is currently
661 working on Android. For backward-edge mitigation, the Clang
662 shadow call stack is working on Android, but we are
663 impatiently waiting for hardware support for that too.
664
665 There are a number of challenges in doing security development
666 for the kernel, Cook said. There are cultural boundaries due
667 to conservatism within the kernel community; that requires
668 patiently working and reworking features in order to get them
669 upstream. There are, of course, technical challenges because
670 of the complexity of security changes; those kinds of problems
671 can be solved. There are also resource limitations in terms of
672 developers, testers, reviewers, and so on. KSPP and the other
673 kernel security developers are still making that "slow but
674 steady" progress.
675
676 Cook's [46]slides [PDF] are available for interested readers;
677 before long, there should be a video available of the talk as
678 well.
679
680 [I would like to thank LWN's travel sponsor, the Linux
681 Foundation, for travel assistance to attend the Linux Security
682 Summit in Vancouver.]
683
684 [47]Comments (70 posted)
685
686 [48]The second half of the 4.19 merge window
687
688 By Jonathan Corbet
689
690 August 26, 2018 By the time Linus Torvalds [49]released
691 4.19-rc1 and closed the merge window for this development
692 cycle, 12,317 non-merge changesets had found their way into
693 the mainline; about 4,800 of those landed after [50]last
694 week's summary was written. As tends to be the case late in
695 the merge window, many of those changes were fixes for the
696 bigger patches that went in early, but there were also a
697 number of new features added. Some of the more significant
698 changes include:
699
700 Core kernel
701
702 The full set of patches adding [51]control-group awareness to
703 the out-of-memory killer has not been merged due to ongoing
704 disagreements, but one piece of it has: there is a new
705 memory.oom.group control knob that will cause all processes
706 within a control group to be killed in an out-of-memory
707 situation.
708
709 A new set of protections has been added to prevent an attacker
710 from fooling a program into writing to an existing file or
711 FIFO. An open with the O_CREAT flag to a file or FIFO in a
712 world-writable, sticky directory (e.g. /tmp ) will fail if the
713 owner of the opening process is not the owner of either the
714 target file or the containing directory. This behavior,
715 disabled by default, is controlled by the new
716 protected_regular and protected_fifos sysctl knobs.
717
718 Filesystems and block layer
719
720 The dm-integrity device-mapper target can now use a separate
721 device for metadata storage.
722
723 EROFS, the "enhanced read-only filesystem", has been added to
724 the staging tree. It is " a lightweight read-only file system
725 with modern designs (eg. page-sized blocks, inline
726 xattrs/data, etc.) for scenarios which need high-performance
727 read-only requirements, eg. firmwares in mobile phone or
728 LIVECDs ".
729
730 The new "metadata copy-up" feature in overlayfs will avoid
731 copying a file's contents to the upper layer on a
732 metadata-only change. See [52]this commit for details.
733
734 Hardware support
735
736 Graphics : Qualcomm Adreno A6xx GPUs.
737
738 Industrial I/O : Spreadtrum SC27xx series PMIC
739 analog-to-digital converters, Analog Devices AD5758
740 digital-to-analog converters, Intersil ISL29501 time-of-flight
741 sensors, Silicon Labs SI1133 UV index/ambient light sensor
742 chips, and Bosch Sensortec BME680 sensors.
743
744 Miscellaneous : Generic ADC-based resistive touchscreens,
745 Generic ASIC devices via the Google [53]Gasket framework ,
746 Analog Devices ADGS1408/ADGS1409 multiplexers, Actions Semi
747 Owl SoCs DMA controllers, MEN 16Z069 watchdog timers, Rohm
748 BU21029 touchscreen controllers, Cirrus Logic CS47L35,
749 CS47L85, CS47L90, and CS47L91 codecs, Cougar 500k gaming
750 keyboards, Qualcomm GENI-based I2C controllers, Actions
751 Semiconductor Owl I2C controllers, ChromeOS EC-based USBPD
752 chargers, and Analog Devices ADP5061 battery chargers.
753
754 USB : Nuvoton NPCM7XX on-chip EHCI USB controllers, Broadcom
755 Stingray PCIe PHYs, and Renesas R-Car generation 3 PCIe PHYs.
756
757 There is also a new subsystem for the abstraction of GNSS
758 (global navigation satellite systems — GPS, for example)
759 receivers in the kernel. To date, such devices have been
760 handled with an abundance of user-space drivers; the hope is
761 to bring some order to this area. Support for u-blox and
762 SiRFstar receivers has been added as well.
763
764 Kernel internal
765
766 The __deprecated marker, used to mark interfaces that should
767 no longer be used, has been deprecated and removed from the
768 kernel entirely. [54]Torvalds said : " They are not useful.
769 They annoy everybody, and nobody ever does anything about
770 them, because it's always 'somebody elses problem'. And when
771 people start thinking that warnings are normal, they stop
772 looking at them, and the real warnings that mean something go
773 unnoticed. "
774
775 The minimum version of GCC required by the kernel has been
776 moved up to 4.6.
777
778 There are a couple of significant changes that failed to get
779 in this time around, including the [55]XArray data structure.
780 The patches are thought to be ready, but they had the bad luck
781 to be based on a tree that failed to be merged for other
782 reasons, so Torvalds [56]didn't even look at them . That, in
783 turn, blocks another set of patches intended to enable
784 migration of slab-allocated objects.
785
786 The other big deferral is the [57]new system-call API for
787 filesystem mounting . Despite ongoing [58]concerns about what
788 happens when the same low-level device is mounted multiple
789 times with conflicting options, Al Viro sent [59]a pull
790 request to send this work upstream. The ensuing discussion
791 made it clear that there is still not a consensus in this
792 area, though, so it seems that this work has to wait for
793 another cycle.
794
795 Assuming all goes well, the kernel will stabilize over the
796 coming weeks and the final 4.19 release will happen in
797 mid-October.
798
799 [60]Comments (1 posted)
800
801 [61]Measuring (and fixing) I/O-controller throughput loss
802
803 August 29, 2018
804
805 This article was contributed by Paolo Valente
806
807 Many services, from web hosting and video streaming to cloud
808 storage, need to move data to and from storage. They also
809 often require that each per-client I/O flow be guaranteed a
810 non-zero amount of bandwidth and a bounded latency. An
811 expensive way to provide these guarantees is to over-provision
812 storage resources, keeping each resource underutilized, and
813 thus have plenty of bandwidth available for the few I/O flows
814 dispatched to each medium. Alternatively one can use an I/O
815 controller. Linux provides two mechanisms designed to throttle
816 some I/O streams to allow others to meet their bandwidth and
817 latency requirements. These mechanisms work, but they come at
818 a cost: a loss of as much as 80% of total available I/O
819 bandwidth. I have run some tests to demonstrate this problem;
820 some upcoming improvements to the [62]bfq I/O scheduler
821 promise to improve the situation considerably.
822
823 Throttling does guarantee control, even on drives that happen
824 to be highly utilized but, as will be seen, it has a hard time
825 actually ensuring that drives are highly utilized. Even with
826 greedy I/O flows, throttling easily ends up utilizing as
827 little as 20% of the available speed of a flash-based drive.
828 Such a speed loss may be particularly problematic with
829 lower-end storage. On the opposite end, it is also
830 disappointing with high-end hardware, as the Linux block I/O
831 stack itself has been [63]redesigned from the ground up to
832 fully utilize the high speed of modern, fast storage. In
833 addition, throttling fails to guarantee the expected
834 bandwidths if I/O contains both reads and writes, or is
835 sporadic in nature.
836
837 On the bright side, there now seems to be an effective
838 alternative for controlling I/O: the proportional-share policy
839 provided by the bfq I/O scheduler. It enables nearly 100%
840 storage bandwidth utilization, at least with some of the
841 workloads that are problematic for throttling. An upcoming
842 version of bfq may be able to achieve this result with almost
843 all workloads. Finally, bfq guarantees bandwidths with all
844 workloads. The current limitation of bfq is that its execution
845 overhead becomes significant at speeds above 400,000 I/O
846 operations per second on commodity CPUs.
847
848 Using the bfq I/O scheduler, Linux can now guarantee low
849 latency to lightweight flows containing sporadic, short I/O.
850 No throughput issues arise, and no configuration is required.
851 This capability benefits important, time-sensitive tasks, such
852 as video or audio streaming, as well as executing commands or
853 starting applications. Although benchmarks are not available
854 yet, these guarantees might also be provided by the newly
855 proposed [64]I/O latency controller . It allows administrators
856 to set target latencies for I/O requests originating from each
857 group of processes, and favors the groups with the lowest
858 target latency.
859
860 The testbed
861
862 I ran the tests with an ext4 filesystem mounted on a PLEXTOR
863 PX-256M5S SSD, which features a peak rate of ~160MB/s with
864 random I/O, and of ~500MB/s with sequential I/O. I used
865 blk-mq, in Linux 4.18. The system was equipped with a 2.4GHz
866 Intel Core i7-2760QM CPU and 1.3GHz DDR3 DRAM. In such a
867 system, a single thread doing synchronous reads reaches a
868 throughput of 23MB/s.
869
870 For the purposes of these tests, each process is considered to
871 be in one of two groups, termed "target" and "interferers". A
872 target is a single-process, I/O-bound group whose I/O is the
873 focus of the measurement. In particular, I measure the I/O throughput
874 enjoyed by this group to get the minimum bandwidth delivered
875 to the group. An interferer is a single-process group whose role
876 is to generate additional I/O that interferes with the I/O of
877 the target. The tested workloads contain one target and
878 multiple interferers.
879
880 The single process in each group either reads or writes,
881 through asynchronous (buffered) operations, to one file —
882 different from the file read or written by any other process —
883 after invalidating the buffer cache for the file. I define a
884 reader or writer process as either "random" or "sequential",
885 depending on whether it reads or writes its file at random
886 positions or sequentially. Finally, an interferer is defined
887 as being either "active" or "inactive" depending on whether it
888 performs I/O during the test. When an interferer is mentioned,
889 it is assumed that the interferer is active.
890
891 Workloads are defined so as to try to cover the combinations
892 that, I believe, most influence the performance of the storage
893 device and of the I/O policies. For brevity, in this article I
894 show results for only two groups of workloads:
895
896 Static sequential : four synchronous sequential readers or
897 four asynchronous sequential writers, plus five inactive
898 interferers.
899
900 Static random : four synchronous random readers, all with a
901 block size equal to 4k, plus five inactive interferers.
902
903 To create each workload, I considered, for each mix of
904 interferers in the group, two possibilities for the target: it
905 could be either a random or a sequential synchronous reader.
906 In [65]a longer version of this article [PDF] , you will also
907 find results for workloads with varying degrees of I/O
908 randomness, and for dynamic workloads (containing sporadic I/O
909 sources). These extra results confirm the losses of throughput
910 and I/O control for throttling that are shown here.
911
912 I/O policies
913
914 Linux provides two I/O-control mechanisms for guaranteeing (a
915 minimum) bandwidth, or at least fairness, to long-lived flows:
916 the throttling and proportional-share I/O policies. With
917 throttling, one can set a maximum bandwidth limit — "max
918 limit" for brevity — for the I/O of each group. Max limits can
919 be used, in an indirect way, to provide the service guarantee
920 at the focus of this article. For example, a group can be
921 guaranteed a minimum bandwidth by limiting the maximum
922 bandwidth of all the other groups.
923
924
925 Unfortunately, max limits have two drawbacks in terms of
926 throughput. First, if some groups do not use their allocated
927 bandwidth, that bandwidth cannot be reclaimed by other active
928 groups. Second, limits must comply with the worst-case speed
929 of the device, namely, its random-I/O peak rate. Such limits
930 will clearly leave a lot of throughput unused with workloads
931 that otherwise would drive the device to higher throughput
932 levels. Maximizing throughput is simply not a goal of max
933 limits. So, for brevity, test results with max limits are not
934 shown here. You can find these results, plus a more detailed
935 description of the above drawbacks, in the long version of
936 this article.
937
938 Because of these drawbacks, a new, still experimental, low
939 limit has been added to the throttling policy. If a group is
940 assigned a low limit, then the throttling policy automatically
941 limits the I/O of the other groups in such a way as to
942 guarantee the group a minimum bandwidth equal to its low
943 limit. This new throttling mechanism throttles no group as
944 long as every group is getting at least its assigned minimum
945 bandwidth. I tested this mechanism, but did not consider the
946 interesting problem of guaranteeing minimum bandwidths while,
947 at the same time, enforcing maximum bandwidths.
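As a rough sketch of how the two throttling interfaces are driven (the cgroup paths, group names, and the 8:0 device number are illustrative assumptions; io.low is, as noted, still experimental):

```python
# Illustrative cgroup-v2 throttling configuration.  10485760 bytes/s
# is 10MB/s; "8:0" is an assumed major:minor device number.
limit = "8:0 rbps=10485760 wbps=10485760"

# Max limit: cap an interferer group's read and write bandwidth,
# indirectly guaranteeing the remaining bandwidth to other groups.
with open("/sys/fs/cgroup/interferer1/io.max", "w") as f:
    f.write(limit)

# Low limit: instead ask for a 10MB/s floor for the target group;
# throttling then limits the other groups only when needed.
with open("/sys/fs/cgroup/target/io.low", "w") as f:
    f.write(limit)
```

The same key=value format (rbps, wbps, riops, wiops) is used by both interface files; only the semantics differ.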
948
949 The other I/O policy available in Linux, proportional share,
950 provides weighted fairness. Each group is assigned a weight,
951 and should receive a portion of the total throughput
952 proportional to its weight. This scheme guarantees minimum
953 bandwidths in the same way that low limits do in throttling.
954 In particular, it guarantees to each group a minimum bandwidth
955 equal to the ratio between the weight of the group, and the
956 sum of the weights of all the groups that may be active at the
957 same time.
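As a small sketch of that guarantee (the function is illustrative; the weights are those used for prop-bfq in the table below, and 500MB/s is the drive's sequential peak rate from the testbed section, taken here as the available total for simplicity):

```python
def min_bandwidth(total_mbps, weight, all_weights):
    # Proportional share: a group's guaranteed floor is its weight's
    # fraction of the total throughput, assuming every group that may
    # be active actually is active.
    return total_mbps * weight / sum(all_weights)

# prop-bfq weights: target 300, four active interferers at 100 each,
# five inactive interferers at 200 each (sum: 1700).
weights = [300] + [100] * 4 + [200] * 5
floor = min_bandwidth(500, 300, weights)
print(round(floor, 1))  # 88.2 MB/s if every group were active at once
```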
958
959 The actual implementation of the proportional-share policy, on
960 a given drive, depends on what flavor of the block layer is in
961 use for that drive. If the drive is using the legacy block
962 interface, the policy is implemented by the cfq I/O scheduler.
963 Unfortunately, cfq fails to control bandwidths with
964 flash-based storage, especially on drives featuring command
965 queueing. This case is not considered in these tests. With
966 drives using the multiqueue interface, proportional share is
967 implemented by bfq. This is the combination considered in the
968 tests.
969
970 To benchmark both throttling (low limits) and proportional
971 share, I tested, for each workload, the combinations of I/O
972 policies and I/O schedulers reported in the table below. In
973 the end, there are three test cases for each workload. In
974 addition, for some workloads, I considered two versions of bfq
975 for the proportional-share policy.
976
977 Name      I/O policy         Scheduler  Target  Each active       Each inactive      Sum of
978                                         param.  interferer (x4)   interferer (x5)    params
979 low-none  Throttling with    none       10MB/s  10MB/s (tot: 40)  20MB/s (tot: 100)  150MB/s
980           low limits
981 prop-bfq  Proportional       bfq        300     100 (tot: 400)    200 (tot: 1000)    1700
982           share
983
984 (The parameters are bandwidth limits for low-none and weights for
985 prop-bfq.)
1018
1019 For low limits, I report results with only none as the I/O
1020 scheduler, because the results are the same with kyber and
1021 mq-deadline.
1022
1023 The capabilities of the storage medium and of low limits drove
1024 the policy configurations. In particular:
1025
1026 The configuration of the target and of the active interferers
1027 for low-none is the one for which low-none provides its best
1028 possible minimum-bandwidth guarantee to the target: 10MB/s,
1029 guaranteed if all interferers are readers. Results remain the
1030 same regardless of the values used for target latency and idle
1031 time; I set them to 100µs and 1000µs, respectively, for every
1032 group.
1033
1034 Low limits for inactive interferers are set to twice the
1035 limits for active interferers, to pose greater difficulties to
1036 the policy.
1037
1038 I chose weights for prop-bfq so as to guarantee about the same
1039 minimum bandwidth as low-none to the target, in the same
1040 only-reader worst case as for low-none and to preserve,
1041 between the weights of active and inactive interferers, the
1042 same ratio as between the low limits of active and inactive
1043 interferers.
1044
1045 Full details on configurations can be found in the long
1046 version of this article.
1047
1048 Each workload was run ten times for each policy, plus ten
1049 times without any I/O control, i.e., with none as I/O
1050 scheduler and no I/O policy in use. For each run, I measured
1051 the I/O throughput of the target (which reveals the bandwidth
1052 provided to the target), the cumulative I/O throughput of the
1053 interferers, and the total I/O throughput. These quantities
1054 fluctuated very little during each run, as well as across
1055 different runs. Thus in the graphs I report only averages over
1056 per-run average throughputs. In particular, for the case of no
1057 I/O control, I report only the total I/O throughput, to give
1058 an idea of the throughput that can be reached without imposing
1059 any control.
1060
1061 Results
1062
1063 This plot shows throughput results for the simplest group of
1064 workloads: the static-sequential set.
1065
1066 With a random reader as the target against sequential readers
1067 as interferers, low-none does guarantee the configured low
1068 limit to the target. Yet it reaches only a low total
1069 throughput. The throughput of the random reader evidently
1070 oscillates around 10MB/s during the test. This implies that it
1071 is at least slightly below 10MB/s for a significant percentage
1072 of the time. But when this happens, the low-limit mechanism
1073 limits the maximum bandwidth of every active group to the low
1074 limit set for the group, i.e., to just 10MB/s. The end result
1075 is a total throughput lower than 10% of the throughput reached
1076 without I/O control.
1077
1078 That said, the high throughput achieved without I/O control is
1079 obtained by choking the random I/O of the target in favor of
1080 the sequential I/O of the interferers. Thus, it is probably
1081 more interesting to compare low-none throughput with the
1082 throughput reachable while actually guaranteeing 10MB/s to the
1083 target. The target is a single, synchronous, random reader,
1084 which reaches 23MB/s while active. So, to guarantee 10MB/s to
1085 the target, it is enough to serve it for about half of the
1086 time, and the interferers for the other half. Since the device
1087 reaches ~500MB/s with the sequential I/O of the interferers,
1088 the resulting throughput with this service scheme would be
1089 (500+23)/2, or about 260MB/s. low-none thus reaches less than
1090 20% of the total throughput that could be reached while still
1091 preserving the target bandwidth.
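The back-of-the-envelope estimate in this paragraph can be written out explicitly (a sketch using only figures already given in the article):

```python
# Serve the single random-reading target (23MB/s when active) for half
# the time, and the sequential interferers (~500MB/s) the other half.
target_rate = 23       # MB/s, synchronous random reader
interferer_rate = 500  # MB/s, sequential peak rate of the drive
total = (target_rate + interferer_rate) / 2
target_share = target_rate / 2  # 11.5MB/s, above the 10MB/s guarantee
print(total)  # 261.5, i.e. "about 260MB/s"
```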
1092
1093 prop-bfq provides the target with a slightly higher throughput
1094 than low-none. This makes it harder for prop-bfq to reach a
1095 high total throughput, because prop-bfq serves more random I/O
1096 (from the target) than low-none. Nevertheless, prop-bfq gets a
1097 much higher total throughput than low-none. According to the
1098 above estimate, this throughput is about 90% of the maximum
1099 throughput that could be reached, for this workload, without
1100 violating service guarantees. The reason for this good result
1101 is that bfq provides an effective implementation of the
1102 proportional-share service policy. At any time, each active
1103 group is granted a fraction of the current total throughput,
1104 and the sum of these fractions is equal to one; so group
1105 bandwidths naturally saturate the available total throughput
1106 at all times.
1107
1108 Things change with the second workload: a random reader
1109 against sequential writers. Now low-none reaches a much higher
1110 total throughput than prop-bfq. low-none serves much more
1111 sequential (write) I/O than prop-bfq because writes somehow
1112 break the low-limit mechanisms and prevail over the reads of
1113 the target. Conceivably, this happens because writes tend to
1114 both starve reads in the OS (mainly by eating all available
1115 I/O tags) and to cheat on their completion time in the drive.
1116 In contrast, bfq is intentionally configured to privilege
1117 reads, to counter these issues.
1118
1119 In particular, low-none gets an even higher throughput than no
1120 I/O control at all because it penalizes the random I/O of the
1121 target even more than the no-controller configuration.
1122
1123 Finally, with the last two workloads, prop-bfq reaches even
1124 higher total throughput than with the first two. It happens
1125 because the target also does sequential I/O, and serving
1126 sequential I/O is much more beneficial for throughput than
1127 serving random I/O. With these two workloads, the total
1128 throughput is, respectively, close to or much higher than that
1129 reached without I/O control. For the last workload, the total
1130 throughput is much higher because, unlike none, bfq
1131 privileges reads over asynchronous writes, and reads yield a
1132 higher throughput than writes. In contrast, low-none still
1133 gets lower or much lower throughput than prop-bfq, because of
1134 the same issues that hinder low-none throughput with the first
1135 two workloads.
1136
1137 As for bandwidth guarantees, with readers as interferers
1138 (third workload), prop-bfq, as expected, gives the target a
1139 fraction of the total throughput proportional to its weight.
1140 bfq approximates perfect proportional-share bandwidth
1141 distribution among groups doing I/O of the same type (reads or
1142 writes) and with the same locality (sequential or random).
1143 With the last workload, prop-bfq gives much more throughput to
1144 the reader than to all the interferers, because interferers
1145 are asynchronous writers, and bfq privileges reads.
1146
1147 The second group of workloads (static random) is the one,
1148 among all the workloads considered, for which prop-bfq
1149 performs worst. Results are shown below:
1150
1151 This chart reports results not only for mainline bfq, but also
1152 for an improved version of bfq which is currently under public
1153 testing. As can be seen, with only random readers, prop-bfq
1154 reaches a much lower total throughput than low-none. This
1155 happens because of the Achilles heel of the bfq I/O scheduler.
1156 If the process in service does synchronous I/O and has a
1157 higher weight than some other process, then, to give strong
1158 bandwidth guarantees to that process, bfq plugs I/O
1159 dispatching every time the process temporarily stops issuing
1160 I/O requests. In this respect, processes actually have
1161 differentiated weights and do synchronous I/O in the workloads
1162 tested. So bfq systematically performs I/O plugging for them.
1163 Unfortunately, this plugging empties the internal queues of
1164 the drive, which kills throughput with random I/O. And the I/O
1165 of all processes in these workloads is also random.
1166
1167 The situation reverses with a sequential reader as target.
1168 Yet, the most interesting results come from the new version of
1169 bfq, containing small changes to counter exactly the above
1170 weakness. This version recovers most of the throughput loss
1171 with the workload made of only random I/O and more; with the
1172 second workload, where the target is a sequential reader, it
1173 reaches about 3.7 times the total throughput of low-none.
1174
1175 When the main concern is the latency of flows containing short
1176 I/O, Linux seems now rather high performing, thanks to the bfq
1177 I/O scheduler and the I/O latency controller. But if the
1178 requirement is to provide explicit bandwidth guarantees (or
1179 just fairness) to I/O flows, then one must be ready to give up
1180 much or most of the speed of the storage media. bfq helps with
1181 some workloads, but loses most of the throughput with
1182 workloads consisting of mostly random I/O. Fortunately, there
1183 is apparently hope for much better performance since an
1184 improvement, still under development, seems to enable bfq to
1185 reach a high throughput with all workloads tested so far.
1186
1187 [ I wish to thank Vivek Goyal for enabling me to make this
1188 article much more fair and sound.]
1189
1190 [66]Comments (4 posted)
1191
1192 [67]KDE's onboarding initiative, one year later
1193
1194 August 24, 2018
1195
1196 This article was contributed by Marta Rybczyńska
1197
1198 [68]Akademy
1199
1200 In 2017, the KDE community decided on [69]three goals to
1201 concentrate on for the next few years. One of them was
1202 [70]streamlining the onboarding of new contributors (the
1203 others were [71]improving usability and [72]privacy ). During
1204 [73]Akademy , the yearly KDE conference that was held in
1205 Vienna in August, Neofytos Kolokotronis shared the status of
1206 the onboarding goal, the work done during the last year, and
1207 further plans. While it is a complicated process in a project
1208 as big and diverse as KDE, numerous improvements have already
1209 been made.
1210
1211 Two of the three KDE community goals were proposed by relative
1212 newcomers. Kolokotronis was one of those, having joined the
1213 [74]KDE Promo team not long before proposing the focus on
1214 onboarding. He had previously been involved with [75]Chakra
1215 Linux , a distribution based on KDE software. The fact that
1216 new members of the community proposed strategic goals was also
1217 noted in the [76]Sunday keynote by Claudia Garad .
1218
1219 Proper onboarding adds excitement to the contribution process
1220 and increases retention, he explained. When we look at [77]the
1221 definition of onboarding , it is a process in which the new
1222 contributors acquire knowledge, skills, and behaviors so that
1223 they can contribute effectively. Kolokotronis proposed to see
1224 it also as socialization: integration into the project's
1225 relationships, culture, structure, and procedures.
1226
1227 The gains from proper onboarding are many. The project can
1228 grow by attracting new blood with new perspectives and
1229 solutions. The community maintains its health and stays
1230 vibrant. Another important advantage of efficient onboarding
1231 is that replacing current contributors becomes easier when
1232 they change interests, jobs, or leave the project for whatever
1233 reason. Finally, successful onboarding adds new advocates to
1234 the project.
1235
1236 Achievements so far and future plans
1237
1238 The team started with ideas for a centralized onboarding
1239 process for the whole of KDE. They found out quickly that this
1240 would not work because KDE is "very decentralized", so it is
1241 hard to provide tools and procedures that are going to work
1242 for the whole project. According to Kolokotronis, other
1243 characteristics of KDE that impact onboarding are high
1244 diversity, remote and online teams, and hundreds of
1245 contributors in dozens of projects and teams. In addition, new
1246 contributors already know in which area they want to take part
1247 and they prefer specific information that will be directly
1248 useful for them.
1249
1250 So the team changed its approach; several changes have since
1251 been proposed and implemented. The [78]Get Involved page,
1252 which is expected to be one of the resources new contributors
1253 read first, has been rewritten. For the [79]Junior Jobs page ,
1254 the team is [80] [81]discussing what the generic content for
1255 KDE as a whole should be. The team simplified [82]Phabricator
1256 registration , which resulted in documenting the process
1257 better. Another part of the work includes the [83]KDE Bugzilla
1258 ; it includes, for example, initiatives to limit the number of
1259 states of a ticket or remove obsolete products.
1260
1261 The [84]Plasma Mobile team is heavily involved in the
1262 onboarding goal. The Plasma Mobile developers have simplified
1263 their development environment setup and created an
1264 [85]interactive "Get Involved" page. In addition, the Plasma
1265 team changed the way task descriptions are written; they now
1266 contain more detail, so that it is easier to get involved. The
1267 basic description should be short and clear, and it should
1268 include details of the problem and possible solutions. The
1269 developers try to share the list of skills necessary to
1270 fulfill the tasks and include clear links to the technical
1271 resources needed.
1272
1273 Kolokotronis and team also identified a new potential source
1274 of contributors for KDE: distributions using KDE. They have
1275 the advantage of already knowing and using the software. The
1276 next idea the team is working on is to make sure that setting
1277 up a development environment is easy. The team plans to work
1278 on this during a dedicated sprint this autumn.
1279
1280 Searching for new contributors
1281
1282 Kolokotronis plans to search for new contributors at the
1283 periphery of the project, among the "skilled enthusiasts":
1284 loyal users who actually care about the project. They "can
1285 make wonders", he said. Those individuals may also be less
1286 confident or shy, have trouble taking the first step, and
1287 need guidance. The project leaders should take that into
1288 account.
1289
1290 In addition, newcomers are all different. Kolokotronis
1291 provided a long list of how contributors differ, including
1292 skills and knowledge, motives and interests, and time and
1293 dedication. His advice is to "try to find their superpower",
1294 the skills they have that are missing in the team. Those
1295 "superpowers" can then be used for the benefit of the project.
1296
1297 If a project does nothing else, he said, it can start with its
1298 documentation. However, this means more than just code
1299 documentation. Writing down the procedures or information
1300 about the internal work of the project, like who is working on
1301 what, is an important part of a project's documentation and
1302 helps newcomers. There should also be guidelines on how to
1303 start, especially setting up the development environment.
1304
1305 The first thing the project leaders should do, according to
1306 Kolokotronis, is to spend time on introducing newcomers to the
1307 project. Ideally every new contributor should be assigned
1308 mentors — more experienced members who can help them when
1309 needed. The mentors and project leaders should find tasks that
1310 are interesting for each person. Answering an audience
1311 question on suggestions for shy new contributors, he
1312 recommended even more mentoring. It is also very helpful to
1313 make sure that newcomers have enough to read, but "avoid
1314 RTFM", he highlighted. It is also easy for a new contributor
1315 "to fly away", he said. The solution is to keep requesting
1316 things and be proactive.
1317
1318 What can the project do?
1319
1320 Kolokotronis suggested a number of actions for a project when
1321 it wants to improve its onboarding. The first step is
1322 preparation: the project leaders should know the team's and
1323 the project's needs. Long-term planning is important, too. It
1324 is not enough to wait for contributors to come — the project
1325 should be proactive, which means reaching out to candidates,
1326 suggesting appropriate tasks and, finally, making people
1327 available for the newcomers if they need help.
1328
1329 This leads to the next step: to be a mentor. Kolokotronis suggests
1330 being a "great host", but also trying to phase out the
1331 dependency on the mentor rapidly. "We have been all
1332 newcomers", he said. It can be intimidating to join an
1333 existing group. Onboarding creates a sense of belonging which,
1334 in turn, increases retention.
1335
1336 The last step proposed was to be strategic. This includes
1337 thinking about the emotions you want newcomers to feel.
1338 Kolokotronis explained the strategic part with an example. The
1339 overall goal is (surprise!) to improve the onboarding of new
1340 contributors. An intermediate objective might be to keep the
1341 newcomers after they have made their first commit. If your
1342 strategy is to keep them confident and proud, you can use
1343 different tactics like praise and acknowledgment of the work
1344 in public. Another useful tactic may be assigning simple
1345 tasks, according to the skill of the contributor.
1346
1347 To summarize, the most important thing, according to
1348 Kolokotronis, is to respond quickly and spend time with new
1349 contributors. This time should be used to explain procedures,
1350 and to introduce the people and culture. It is also essential
1351 to guide first contributions and praise the contributor's skill
1352 and effort. Increase the difficulty of tasks over time to keep
1353 contributors motivated and challenged. And finally, he said,
1354 "turn them into mentors".
1355
1356 Kolokotronis acknowledges that onboarding "takes time" and
1357 "everyone complains" about it. However, he is convinced that
1358 it is beneficial in the long term and that it decreases
1359 developer turnover.
1360
1361 Advice to newcomers
1362
1363 Kolokotronis concluded with some suggestions for newcomers to
1364 a project. They should try to be persistent and to not get
1365 discouraged when something goes wrong. Building connections
1366 from the very beginning is helpful. He suggests asking
1367 questions as if you were already a member "and things will be
1368 fine". However, accept criticism if it happens.
1369
1370 One of the next actions of the onboarding team will be to
1371 collect feedback from newcomers and experienced contributors
1372 to see if they agree on the ideas and processes introduced so
1373 far.
1374
1375 [86]Comments (none posted)
1376
1377 [87]Sharing and archiving data sets with Dat
1378
1379 August 27, 2018
1380
1381 This article was contributed by Antoine Beaupré
1382
1383 [88]Dat is a new peer-to-peer protocol that uses some of the
1384 concepts of [89]BitTorrent and Git. Dat primarily targets
1385 researchers and open-data activists as it is a great tool for
1386 sharing, archiving, and cataloging large data sets. But it can
1387 also be used to implement decentralized web applications in a
1388 novel way.
1389
1390 Dat quick primer
1391
1392 Dat is written in JavaScript, so it can be installed with npm
1393 , but there are [90]standalone binary builds and a [91]desktop
1394 application (as an AppImage). An [92]online viewer can be used
1395 to inspect data for those who do not want to install arbitrary
1396 binaries on their computers.
1397
1398 The command-line application allows basic operations like
1399 downloading existing data sets and sharing your own. Dat uses
1400 a 64-character hex string encoding a 32-byte [93]ed25519
1401 public key , which is used to discover content on the net. For
1402 example, this will download some sample data:
1403
1404 $ dat clone \
1405 dat://778f8d955175c92e4ced5e4f5563f69bfec0c86cc6f670352c457943666fe639 \
1406 ~/Downloads/dat-demo
1407
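The address in that command is easy to sanity-check: it is 64 hex characters, which decode to a 32-byte ed25519 public key (a small sketch using only the string from the example above):

```python
import binascii

# The Dat address from the example above, split for readability.
key_hex = ("778f8d955175c92e4ced5e4f5563f69bfec0c86cc6f670352c457943"
           "666fe639")
key = binascii.unhexlify(key_hex)  # raises an error on non-hex input
print(len(key))  # 32 bytes
```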
1408
1409 Similarly, the share command is used to share content. It
1410 indexes the files in a given directory and creates a new
1411 unique address like the one above. The share command starts a
1412 server that uses multiple discovery mechanisms (currently, the
1413 [94]Mainline Distributed Hash Table (DHT), a [95]custom DNS
1414 server , and multicast DNS) to announce the content to its
1415 peers. This is how another user, armed with that public key,
1416 can download that content with dat clone or mirror the files
1417 continuously with dat sync .
1418
1419 So far, this looks a lot like BitTorrent [96]magnet links
1420 updated with 21st century cryptography. But Dat adds revisions
1421 on top of that, so modifications are automatically shared
1422 through the swarm. That is important for public data sets as
1423 those are often dynamic in nature. Revisions also make it
possible to use [97]Dat as a backup system by saving the data
incrementally using an [98]archiver.
1426
1427 While Dat is designed to work on larger data sets, processing
1428 them for sharing may take a while. For example, sharing the
1429 Linux kernel source code required about five minutes as Dat
1430 worked on indexing all of the files. This is comparable to the
1431 performance offered by [99]IPFS and BitTorrent. Data sets with
1432 more or larger files may take quite a bit more time.
1433
1434 One advantage that Dat has over IPFS is that it doesn't
1435 duplicate the data. When IPFS imports new data, it duplicates
the files into ~/.ipfs. For collections of small files like
1437 the kernel, this is not a huge problem, but for larger files
1438 like videos or music, it's a significant limitation. IPFS
1439 eventually implemented a solution to this [100]problem in the
form of the experimental [101]filestore feature, but it's not
1441 enabled by default. Even with that feature enabled, though,
1442 changes to data sets are not automatically tracked. In
1443 comparison, Dat operation on dynamic data feels much lighter.
1444 The downside is that each set needs its own dat share process.
1445
1446 Like any peer-to-peer system, Dat needs at least one peer to
1447 stay online to offer the content, which is impractical for
1448 mobile devices. Hosting providers like [102]Hashbase (which is
1449 a [103]pinning service in Dat jargon) can help users keep
content online without running their own [104]server. The
1451 closest parallel in the traditional web ecosystem would
1452 probably be content distribution networks (CDN) although
1453 pinning services are not necessarily geographically
1454 distributed and a CDN does not necessarily retain a complete
copy of a website.
1456
A web browser called [106]Beaker, based on the [107]Electron
1458 framework, can access Dat content natively without going
1459 through a pinning service. Furthermore, Beaker is essential to
1460 get any of the [108]Dat applications working, as they
1461 fundamentally rely on dat:// URLs to do their magic. This
1462 means that Dat applications won't work for most users unless
1463 they install that special web browser. There is a [109]Firefox
extension called "[110]dat-fox" for people who don't want to
1465 install yet another browser, but it requires installing a
1466 [111]helper program . The extension will be able to load
1467 dat:// URLs but many applications will still not work. For
1468 example, the [112]photo gallery application completely fails
1469 with dat-fox.
1470
1471 Dat-based applications look promising from a privacy point of
1472 view. Because of its peer-to-peer nature, users regain control
1473 over where their data is stored: either on their own computer,
1474 an online server, or by a trusted third party. But considering
1475 the protocol is not well established in current web browsers,
1476 I foresee difficulties in adoption of that aspect of the Dat
1477 ecosystem. Beyond that, it is rather disappointing that Dat
1478 applications cannot run natively in a web browser given that
1479 JavaScript is designed exactly for that.
1480
1481 Dat privacy
1482
1483 An advantage Dat has over other peer-to-peer protocols like
1484 BitTorrent is end-to-end encryption. I was originally
1485 concerned by the encryption design when reading the
[113]academic paper [PDF]:
1487
1488 It is up to client programs to make design decisions around
1489 which discovery networks they trust. For example if a Dat
1490 client decides to use the BitTorrent DHT to discover peers,
1491 and they are searching for a publicly shared Dat key (e.g. a
1492 key cited publicly in a published scientific paper) with known
1493 contents, then because of the privacy design of the BitTorrent
1494 DHT it becomes public knowledge what key that client is
1495 searching for.
1496
In other words, to share a secret file with another user,
1498 the public key is transmitted over a secure side-channel, only
1499 to then leak during the discovery process. Fortunately, the
1500 public Dat key is not directly used during discovery as it is
1501 [114]hashed with BLAKE2B . Still, the security model of Dat
1502 assumes the public key is private, which is a rather
1503 counterintuitive concept that might upset cryptographers and
1504 confuse users who are frequently encouraged to type such
1505 strings in address bars and search engines as part of the Dat
1506 experience. There is a [115]security & privacy FAQ in the Dat
1507 documentation warning about this problem:
1508
1509 One of the key elements of Dat privacy is that the public key
1510 is never used in any discovery network. The public key is
1511 hashed, creating the discovery key. Whenever peers attempt to
1512 connect to each other, they use the discovery key.
1513
1514 Data is encrypted using the public key, so it is important
1515 that this key stays secure.
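
The hashing step described above can be sketched with
Python's standard library. In the hypercore implementation,
as I understand it, the discovery key is a keyed 32-byte
BLAKE2b hash of the constant string "hypercore", using the
public key as the hash key; treat those exact parameters as
an assumption rather than a reading of the specification:

```python
import hashlib

def discovery_key(public_key_hex: str) -> str:
    """Derive a Dat discovery key from a public key.

    Sketch only: a keyed, 32-byte BLAKE2b digest of the
    constant b"hypercore", with the public key as the hash
    key. The exact parameters are assumptions.
    """
    public_key = bytes.fromhex(public_key_hex)
    digest = hashlib.blake2b(b"hypercore", key=public_key,
                             digest_size=32)
    return digest.hexdigest()
```

The point is that everything announced to a discovery
network is a one-way function of the public key: peers
holding the key can compute and recognize the discovery
key, but an observer cannot recover the key from it.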
1516
1517 There are other privacy issues outlined in the document; it
states that "Dat faces similar privacy risks as BitTorrent":
1519
1520 When you download a dataset, your IP address is exposed to the
1521 users sharing that dataset. This may lead to honeypot servers
1522 collecting IP addresses, as we've seen in Bittorrent. However,
1523 with dataset sharing we can create a web of trust model where
1524 specific institutions are trusted as primary sources for
1525 datasets, diminishing the sharing of IP addresses.
1526
1527 A Dat blog post refers to this issue as [116]reader privacy
1528 and it is, indeed, a sensitive issue in peer-to-peer networks.
1529 It is how BitTorrent users are discovered and served scary
verbiage from lawyers, after all. But Dat improves on this a
little because, to join a swarm, you must already know what
you are looking for, which means the only peers able to
observe swarm activity are those who know the secret public
key.
1534 This works well for secret content, but for larger, public
1535 data sets, it is a real problem; it is why the Dat project has
1536 [117]avoided creating a Wikipedia mirror so far.
1537
During my review of the protocol, I found another privacy
issue that is not documented in the security FAQ. As mentioned
1540 earlier, the [118]Dat discovery protocol routinely phones home
1541 to DNS servers operated by the Dat project. This implies that
1542 the default discovery servers (and an attacker watching over
1543 their traffic) know who is publishing or seeking content, in
1544 essence discovering the "social network" behind Dat. This
1545 discovery mechanism can be disabled in clients, but a similar
1546 privacy issue applies to the DHT as well, although that is
1547 distributed so it doesn't require trust of the Dat project
1548 itself.
1549
1550 Considering those aspects of the protocol, privacy-conscious
1551 users will probably want to use Tor or other anonymization
1552 techniques to work around those concerns.
1553
1554 The future of Dat
1555
1556 [119]Dat 2.0 was released in June 2017 with performance
1557 improvements and protocol changes. [120]Dat Enhancement
1558 Proposals (DEPs) guide the project's future development; most
work is currently geared toward implementing the draft
"[121]multi-writer proposal" in [122]HyperDB. Without
1561 multi-writer support, only the original publisher of a Dat can
modify it. According to Joe Hand, co-executive director of
1563 [123]Code for Science & Society (CSS) and Dat core developer,
1564 in an IRC chat, "supporting multiwriter is a big requirement
1565 for lots of folks". For example, while Dat might allow Alice
1566 to share her research results with Bob, he cannot modify or
1567 contribute back to those results. The multi-writer extension
1568 allows for Alice to assign trust to Bob so he can have write
1569 access to the data.
1570
Unfortunately, the current proposal doesn't solve the "hard
problems" of "conflict merges and secure key distribution".
The former will be worked out through user-interface tweaks,
but the latter is a classic problem that security projects
typically have trouble solving; Dat is no exception. How will
Alice securely trust Bob? The OpenPGP web
1577 of trust? Hexadecimal fingerprints read over the phone? Dat
1578 doesn't provide a magic solution to this problem.
1579
1580 Another thing limiting adoption is that Dat is not packaged in
any distribution that I could find (although I [124]requested
it in Debian) and, considering the speed of change of the
1583 JavaScript ecosystem, this is unlikely to change any time
1584 soon. A [125]Rust implementation of the Dat protocol has
1585 started, however, which might be easier to package than the
1586 multitude of [126]Node.js modules. In terms of mobile device
1587 support, there is an experimental Android web browser with Dat
1588 support called [127]Bunsen , which somehow doesn't run on my
1589 phone. Some adventurous users have successfully run Dat in
[128]Termux. I haven't found an app running on iOS at this
1591 point.
1592
1593 Even beyond platform support, distributed protocols like Dat
have a steep hill to climb against the virtual monopoly of
1595 more centralized protocols, so it remains to be seen how
1596 popular those tools will be. Hand says Dat is supported by
1597 multiple non-profit organizations. Beyond CSS, [129]Blue Link
1598 Labs is working on the Beaker Browser as a self-funded startup
1599 and a grass-roots organization, [130]Digital Democracy , has
1600 contributed to the project. The [131]Internet Archive has
1601 [132]announced a collaboration between itself, CSS, and the
California Digital Library to launch a pilot project to see
"how members of a cooperative, decentralized network can
leverage shared services to ensure data preservation while
reducing storage costs and increasing replication counts".
1606
1607 Hand said adoption in academia has been "slow but steady" and
1608 that the [133]Dat in the Lab project has helped identify areas
1609 that could help researchers adopt the project. Unfortunately,
1610 as is the case with many free-software projects, he said that
1611 "our team is definitely a bit limited on bandwidth to push for
1612 bigger adoption". Hand said that the project received a grant
1613 from [134]Mozilla Open Source Support to improve its
1614 documentation, which will be a big help.
1615
1616 Ultimately, Dat suffers from a problem common to all
1617 peer-to-peer applications, which is naming. Dat addresses are
1618 not exactly intuitive: humans do not remember strings of 64
1619 hexadecimal characters well. For this, Dat took a [135]similar
1620 approach to IPFS by using DNS TXT records and /.well-known URL
1621 paths to bridge existing, human-readable names with Dat
hashes. This sacrifices part of the decentralized nature of
the project in favor of usability.
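
That DNS bridge is simple enough to sketch. My reading of
the [135]proposal is that a TXT record of the form
"datkey=<64-hex-key>" maps a hostname to a Dat key; both
that record format and the helper below should be treated
as assumptions, and the actual DNS query is left out:

```python
def key_from_txt_records(records):
    """Extract a Dat key from a list of DNS TXT record strings.

    Assumes records of the form "datkey=<64 hex characters>",
    per my reading of the Dat DNS proposal; returns None when
    no such record is present.
    """
    for record in records:
        if record.startswith("datkey="):
            key = record[len("datkey="):].strip().lower()
            if len(key) == 64:
                return key
    return None
```

Feeding this the TXT records returned for a hostname would
yield the key to hand to dat clone, letting users type a
memorable name instead of 64 hexadecimal characters.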
1624
1625 I have tested a lot of distributed protocols like Dat in the
1626 past and I am not sure Dat is a clear winner. It certainly has
1627 advantages over IPFS in terms of usability and resource usage,
1628 but the lack of packages on most platforms is a big limit to
1629 adoption for most people. This means it will be difficult to
1630 share content with my friends and family with Dat anytime
1631 soon, which would probably be my primary use case for the
1632 project. Until the protocol reaches the wider adoption that
1633 BitTorrent has seen in terms of platform support, I will
1634 probably wait before switching everything over to this
1635 promising project.
1636
1637 [136]Comments (11 posted)
1638
Page editor: Jonathan Corbet
1640
1641 Inside this week's LWN.net Weekly Edition
1642
[137]Briefs: OpenSSH 7.8; 4.19-rc1; Which stable?; Netdev
0x12; Bison 3.1; Quotes; ...

[138]Announcements: Newsletters; events; security updates;
kernel patches; ...

Next page: [139]Brief items >>
1648
1649
1650
1651 [1] https://lwn.net/Articles/763743/
1652
1653 [2] https://lwn.net/Articles/763626/
1654
1655 [3] https://lwn.net/Articles/763641/
1656
1657 [4] https://lwn.net/Articles/763106/
1658
1659 [5] https://lwn.net/Articles/763603/
1660
1661 [6] https://lwn.net/Articles/763175/
1662
1663 [7] https://lwn.net/Articles/763492/
1664
1665 [8] https://lwn.net/Articles/763254/
1666
1667 [9] https://lwn.net/Articles/763255/
1668
1669 [10] https://lwn.net/Articles/763743/#Comments
1670
1671 [11] https://lwn.net/Articles/763626/
1672
1673 [12] http://julialang.org/
1674
1675 [13] https://julialang.org/blog/2018/08/one-point-zero
1676
1677 [14] https://julialang.org/benchmarks/
1678
1679 [15] https://juliacomputing.com/
1680
1681 [16] https://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93p-
1682 rint_loop
1683
1684 [17] http://llvm.org/
1685
1686 [18] http://www.3blue1brown.com/essence-of-linear-algebra-page/
1687
1688 [19] http://www.netlib.org/lapack/
1689
1690 [20] https://lwn.net/Articles/657157/
1691
1692 [21] https://julialang.org/publications/julia-fresh-approach-B-
1693 EKS.pdf
1694
1695 [22] https://lwn.net/Articles/738915/
1696
1697 [23] https://pypy.org/
1698
1699 [24] https://github.com/JuliaPy/PyCall.jl
1700
1701 [25] https://github.com/JuliaInterop/RCall.jl
1702
1703 [26] https://docs.julialang.org/en/stable/
1704
1705 [27] https://julialang.org/learning/
1706
1707 [28] http://bogumilkaminski.pl/files/julia_express.pdf
1708
1709 [29] https://docs.julialang.org/en/stable/manual/noteworthy-di-
1710 fferences/#Noteworthy-differences-from-Python-1
1711
1712 [30] https://lwn.net/Articles/746386/
1713
1714 [31] https://github.com/JuliaLang/IJulia.jl
1715
1716 [32] https://lwn.net/Articles/764001/
1717
1718 [33] https://lwn.net/Articles/763626/#Comments
1719
1720 [34] https://lwn.net/Articles/763641/
1721
1722 [35] https://lwn.net/Archives/ConferenceByYear/#2018-Linux_Sec-
1723 urity_Summit_NA
1724
1725 [36] https://events.linuxfoundation.org/events/linux-security-
1726 summit-north-america-2018/
1727
1728 [37] https://kernsec.org/wiki/index.php/Kernel_Self_Protection-
1729 _Project
1730
1731 [38] https://lwn.net/Articles/763644/
1732
1733 [39] https://raphlinus.github.io/programming/rust/2018/08/17/u-
1734 ndefined-behavior.html
1735
1736 [40] https://lwn.net/Articles/749064/
1737
1738 [41] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/-
1739 linux.git/commit/?id=02361bc77888
1740
1741 [42] https://lore.kernel.org/lkml/CA+55aFzCG-zNmZwX4A2FQpadafL-
1742 fEzK6CC=qPXydAacU1RqZWA@mail.gmail.com/T/#u
1743
1744 [43] https://lwn.net/Articles/758245/
1745
1746 [44] https://lwn.net/Articles/718888/
1747
1748 [45] https://lwn.net/Articles/744507/
1749
1750 [46] https://outflux.net/slides/2018/lss/danger.pdf
1751
1752 [47] https://lwn.net/Articles/763641/#Comments
1753
1754 [48] https://lwn.net/Articles/763106/
1755
1756 [49] https://lwn.net/Articles/763497/
1757
1758 [50] https://lwn.net/Articles/762566/
1759
1760 [51] https://lwn.net/Articles/761118/
1761
1762 [52] https://git.kernel.org/linus/d5791044d2e5749ef4de84161cec-
1763 5532e2111540
1764
1765 [53] https://lwn.net/ml/linux-kernel/20180630000253.70103-1-sq-
1766 ue@chromium.org/
1767
1768 [54] https://git.kernel.org/linus/771c035372a036f83353eef46dbb-
1769 829780330234
1770
1771 [55] https://lwn.net/Articles/745073/
1772
1773 [56] https://lwn.net/ml/linux-kernel/CA+55aFxFjAmrFpwQmEHCthHO-
1774 zgidCKnod+cNDEE+3Spu9o1s3w@mail.gmail.com/
1775
1776 [57] https://lwn.net/Articles/759499/
1777
1778 [58] https://lwn.net/Articles/762355/
1779
1780 [59] https://lwn.net/ml/linux-fsdevel/20180823223145.GK6515@Ze-
1781 nIV.linux.org.uk/
1782
1783 [60] https://lwn.net/Articles/763106/#Comments
1784
1785 [61] https://lwn.net/Articles/763603/
1786
1787 [62] https://lwn.net/Articles/601799/
1788
1789 [63] https://lwn.net/Articles/552904
1790
1791 [64] https://lwn.net/Articles/758963/
1792
1793 [65] http://algogroup.unimore.it/people/paolo/pub-docs/extende-
1794 d-lat-bw-throughput.pdf
1795
1796 [66] https://lwn.net/Articles/763603/#Comments
1797
1798 [67] https://lwn.net/Articles/763175/
1799
1800 [68] https://lwn.net/Archives/ConferenceByYear/#2018-Akademy
1801
1802 [69] https://dot.kde.org/2017/11/30/kdes-goals-2018-and-beyond
1803
1804 [70] https://phabricator.kde.org/T7116
1805
1806 [71] https://phabricator.kde.org/T6831
1807
1808 [72] https://phabricator.kde.org/T7050
1809
1810 [73] https://akademy.kde.org/
1811
1812 [74] https://community.kde.org/Promo
1813
1814 [75] https://www.chakralinux.org/
1815
1816 [76] https://conf.kde.org/en/Akademy2018/public/events/79
1817
1818 [77] https://en.wikipedia.org/wiki/Onboarding
1819
1820 [78] https://community.kde.org/Get_Involved
1821
1822 [79] https://community.kde.org/KDE/Junior_Jobs
1823
1824 [80] https://lwn.net/Articles/763189/
1825
1826 [81] https://phabricator.kde.org/T8686
1827
1828 [82] https://phabricator.kde.org/T7646
1829
1830 [83] https://bugs.kde.org/
1831
1832 [84] https://www.plasma-mobile.org/index.html
1833
1834 [85] https://www.plasma-mobile.org/findyourway
1835
1836 [86] https://lwn.net/Articles/763175/#Comments
1837
1838 [87] https://lwn.net/Articles/763492/
1839
1840 [88] https://datproject.org
1841
1842 [89] https://www.bittorrent.com/
1843
1844 [90] https://github.com/datproject/dat/releases
1845
1846 [91] https://docs.datproject.org/install
1847
1848 [92] https://datbase.org/
1849
1850 [93] https://ed25519.cr.yp.to/
1851
1852 [94] https://en.wikipedia.org/wiki/Mainline_DHT
1853
1854 [95] https://github.com/mafintosh/dns-discovery
1855
1856 [96] https://en.wikipedia.org/wiki/Magnet_URI_scheme
1857
1858 [97] https://blog.datproject.org/2017/10/13/using-dat-for-auto-
1859 matic-file-backups/
1860
1861 [98] https://github.com/mafintosh/hypercore-archiver
1862
1863 [99] https://ipfs.io/
1864
1865 [100] https://github.com/ipfs/go-ipfs/issues/875
1866
1867 [101] https://github.com/ipfs/go-ipfs/blob/master/docs/experim-
1868 ental-features.md#ipfs-filestore
1869
1870 [102] https://hashbase.io/
1871
1872 [103] https://github.com/datprotocol/DEPs/blob/master/proposal-
1873 s/0003-http-pinning-service-api.md
1874
1875 [104] https://docs.datproject.org/server
1876
1877 [105] https://lwn.net/Articles/763544/
1878
1879 [106] https://beakerbrowser.com/
1880
1881 [107] https://electronjs.org/
1882
1883 [108] https://github.com/beakerbrowser/explore
1884
1885 [109] https://addons.mozilla.org/en-US/firefox/addon/dat-p2p-p-
1886 rotocol/
1887
1888 [110] https://github.com/sammacbeth/dat-fox
1889
1890 [111] https://github.com/sammacbeth/dat-fox-helper
1891
1892 [112] https://github.com/beakerbrowser/dat-photos-app
1893
1894 [113] https://github.com/datproject/docs/raw/master/papers/dat-
1895 paper.pdf
1896
1897 [114] https://github.com/datprotocol/DEPs/blob/653e0cf40233b5d-
1898 474cddc04235577d9d55b2934/proposals/0000-peer-discovery.md#dis-
1899 covery-keys
1900
1901 [115] https://docs.datproject.org/security
1902
1903 [116] https://blog.datproject.org/2016/12/12/reader-privacy-on-
1904 the-p2p-web/
1905
1906 [117] https://blog.datproject.org/2017/12/10/dont-ship/
1907
1908 [118] https://github.com/datprotocol/DEPs/pull/7
1909
1910 [119] https://blog.datproject.org/2017/06/01/dat-sleep-release/
1911
1912 [120] https://github.com/datprotocol/DEPs
1913
1914 [121] https://github.com/datprotocol/DEPs/blob/master/proposal-
1915 s/0008-multiwriter.md
1916
1917 [122] https://github.com/mafintosh/hyperdb
1918
1919 [123] https://codeforscience.org/
1920
1921 [124] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=890565
1922
1923 [125] https://github.com/datrs
1924
1925 [126] https://nodejs.org/en/
1926
1927 [127] https://bunsenbrowser.github.io/#!index.md
1928
1929 [128] https://termux.com/
1930
1931 [129] https://bluelinklabs.com/
1932
1933 [130] https://www.digital-democracy.org/
1934
1935 [131] https://archive.org
1936
1937 [132] https://blog.archive.org/2018/06/05/internet-archive-cod-
1938 e-for-science-and-society-and-california-digital-library-to-pa-
1939 rtner-on-a-data-sharing-and-preservation-pilot-project/
1940
1941 [133] https://github.com/codeforscience/Dat-in-the-Lab
1942
1943 [134] https://www.mozilla.org/en-US/moss/
1944
1945 [135] https://github.com/datprotocol/DEPs/blob/master/proposal-
1946 s/0005-dns.md
1947
1948 [136] https://lwn.net/Articles/763492/#Comments
1949
1950 [137] https://lwn.net/Articles/763254/
1951
1952 [138] https://lwn.net/Articles/763255/
1953
1954 [139] https://lwn.net/Articles/763254/
1955
1956
1957