Fix new tests and make TestLWN work

diff --git a/test/expected/LWN/0000763252 b/test/expected/LWN/0000763252
new file mode 100644
index 0000000..f5a2048
--- /dev/null
+++ b/test/expected/LWN/0000763252
@@ -0,0 +1,1957 @@
+             LWN.NET WEEKLY EDITION FOR AUGUST 30, 2018            \r
+\r
+  \r
+\r
+  o News link: https://lwn.net/Articles/763252/\r
+  o Source link: \r
+\r
+\r
+    [1]Welcome  to  the  LWN.net Weekly Edition for August 30, 2018\r
+    This edition contains the following feature content:\r
+    \r
+    [2]An  introduction  to the Julia language, part 1 : Julia is a\r
+    language  designed  for  intensive numerical calculations; this\r
+    article gives an overview of its core features.\r
+    \r
+    [3]C  considered  dangerous  :  a Linux Security Summit talk on\r
+    what is being done to make the use of C in the kernel safer.\r
+    \r
+    [4]The  second  half  of  the  4.19  merge  window  : the final\r
+    features  merged (or not merged) before the merge window closed\r
+    for this cycle.\r
+    \r
+    [5]Measuring  (and fixing) I/O-controller throughput loss : the\r
+    kernel's   I/O   controllers   can   provide  useful  bandwidth\r
+    guarantees, but at a significant cost in throughput.\r
+    \r
+    [6]KDE's  onboarding initiative, one year later : what has gone\r
+    right  in  KDE's  effort  to make it easier for contributors to\r
+    join the project, and what remains to be done.\r
+    \r
+    [7]Sharing  and  archiving  data  sets with Dat : an innovative\r
+    approach to addressing and sharing data on the net.\r
+    \r
+    This week's edition also includes these inner pages:\r
+    \r
+    [8]Brief   items   :  Brief  news  items  from  throughout  the\r
+    community.\r
+    \r
+    [9]Announcements  : Newsletters, conferences, security updates,\r
+    patches, and more.\r
+    \r
+    Please  enjoy  this  week's  edition, and, as always, thank you\r
+    for supporting LWN.net.\r
+    \r
+    [10]Comments (none posted)\r
+    \r
+    [11]An introduction to the Julia language, part 1\r
+    \r
+    August 28, 2018\r
+    \r
+    This article was contributed by Lee Phillips\r
+    \r
+    [12]Julia  is  a  young  computer language aimed at serving the\r
+    needs  of  scientists,  engineers,  and  other practitioners of\r
+    numerically   intensive  programming.  It  was  first  publicly\r
+    released   in   2012.  After  an  intense  period  of  language\r
+    development,  version 1.0 was [13]released on August 8. The 1.0\r
+    release  promises  years  of  language  stability; users can be\r
+    confident  that  developments  in the 1.x series will not break\r
+    their  code.  This  is  the  first  part  of a two-part article\r
+    introducing  the  world  of  Julia.  This  part  will introduce\r
+    enough  of  the  language syntax and constructs to allow you to\r
+    begin  to write simple programs. The following installment will\r
+    acquaint  you  with the additional pieces needed to create real\r
+    projects, and to make use of Julia's ecosystem.\r
+    \r
+    Goals and history\r
+    \r
+    The  Julia  project  has ambitious goals. It wants the language\r
+    to  perform  about  as  well  as  Fortran  or  C  when  running\r
+    numerical  algorithms,  while  remaining as pleasant to program\r
+    in  as Python. I believe the project has met these goals and is\r
+    poised  to  see  increasing  adoption by numerical researchers,\r
+    especially now that an official, stable release is available.\r
+    \r
+    The  Julia  project  maintains  a [14]micro-benchmark page that\r
+    compares  its  numerical  performance  against  both statically\r
+    compiled   languages   (C,   Fortran)   and  dynamically  typed\r
+    languages  (R,  Python). While it's certainly possible to argue\r
+    about  the relevance and fairness of particular benchmarks, the\r
+    data  overall  supports  the Julia team's contention that Julia\r
+    has   generally   achieved  parity  with  Fortran  and  C;  the\r
+    benchmark source code is available.\r
+    \r
+    Julia  began  as  research  in  computer  science  at  MIT; its\r
+    creators  are  Alan  Edelman,  Stefan Karpinski, Jeff Bezanson,\r
+    and  Viral  Shah.  These  four  remain active developers of the\r
+    language.  They, along with Keno Fischer, co-founder and CTO of\r
+    [15]Julia  Computing , were kind enough to share their thoughts\r
+    with  us  about the language. I'll be drawing on their comments\r
+    later  on;  for now, let's get a taste of what Julia code looks\r
+    like.\r
+    \r
+    Getting started\r
+    \r
+    To   explore   Julia   initially,   start   up   its   standard\r
+    [16]read-eval-print   loop   (REPL)  by  typing  julia  at  the\r
+    terminal,  assuming  that  you have installed it. You will then\r
+    be  able  to  interact with what will seem to be an interpreted\r
+    language  —  but,  behind  the scenes, those commands are being\r
+    compiled  by  a  just-in-time  (JIT)  compiler  that  uses  the\r
+    [17]LLVM   compiler   framework  .  This  allows  Julia  to  be\r
+    interactive,  while  turning the code into fast, native machine\r
+    instructions.   However,  the  JIT  compiler  passes  sometimes\r
+    introduce  noticeable delays at the REPL, especially when using\r
+    a function for the first time.\r
+    \r
+    To  run  a  Julia  program non-interactively, execute a command\r
+    like: $ julia script.jl\r
+    \r
+    Julia  has  all  the  usual data structures: numbers of various\r
+    types     (including    complex    and    rational    numbers),\r
+    multidimensional    arrays,    dictionaries,    strings,    and\r
+    characters.  Functions  are  first-class: they can be passed as\r
+    arguments  to other functions, can be members of arrays, and so\r
+    on.\r
+    \r
+    Julia  embraces  Unicode. Strings, which are enclosed in double\r
+    quotes,  are  arrays  of Unicode characters, which are enclosed\r
+    in  single  quotes.  The  " * " operator is used for string and\r
+    character  concatenation.  Thus 'a' and 'β' are characters, and\r
+    'aβ'  is  a syntax error. "a" and "β" are strings, as are "aβ",\r
+    'a' * 'β', and "a" * "β" — all evaluate to the same string.\r
+    \r
+    Variable  and  function names can contain non-ASCII characters.\r
+    This,   along  with  Julia's  clever  syntax  that  understands\r
+    numbers  prepended  to variables to mean multiplication, goes a\r
+    long  way  to  allowing  the  numerical scientist to write code\r
+    that  more  closely resembles the compact mathematical notation\r
+    of the equations that usually lie behind it.  julia> ε₁ = 0.01\r
+    \r
+    0.01\r
+    \r
+    julia> ε₂ = 0.02\r
+    \r
+    0.02\r
+    \r
+    julia> 2ε₁ + 3ε₂\r
+    \r
+    0.08\r
+    \r
+    And  where  does  Julia come down on the age-old debate of what\r
+    to do about 1/2 ? In Fortran and Python 2, this will get you 0,\r
+    since  1  and 2 are integers, and the result is rounded down to\r
+    the  integer  0. This was deemed inconsistent, and confusing to\r
+    some,  so  it  was changed in Python 3 to return 0.5 — which is\r
+    what you get in Julia, too.\r
+    \r
+    While  we're  on  the  subject  of  fractions, Julia can handle\r
+    rational  numbers,  with  a special syntax: 3//5 + 2//3 returns\r
+    19//15  ,  while  3/5  + 2/3 gets you the floating-point answer\r
+    1.2666666666666666.  Internally,  Julia  thinks  of  a rational\r
+    number  in  its  reduced  form,  so the expression 6//8 == 3//4\r
+    returns true , and numerator(6//8) returns 3 .\r
+    \r
+    Arrays\r
+    \r
+    Arrays  are  enclosed  in  square  brackets and indexed with an\r
+    iterator  that  can  contain a step value:  julia> a = [1, 2, 3,\r
+    4, 5, 6]\r
+    \r
+    6-element Array{Int64,1}:\r
+    \r
+    1\r
+    \r
+    2\r
+    \r
+    3\r
+    \r
+    4\r
+    \r
+    5\r
+    \r
+    6\r
+    \r
+    julia> a[1:2:end]\r
+    \r
+    3-element Array{Int64,1}:\r
+    \r
+    1\r
+    \r
+    3\r
+    \r
+    5\r
+    \r
+    As  you  can  see,  indexing  starts at one, and the useful end\r
+    index  means  the  obvious thing. When you define a variable in\r
+    the  REPL,  Julia  replies  with  the  type  and  value  of the\r
+    assigned  data;  you  can  suppress  this output by ending your\r
+    input line with a semicolon.\r
+    \r
+    Since  arrays  are  such a vital part of numerical computation,\r
+    and  Julia makes them easy to work with, we'll spend a bit more\r
+    time with them than the other data structures.\r
+    \r
+    To  illustrate  the  syntax,  we  can start with a couple of 2D\r
+    arrays, defined at the REPL:  julia> a = [1 2 3; 4 5 6]\r
+    \r
+    2×3 Array{Int64,2}:\r
+    \r
+    1 2 3\r
+    \r
+    4 5 6\r
+    \r
+    julia> z = [-1 -2 -3; -4 -5 -6];\r
+    \r
+    Indexing is as expected:  julia> a[1, 2]\r
+    \r
+    2\r
+    \r
+    You can glue arrays together horizontally:  julia> [a z]\r
+    \r
+    2×6 Array{Int64,2}:\r
+    \r
+    1 2 3 -1 -2 -3\r
+    \r
+    4 5 6 -4 -5 -6\r
+    \r
+    And vertically:  julia> [a; z]\r
+    \r
+    4×3 Array{Int64,2}:\r
+    \r
+    1  2  3\r
+    \r
+    4  5  6\r
+    \r
+    -1 -2 -3\r
+    \r
+    -4 -5 -6\r
+    \r
+    Julia  has  all  the  usual  operators for handling arrays, and\r
+    [18]linear  algebra  functions  that  work  with  matrices  (2D\r
+    arrays).  The  linear  algebra  functions  are  part of Julia's\r
+    standard  library,  but need to be imported with a command like\r
+    "  using  LinearAlgebra  ",  which is a detail omitted from the\r
+    current  documentation.  The  functions  include such things as\r
+    determinants,  matrix  inverses,  eigenvalues and eigenvectors,\r
+    many  kinds  of  matrix  factorizations,  etc.  Julia  has  not\r
+    reinvented  the  wheel  here,  but  wisely  uses the [19]LAPACK\r
+    Fortran library of battle-tested linear algebra routines.\r
+    \r
+    The  extension  of  arithmetic  operators  to arrays is usually\r
+    intuitive:  julia> a + z\r
+    \r
+    2×3 Array{Int64,2}:\r
+    \r
+    0 0 0\r
+    \r
+    0 0 0\r
+    \r
+    And  the  numerical  prepending  syntax works with arrays, too:\r
+    julia> 3a + 4z\r
+    \r
+    2×3 Array{Int64,2}:\r
+    \r
+    -1 -2 -3\r
+    \r
+    -4 -5 -6\r
+    \r
+    Putting  a  multiplication  operator  between two matrices gets\r
+    you matrix multiplication:  julia> a * transpose(a)\r
+    \r
+    2×2 Array{Int64,2}:\r
+    \r
+    14 32\r
+    \r
+    32 77\r
+    \r
+    You  can  "broadcast"  numbers  to cover all the elements in an\r
+    array  by prepending the usual arithmetic operators with a dot:\r
+    julia> 1 .+ a\r
+    \r
+    2×3 Array{Int64,2}:\r
+    \r
+    2 3 4\r
+    \r
+    5 6 7\r
+    \r
+    Note  that the language only actually requires the dot for some\r
+    operators,  but  not  for  others,  such  as  "*"  and "/". The\r
+    reasons  for this are arcane, and it probably makes sense to be\r
+    consistent  and  use  the dot whenever you intend broadcasting.\r
+    Note   also   that   the   current   version  of  the  official\r
+    documentation  is  incorrect  in claiming that you may omit the\r
+    dot from "+" and "-"; in fact, this now gives an error.\r
+    \r
+    You  can  use  the  dot  notation to turn any function into one\r
+    that   operates   on   each   element   of  an  array:   julia>\r
+    round.(sin.([0, π/2, π, 3π/2, 2π]))\r
+    \r
+    5-element Array{Float64,1}:\r
+    \r
+    0.0\r
+    \r
+    1.0\r
+    \r
+    0.0\r
+    \r
+    -1.0\r
+    \r
+    -0.0\r
+    \r
+    The  example  above  illustrates  chaining two dotted functions\r
+    together.  The  Julia compiler turns expressions like this into\r
+    "fused"  operations:  instead of applying each function in turn\r
+    to  create a new array that is passed to the next function, the\r
+    compiler   combines   the  functions  into  a  single  compound\r
+    function  that  is  applied  once  over  the  array, creating a\r
+    significant optimization.\r
+    \r
+    You  can  use  this  dot  notation with any function, including\r
+    your  own, to turn it into a version that operates element-wise\r
+    over arrays.\r
+    \r
+    Dictionaries  (associative  arrays) can be defined with several\r
+    syntaxes. Here's one:  julia> d1 = Dict("A"=>1, "B"=>2)\r
+    \r
+    Dict{String,Int64} with 2 entries:\r
+    \r
+    "B" => 2\r
+    \r
+    "A" => 1\r
+    \r
+    You  may  have  noticed  that the code snippets so far have not\r
+    included  any  type  declarations.  Every  value in Julia has a\r
+    type,  but  the  compiler  will  infer  types  if  they are not\r
+    specified.  It  is generally not necessary to declare types for\r
+    performance,   but  type  declarations  sometimes  serve  other\r
+    purposes  that  we'll  return  to  later.  Julia has a deep and\r
+    sophisticated  type  system,  including  user-defined types and\r
+    C-like  structs. Types can have behaviors associated with them,\r
+    and  can  inherit  behaviors  from  other types. The best thing\r
+    about  Julia's  type system is that you can ignore it entirely,\r
+    use  just  a  few  pieces  of  it,  or spend weeks studying its\r
+    design.\r
+    \r
+    Control flow\r
+    \r
+    Julia  code  is organized in blocks, which can indicate control\r
+    flow,  function  definitions,  and other code units. Blocks are\r
+    terminated  with  the  end  keyword,  and  indentation  is  not\r
+    significant.  Statements  are separated either with newlines or\r
+    semicolons.\r
+    \r
+    Julia  has the typical control flow constructs; here is a while\r
+    block:  julia> i = 1;\r
+    \r
+    julia> while i < 5\r
+    \r
+    print(i)\r
+    \r
+    global i = i + 1\r
+    \r
+    end\r
+    \r
+    1234\r
+    \r
+    Notice  the  global  keyword.  Most blocks in Julia introduce a\r
+    local  scope for variables; without this keyword here, we would\r
+    get an error about an undefined variable.\r
+    \r
+    Julia  has  the  usual if statements and for loops that use the\r
+    same  iterators that we introduced above for array indexing. We\r
+    can  also  iterate over collections:  julia> for i ∈ ['a', 'b',\r
+    'c']\r
+    \r
+    println(i)\r
+    \r
+    end\r
+    \r
+    a\r
+    \r
+    b\r
+    \r
+    c\r
+    \r
+    In  place of the fancy math symbol in this for loop, we can use\r
+    "  =  "  or " in ". If you want to use the math symbol but have\r
+    no  convenient  way  to type it, the REPL will help you: type "\r
+    \in  "  and  the  TAB key, and the symbol appears; you can type\r
+    many [20]LaTeX expressions into the REPL in this way.\r
+    \r
+    Development of Julia\r
+    \r
+    The   language   is   developed   on   GitHub,  with  over  700\r
+    contributors.  The  Julia  team  mentioned in their email to us\r
+    that  the decision to use GitHub has been particularly good for\r
+    Julia,  as  it  streamlined  the  process  for  many  of  their\r
+    contributors,  who  are scientists or domain experts in various\r
+    fields, rather than professional software developers.\r
+    \r
+    The  creators  of  Julia  have  [21]published  [PDF] a detailed\r
+    “mission  statement”  for  the  language, describing their aims\r
+    and  motivations.  A  key issue that they wanted their language\r
+    to  solve  is what they called the "two-language problem." This\r
+    situation  is familiar to anyone who has used Python or another\r
+    dynamic  language on a demanding numerical problem. To get good\r
+    performance,   you  will  wind  up  rewriting  the  numerically\r
+    intensive  parts  of  the program in C or Fortran, dealing with\r
+    the  interface  between  the  two  languages,  and may still be\r
+    disappointed  in  the overhead presented by calling the foreign\r
+    routines from your original code.\r
+    \r
+    For  Python,  [22]NumPy and SciPy wrap many numerical routines,\r
+    written  in Fortran or C, for efficient use from that language,\r
+    but  you  can  only  take advantage of this if your calculation\r
+    fits  the  pattern  of  an  available  routine; in more general\r
+    cases,  where you will have to write a loop over your data, you\r
+    are  stuck with Python's native performance, which is orders of\r
+    magnitude  slower.  If  you  switch  to  an alternative, faster\r
+    implementation  of  Python,  such  as  [23]PyPy , the numerical\r
+    libraries  may  not  be  compatible; NumPy became available for\r
+    PyPy only within about the past year.\r
+    \r
+    Julia  solves  the  two-language problem by being as expressive\r
+    and  simple  to  program  in  as  a dynamic scripting language,\r
+    while  having  the  native  performance  of  a static, compiled\r
+    language.  There  is  no need to write numerical libraries in a\r
+    second  language,  but  C  or  Fortran  library routines can be\r
+    called   using  a  facility  that  Julia  has  built-in.  Other\r
+    languages,  such as [24]Python or [25]R , can also interoperate\r
+    easily with Julia using external packages.\r
+    \r
+    Documentation\r
+    \r
+    There  are  many resources to turn to for learning the language.\r
+    There   is  an  extensive  and  detailed  [26]manual  at  Julia\r
+    headquarters,  and  this may be a good place to start. However,\r
+    although  the first few chapters provide a gentle introduction,\r
+    the  material soon becomes dense and, at times, hard to follow,\r
+    with  references to concepts that are not explained until later\r
+    chapters.  Fortunately,  there  is a [27]"learning" link at the\r
+    top  of  the Julia home page, which takes you to a long list of\r
+    videos,  tutorials,  books,  articles,  and  classes both about\r
+    Julia  and that use Julia in teaching subjects such as numerical\r
+    analysis.  There  is also a fairly good [28]cheat-sheet [PDF] ,\r
+    which was just updated for v. 1.0.\r
+    \r
+    If  you're  coming  from  Python,  [29]this  list of noteworthy\r
+    differences  between  Python  and Julia syntax will probably be\r
+    useful.\r
+    \r
+    Some  of  the  linked  tutorials are in the form of [30]Jupyter\r
+    notebooks  — indeed, the name "Jupyter" is formed from "Julia",\r
+    "Python",  and  "R",  which  are  the  three original languages\r
+    supported  by  the  interface. The [31]Julia kernel for Jupyter\r
+    was  recently upgraded to support v. 1.0. Judicious sampling of\r
+    a  variety  of  documentation  sources,  combined  with liberal\r
+    experimentation,  may be the best way of learning the language.\r
+    Jupyter  makes this experimentation more inviting for those who\r
+    enjoy  the  web-based  interface,  but the REPL that comes with\r
+    Julia  helps  a  great  deal  in  this regard by providing, for\r
+    instance,  TAB  completion and an extensive help system invoked\r
+    by simply pressing the "?" key.\r
+    \r
+    Stay tuned\r
+    \r
+    The  [32]next  installment in this two-part series will explain\r
+    how   Julia  is  organized  around  the  concept  of  "multiple\r
+    dispatch".  You  will  learn  how  to create functions and make\r
+    elementary  use  of  Julia's  type  system.  We'll  see  how to\r
+    install  packages  and  use  modules,  and  how to make graphs.\r
+    Finally,  Part  2  will  briefly survey the important topics of\r
+    macros and distributed computing.\r
+    \r
+    [33]Comments (80 posted)\r
+    \r
+    [34]C considered dangerous\r
+    \r
+    By Jake Edge\r
+    \r
+    August 29, 2018\r
+    \r
+    [35]LSS NA\r
+    \r
+    At  the  North  America  edition of the [36]2018 Linux Security\r
+    Summit  (LSS  NA),  which was held in late August in Vancouver,\r
+    Canada,  Kees  Cook  gave a presentation on some of the dangers\r
+    that  come  with  programs  written  in  C.  In  particular, of\r
+    course,  the  Linux  kernel is mostly written in C, which means\r
+    that  the security of our systems rests on a somewhat dangerous\r
+    foundation.  But there are things that can be done to help firm\r
+    things  up  by  " Making C Less Dangerous " as the title of his\r
+    talk suggested.\r
+    \r
+    He  began  with  a brief summary of the work that he and others\r
+    are  doing  as  part  of the [37]Kernel Self Protection Project\r
+    (KSPP).  The  goal  of the project is to get kernel protections\r
+    merged  into  the  mainline. These protections are not targeted\r
+    at  protecting user-space processes from other (possibly rogue)\r
+    processes,  but  are, instead, focused on protecting the kernel\r
+    from  user-space  code.  There  are around 12 organizations and\r
+    ten  individuals  working  on roughly 20 different technologies\r
+    as  part  of the KSPP, he said. The progress has been "slow and\r
+    steady", he said, which is how he thinks it should go.  [38]\r
+    \r
+    One  of  the  main  problems is that C is treated mostly like a\r
+    fancy  assembler.  The  kernel  developers do this because they\r
+    want  the  kernel to be as fast and as small as possible. There\r
+    are   other   reasons,   too,   such   as   the   need   to  do\r
+    architecture-specific  tasks that lack a C API (e.g. setting up\r
+    page tables, switching to 64-bit mode).\r
+    \r
+    But   there   is   lots   of  undefined  behavior  in  C.  This\r
+    "operational   baggage"   can  lead  to  various  problems.  In\r
+    addition,  C  has a weak standard library with multiple utility\r
+    functions  that  have  various  pitfalls.  In C, the content of\r
+    uninitialized  automatic  variables  is  undefined,  but in the\r
+    machine  code that it gets translated to, the value is whatever\r
+    happened  to  be  in  that  memory  location  before.  In  C, a\r
+    function  pointer can be called even if the type of the pointer\r
+    does  not  match the type of the function being called—assembly\r
+    doesn't care, it just jumps to a location, he said.\r
+    \r
+    The  APIs  in  the standard library are also bad in many cases.\r
+    He  asked:  why is there no argument to memcpy() to specify the\r
+    maximum  destination  length?  He  noted a recent [39]blog post\r
+    from  Raph  Levien  entitled "With Undefined Behavior, Anything\r
+    is  Possible".  That  obviously  resonated  with  Cook,  as  he\r
+    pointed  out  his  T-shirt—with  the title and artwork from the\r
+    post.\r
+    \r
+    Less danger\r
+    \r
+    He  then  moved on to some things that kernel developers can do\r
+    (and  are  doing) to get away from some of the dangers of C. He\r
+    began  with variable-length arrays (VLAs), which can be used to\r
+    overflow  the  stack to access data outside of its region. Even\r
+    if  the  stack  has a guard page, VLAs can be used to jump past\r
+    it  to  write into other memory, which can then be used by some\r
+    other  kind  of  attack. The C language is "perfectly fine with\r
+    this".  It  is  easy  to find uses of VLAs with the -Wvla flag,\r
+    however.\r
+    \r
+    But  it  turns  out  that  VLAs  are  [40]not  just  bad from a\r
+    security   perspective   ,   they   are   also   slow.   In   a\r
+    micro-benchmark  associated with a [41]patch removing a VLA , a\r
+    13%  performance  boost  came from using a fixed-size array. He\r
+    dug  in  a  bit  further and found that much more code is being\r
+    generated  to  handle a VLA, which explains the speed increase.\r
+    Since  Linus  Torvalds  has  [42]declared  that  VLAs should be\r
+    removed  from  the  kernel because they cause security problems\r
+    and also slow the kernel down, Cook said "don't use VLAs".\r
+    \r
+    Another  problem area is switch statements, in particular where\r
+    there  is  no  break  for  a  case  .  That could mean that the\r
+    programmer  expects  and wants to fall through to the next case\r
+    or  it could be that the break was simply forgotten. There is a\r
+    way  to  get a warning from the compiler for fall-throughs, but\r
+    there  needs  to be a way to mark those that are truly meant to\r
+    be  that way. A special fall-through "statement" in the form of\r
+    a   comment   is   what   has   been   agreed   on  within  the\r
+    static-analysis  community.  He  and  others  have  been  going\r
+    through  each  of  the  places  where  there is no break to add\r
+    these  comments  (or  a break ); they have "found a lot of bugs\r
+    this way", he said.\r
+    \r
+    Uninitialized  local variables will generate a warning, but not\r
+    if  the  variable is passed in by reference. There are some GCC\r
+    plugins  that  will  automatically  initialize these variables,\r
+    but  there are also patches for both GCC and Clang to provide a\r
+    compiler  option  to  do  so. Neither of those is upstream yet,\r
+    but  Torvalds has praised the effort so the kernel would likely\r
+    use  the  option.  An  interesting  side effect that came about\r
+    while   investigating   this   was   a  warning  he  got  about\r
+    unreachable  code  when  he  enabled  the  auto-initialization.\r
+    There  were  two  variables  declared  just after a switch (and\r
+    outside of any case ), where they would never be reached.\r
+    \r
+    Arithmetic  overflow  is  another  undefined behavior in C that\r
+    can  cause various problems. GCC can check for signed overflow,\r
+    which  performs  well  (the overhead is in the noise, he said),\r
+    but  adding warning messages for it does grow the kernel by 6%;\r
+    making  the  overflow abort, instead, only adds 0.1%. Clang can\r
+    check  for  both  signed and unsigned overflow; signed overflow\r
+    is  undefined,  while  unsigned  overflow is defined, but often\r
+    unexpected.  Marking places where unsigned overflow is expected\r
+    is  needed;  it would be nice to get those annotations put into\r
+    the kernel, Cook said.\r
+    \r
+    Explicit   bounds   checking   is   expensive.   Doing  it  for\r
+    copy_{to,from}_user()  is  a  less than 1% performance hit, but\r
+    adding  it  to  the strcpy() and memcpy() families is around a\r
+    2%  hit. Pre-Meltdown that would have been a totally impossible\r
+    performance  regression  for  security, he said; post-Meltdown,\r
+    since  it  is less than 5%, maybe there is a chance to add this\r
+    checking.\r
+    \r
+    Better  APIs would help as well. He pointed to the evolution of\r
+    strcpy(),   through   strncpy()   and   strlcpy()  (each  with\r
+    their  own  bounds  flaws) to strscpy(), which seems to be "OK\r
+    so  far".  He  also mentioned memcpy() again as a poor API with\r
+    respect to bounds checking.\r
+    \r
+    Hardware  support  for  bounds  checking  is  available  in the\r
+    application  data  integrity  (ADI)  feature  for  SPARC and is\r
+    coming  for  Arm; it may also be available for Intel processors\r
+    at  some point. These all use a form of "memory tagging", where\r
+    allocations  get a tag that is stored in the high-order byte of\r
+    the  address.  An offset from the address can be checked by the\r
+    hardware  to  see if it still falls within the allocated region\r
+    based on the tag.\r
+    \r
+    Control-flow  integrity  (CFI)  has  become  more  of  an issue\r
+    lately  because much of what attackers had used in the past has\r
+    been  marked  as  "no  execute"  so  they  are turning to using\r
+    existing  code  "gadgets"  already  present  in  the  kernel by\r
+    hijacking  existing indirect function calls. In C, you can just\r
+    call  pointers  without  regard  to  the type as it just treats\r
+    them  as  an  address  to  jump  to.  Clang  has a CFI-sanitize\r
+    feature  that  enforces  the function prototype to restrict the\r
+    calls  that  can  be  made.  It  is  done at runtime and is not\r
+    perfect,  in  part  because  there are lots of functions in the\r
+    kernel  that  take  one  unsigned  long parameter and return an\r
+    unsigned long.\r
+    \r
+    Attacks  on  CFI  have both a "forward edge", which is what CFI\r
+    sanitize  tries  to  handle,  and  a "backward edge" that comes\r
+    from  manipulating  the  stack  values,  the  return address in\r
+    particular.  Clang  has  two  methods  available to prevent the\r
+    stack  manipulation.  The first is the "safe stack", which puts\r
+    various   important  items  (e.g.  "safe"  variables,  register\r
+    spills,   and   the   return  address)  on  a  separate  stack.\r
+    Alternatively,  the  "shadow  stack" feature creates a separate\r
+    stack just for return addresses.\r
+    \r
+    One  problem  with  these  other  stacks is that they are still\r
+    writable,  so  if an attacker can find them in memory, they can\r
+    still  perform  their attacks. Hardware-based protections, like\r
+    Intel's     Control-Flow    Enforcement    Technology    (CET),\r
+    [43]provide   a   read-only   shadow   call  stack  for  return\r
+    addresses.   Another   hardware   protection   is   [44]pointer\r
+    authentication  for  Arm, which adds a kind of encrypted tag to\r
+    the return address that can be verified before it is used.\r
+    \r
+    Status and challenges\r
+    \r
+    Cook  then  went  through  the current status of handling these\r
+    different  problems  in  the kernel. VLAs are almost completely\r
+    gone,  he  said,  just a few remain in the crypto subsystem; he\r
+    hopes  those  VLAs will be gone by 4.20 (or whatever the number\r
+    of  the  next  kernel  release  turns  out  to  be).  Once that\r
+    happens,  he  plans  to  turn  on -Wvla for the kernel build so\r
+    that none creep back in.\r
+    \r
+    There  has  been  steady  progress made on marking fall-through\r
+    cases  in  switch  statements. Only 745 remain to be handled of\r
+    the  2311  that  existed  when  this  work  started;  each  one\r
+    requires  scrutiny  to  determine  what the author's intent is.\r
+    Auto-initialized  local  variables  can  be done using compiler\r
+    plugins,  but  that  is "not quite what we want", he said. More\r
+    compiler   support  would  be  helpful  there.  For  arithmetic\r
+    overflow,  it  would  be  nice  to  see GCC get support for the\r
+    unsigned  case,  but  memory allocations are now doing explicit\r
+    overflow checking.\r
+    \r
+    Bounds  checking has seen some "crying about performance hits",\r
+    so  we  are  waiting impatiently for hardware support, he said.\r
+    CFI  forward-edge  protection  needs [45]link-time optimization\r
+    (LTO)  support  for  Clang  in  the kernel, but it is currently\r
+    working  on  Android.  For  backward-edge mitigation, the Clang\r
+    shadow   call   stack   is  working  on  Android,  but  we  are\r
+    impatiently waiting for hardware support for that too.\r
+    \r
+    There  are a number of challenges in doing security development\r
+    for  the  kernel,  Cook said. There are cultural boundaries due\r
+    to  conservatism  within  the  kernel  community; that requires\r
+    patiently  working  and reworking features in order to get them\r
+    upstream.  There  are,  of course, technical challenges because\r
+    of  the complexity of security changes; those kinds of problems\r
+    can  be solved. There are also resource limitations in terms of\r
+    developers,  testers,  reviewers, and so on. KSPP and the other\r
+    kernel  security  developers  are  still  making that "slow but\r
+    steady" progress.\r
+    \r
+    Cook's  [46]slides  [PDF] are available for interested readers;\r
+    before  long,  there should be a video available of the talk as\r
+    well.\r
+    \r
+    [I  would  like  to  thank  LWN's  travel  sponsor,  the  Linux\r
+    Foundation,  for travel assistance to attend the Linux Security\r
+    Summit in Vancouver.]\r
+    \r
+    [47]Comments (70 posted)\r
+    \r
+    [48]The second half of the 4.19 merge window\r
+    \r
+    By Jonathan Corbet\r
+    \r
+    August  26,  2018    By  the  time  Linus Torvalds [49]released\r
+    4.19-rc1  and  closed  the  merge  window  for this development\r
+    cycle,  12,317  non-merge  changesets  had found their way into\r
+    the  mainline;  about  4,800  of  those  landed  after [50]last\r
+    week's  summary  was  written.  As tends to be the case late in\r
+    the  merge  window,  many  of  those changes were fixes for the\r
+    bigger  patches  that  went  in  early,  but  there were also a\r
+    number  of  new  features  added.  Some of the more significant\r
+    changes include:\r
+    \r
+    Core kernel\r
+    \r
+    The  full  set of patches adding [51]control-group awareness to\r
+    the  out-of-memory  killer  has  not been merged due to ongoing\r
+    disagreements,  but  one  piece  of  it  has:  there  is  a new\r
+    memory.oom.group  control  knob  that  will cause all processes\r
+    within  a  control  group  to  be  killed  in  an out-of-memory\r
+    situation.\r
+    \r
+    A  new set of protections has been added to prevent an attacker\r
+    from  fooling  a  program  into  writing to an existing file or\r
+    FIFO.  An  open  with  the  O_CREAT flag to a file or FIFO in a\r
+    world-writable,  sticky directory (e.g. /tmp ) will fail if the\r
+    owner  of  the  opening  process is not the owner of either the\r
+    target   file  or  the  containing  directory.  This  behavior,\r
+    disabled    by    default,    is    controlled   by   the   new\r
+    protected_regular and protected_fifos sysctl knobs.\r
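+    \r
+    As a minimal sketch, the current state of these knobs can be\r
+    inspected from Python. It assumes the knobs are exposed under\r
+    /proc/sys/fs, as the existing protected_hardlinks knob is; the\r
+    helper name read_fs_knob is hypothetical.\r
+    \r
```python
from pathlib import Path


def read_fs_knob(name, base="/proc/sys/fs"):
    """Return the integer value of a sysctl knob under base, or None if unavailable."""
    path = Path(base) / name
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # Knob missing (older kernel, non-Linux system), unreadable, or not a number
        return None


if __name__ == "__main__":
    for knob in ("protected_regular", "protected_fifos"):
        print(knob, read_fs_knob(knob))
```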
+    \r
+    Filesystems and block layer\r
+    \r
+    The  dm-integrity  device-mapper  target can now use a separate\r
+    device for metadata storage.\r
+    \r
+    EROFS,  the  "enhanced read-only filesystem", has been added to\r
+    the  staging  tree. It is " a lightweight read-only file system\r
+    with    modern   designs   (eg.   page-sized   blocks,   inline\r
+    xattrs/data,  etc.)  for  scenarios which need high-performance\r
+    read-only  requirements,  eg.  firmwares  in  mobile  phone  or\r
+    LIVECDs "\r
+    \r
+    The  new  "metadata  copy-up"  feature  in overlayfs will avoid\r
+    copying   a   file's   contents   to   the  upper  layer  on  a\r
+    metadata-only change. See [52]this commit for details.\r
+    \r
+    Hardware support\r
+    \r
+    Graphics : Qualcomm Adreno A6xx GPUs.\r
+    \r
+    Industrial    I/O    :    Spreadtrum    SC27xx    series   PMIC\r
+    analog-to-digital    converters,    Analog    Devices    AD5758\r
+    digital-to-analog  converters, Intersil ISL29501 time-of-flight\r
+    sensors,  Silicon  Labs  SI1133  UV  index/ambient light sensor\r
+    chips, and Bosch Sensortec BME680 sensors.\r
+    \r
+    Miscellaneous   :  Generic  ADC-based  resistive  touchscreens,\r
+    Generic  ASIC  devices  via  the  Google [53]Gasket framework ,\r
+    Analog  Devices  ADGS1408/ADGS1409  multiplexers,  Actions Semi\r
+    Owl  SoCs  DMA  controllers,  MEN  16Z069 watchdog timers, Rohm\r
+    BU21029   touchscreen   controllers,   Cirrus   Logic  CS47L35,\r
+    CS47L85,  CS47L90,  and  CS47L91  codecs,  Cougar  500k  gaming\r
+    keyboards,   Qualcomm   GENI-based   I2C  controllers,  Actions\r
+    Semiconductor  Owl  I2C  controllers,  ChromeOS  EC-based USBPD\r
+    chargers, and Analog Devices ADP5061 battery chargers.\r
+    \r
+    USB  :  Nuvoton  NPCM7XX on-chip EHCI USB controllers, Broadcom\r
+    Stingray PCIe PHYs, and Renesas R-Car generation 3 PCIe PHYs.\r
+    \r
+    There  is  also  a  new  subsystem  for the abstraction of GNSS\r
+    (global  navigation  satellite  systems  —  GPS,  for  example)\r
+    receivers  in  the  kernel.  To  date,  such  devices have been\r
+    handled  with  an  abundance of user-space drivers; the hope is\r
+    to  bring  some  order  in  this  area.  Support for u-blox and\r
+    SiRFstar receivers has been added as well.\r
+    \r
+    Kernel internal\r
+    \r
+    The  __deprecated  marker,  used to mark interfaces that should\r
+    no  longer  be  used,  has been deprecated and removed from the\r
+    kernel  entirely.  [54]Torvalds  said  : " They are not useful.\r
+    They  annoy  everybody,  and  nobody  ever  does anything about\r
+    them,  because  it's  always 'somebody elses problem'. And when\r
+    people  start  thinking  that  warnings  are  normal, they stop\r
+    looking  at  them, and the real warnings that mean something go\r
+    unnoticed. "\r
+    \r
+    The  minimum  version  of  GCC  required by the kernel has been\r
+    moved up to 4.6.\r
+    \r
+    There  are  a  couple of significant changes that failed to get\r
+    in  this  time around, including the [55]XArray data structure.\r
+    The  patches are thought to be ready, but they had the bad luck\r
+    to  be  based  on  a  tree  that  failed to be merged for other\r
+    reasons,  so  Torvalds  [56]didn't even look at them . That, in\r
+    turn,   blocks  another  set  of  patches  intended  to  enable\r
+    migration of slab-allocated objects.\r
+    \r
+    The  other  big  deferral  is  the  [57]new system-call API for\r
+    filesystem  mounting  . Despite ongoing [58]concerns about what\r
+    happens  when  the  same  low-level  device is mounted multiple\r
+    times  with  conflicting  options,  Al  Viro  sent  [59]a  pull\r
+    request  to  send  this  work  upstream. The ensuing discussion\r
+    made  it  clear  that  there  is  still not a consensus in this\r
+    area,  though,  so  it  seems  that  this  work has to wait for\r
+    another cycle.\r
+    \r
+    Assuming  all  goes  well,  the  kernel will stabilize over the\r
+    coming  weeks  and  the  final  4.19  release  will  happen  in\r
+    mid-October.\r
+    \r
+    [60]Comments (1 posted)\r
+    \r
+    [61]Measuring (and fixing) I/O-controller throughput loss\r
+    \r
+    August 29, 2018\r
+    \r
+    This article was contributed by Paolo Valente\r
+    \r
+    Many  services,  from  web hosting and video streaming to cloud\r
+    storage,  need  to  move  data  to  and from storage. They also\r
+    often  require  that  each  per-client I/O flow be guaranteed a\r
+    non-zero   amount  of  bandwidth  and  a  bounded  latency.  An\r
+    expensive  way to provide these guarantees is to over-provision\r
+    storage  resources,  keeping  each  resource underutilized, and\r
+    thus  having plenty of bandwidth available for the few I/O flows\r
+    dispatched  to  each  medium. Alternatively, one can use an I/O\r
+    controller.  Linux provides two mechanisms designed to throttle\r
+    some  I/O  streams  to allow others to meet their bandwidth and\r
+    latency  requirements.  These mechanisms work, but they come at\r
+    a  cost:  a  loss  of  as  much  as  80% of total available I/O\r
+    bandwidth.  I  have run some tests to demonstrate this problem;\r
+    some   upcoming  improvements  to  the  [62]bfq  I/O  scheduler\r
+    promise to improve the situation considerably.\r
+    \r
+    Throttling  does  guarantee control, even on drives that happen\r
+    to  be highly utilized but, as will be seen, it has a hard time\r
+    actually  ensuring  that  drives are highly utilized. Even with\r
+    greedy  I/O  flows,  throttling  easily  ends  up  utilizing as\r
+    little  as  20%  of the available speed of a flash-based drive.\r
+    Such   a  speed  loss  may  be  particularly  problematic  with\r
+    lower-end   storage.   On   the   opposite   end,  it  is  also\r
+    disappointing  with  high-end  hardware, as the Linux block I/O\r
+    stack  itself  has  been  [63]redesigned  from the ground up to\r
+    fully  utilize  the  high  speed  of  modern,  fast storage. In\r
+    addition,   throttling   fails   to   guarantee   the  expected\r
+    bandwidths  if  I/O  contains  both  reads  and  writes,  or is\r
+    sporadic in nature.\r
+    \r
+    On  the  bright  side,  there  now  seems  to  be  an effective\r
+    alternative  for controlling I/O: the proportional-share policy\r
+    provided  by  the  bfq  I/O  scheduler.  It enables nearly 100%\r
+    storage  bandwidth  utilization,  at  least  with  some  of the\r
+    workloads  that  are  problematic  for  throttling. An upcoming\r
+    version  of  bfq may be able to achieve this result with almost\r
+    all  workloads.  Finally,  bfq  guarantees  bandwidths with all\r
+    workloads.  The current limitation of bfq is that its execution\r
+    overhead  becomes  significant  at  speeds  above  400,000  I/O\r
+    operations per second on commodity CPUs.\r
+    \r
+    Using  the  bfq  I/O  scheduler,  Linux  can  now guarantee low\r
+    latency  to  lightweight  flows containing sporadic, short I/O.\r
+    No  throughput  issues arise, and no configuration is required.\r
+    This  capability benefits important, time-sensitive tasks, such\r
+    as  video  or audio streaming, as well as executing commands or\r
+    starting  applications.  Although  benchmarks are not available\r
+    yet,  these  guarantees  might  also  be  provided by the newly\r
+    proposed  [64]I/O latency controller . It allows administrators\r
+    to  set target latencies for I/O requests originating from each\r
+    group  of  processes,  and  favors  the  groups with the lowest\r
+    target latency.\r
+    \r
+    The testbed\r
+    \r
+    I  ran  the  tests with an ext4 filesystem mounted on a PLEXTOR\r
+    PX-256M5S  SSD,  which  features  a  peak rate of ~160MB/s with\r
+    random  I/O,  and  of  ~500MB/s  with  sequential  I/O.  I used\r
+    blk-mq,  in  Linux  4.18. The system was equipped with a 2.4GHz\r
+    Intel  Core  i7-2760QM  CPU  and  1.3GHz  DDR3  DRAM. In such a\r
+    system,  a  single  thread  doing  synchronous  reads reaches a\r
+    throughput of 23MB/s.\r
+    \r
+    For  the purposes of these tests, each process is considered to\r
+    be  in  one of two groups, termed "target" and "interferers". A\r
+    target  is  a  single-process, I/O-bound group whose I/O is the\r
+    focus  of  the tests. In particular, I measure the I/O throughput\r
+    enjoyed  by  this  group to get the minimum bandwidth delivered\r
+    to the group. An interferer is a single-process group whose role\r
+    is  to  generate additional I/O that interferes with the I/O of\r
+    the  target.  The  tested  workloads  contain  one  target  and\r
+    multiple interferers.\r
+    \r
+    The  single  process  in  each  group  either  reads or writes,\r
+    through  asynchronous  (buffered)  operations,  to  one  file —\r
+    different  from the file read or written by any other process —\r
+    after  invalidating  the  buffer cache for the file. I define a\r
+    reader  or  writer  process as either "random" or "sequential",\r
+    depending  on  whether  it  reads  or writes its file at random\r
+    positions  or  sequentially.  Finally, an interferer is defined\r
+    as  being either "active" or "inactive" depending on whether it\r
+    performs  I/O during the test. When an interferer is mentioned,\r
+    it is assumed that the interferer is active.\r
+    \r
+    Workloads  are  defined  so as to try to cover the combinations\r
+    that,  I believe, most influence the performance of the storage\r
+    device  and of the I/O policies. For brevity, in this article I\r
+    show results for only two groups of workloads:\r
+    \r
+    Static  sequential  :  four  synchronous  sequential readers or\r
+    four   asynchronous  sequential  writers,  plus  five  inactive\r
+    interferers.\r
+    \r
+    Static  random  :  four  synchronous random readers, all with a\r
+    block size equal to 4k, plus five inactive interferers.\r
+    \r
+    To  create  each  workload,  I  considered,  for  each  mix  of\r
+    interferers  in the group, two possibilities for the target: it\r
+    could  be  either  a random or a sequential synchronous reader.\r
+    In  [65]a  longer version of this article [PDF] , you will also\r
+    find   results  for  workloads  with  varying  degrees  of  I/O\r
+    randomness,  and for dynamic workloads (containing sporadic I/O\r
+    sources).  These extra results confirm the losses of throughput\r
+    and I/O control for throttling that are shown here.\r
+    \r
+    I/O policies\r
+    \r
+    Linux  provides  two I/O-control mechanisms for guaranteeing (a\r
+    minimum)  bandwidth, or at least fairness, to long-lived flows:\r
+    the   throttling  and  proportional-share  I/O  policies.  With\r
+    throttling,  one  can  set  a  maximum  bandwidth  limit — "max\r
+    limit"  for brevity — for the I/O of each group. Max limits can\r
+    be  used,  in an indirect way, to provide the service guarantee\r
+    at  the  focus  of  this  article.  For example, a group can be\r
+    guaranteed  a  minimum  bandwidth  by  limiting  the  maximum\r
+    bandwidth of all the other groups.\r
+    \r
+    Unfortunately,  max  limits  have  two  drawbacks  in  terms of\r
+    throughput.  First,  if  some groups do not use their allocated\r
+    bandwidth,  that  bandwidth cannot be reclaimed by other active\r
+    groups.  Second,  limits  must comply with the worst-case speed\r
+    of  the  device,  namely, its random-I/O peak rate. Such limits\r
+    will  clearly  leave  a lot of throughput unused with workloads\r
+    that  otherwise  would  drive  the  device to higher throughput\r
+    levels.  Maximizing  throughput  is  simply  not  a goal of max\r
+    limits.  So,  for brevity, test results with max limits are not\r
+    shown  here.  You  can find these results, plus a more detailed\r
+    description  of  the  above  drawbacks,  in the long version of\r
+    this article.\r
+    \r
+    Because  of  these  drawbacks,  a  new, still experimental, low\r
+    limit  has  been  added to the throttling policy. If a group is\r
+    assigned  a low limit, then the throttling policy automatically\r
+    limits the I/O of the other groups in such a way as to guarantee\r
+    to  the  group  a  minimum  bandwidth equal to its assigned low\r
+    limit.  This  new  throttling  mechanism  throttles no group as\r
+    long  as  every  group is getting at least its assigned minimum\r
+    bandwidth.  I  tested  this mechanism, but did not consider the\r
+    interesting  problem  of guaranteeing minimum bandwidths while,\r
+    at the same time, enforcing maximum bandwidths.\r
+    \r
+    The  other  I/O  policy available in Linux, proportional share,\r
+    provides  weighted  fairness.  Each group is assigned a weight,\r
+    and   should   receive   a  portion  of  the  total  throughput\r
+    proportional  to  its  weight.  This  scheme guarantees minimum\r
+    bandwidths  in  the  same way that low limits do in throttling.\r
+    In  particular, it guarantees to each group a minimum bandwidth\r
+    equal  to  the  ratio  between the weight of the group, and the\r
+    sum  of the weights of all the groups that may be active at the\r
+    same time.\r
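+    \r
+    The ratio described above can be sketched in a few lines of\r
+    Python. The weights are the ones listed for the prop-bfq\r
+    configuration in the table later in this article; the function\r
+    name guaranteed_share is hypothetical, and the result is the\r
+    worst-case fraction of total throughput, not an absolute rate.\r
+    \r
```python
def guaranteed_share(weight, all_weights):
    """Worst-case fraction of total throughput for a group with the given weight."""
    total = sum(all_weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return weight / total


# prop-bfq weights: one target (300), four active interferers (100 each),
# five inactive interferers (200 each); all of them may be active at once.
weights = [300] + [100] * 4 + [200] * 5
share = guaranteed_share(300, weights)
print(f"sum of weights: {sum(weights)}, target share: {share:.1%}")
```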
+    \r
+    The  actual implementation of the proportional-share policy, on\r
+    a  given drive, depends on what flavor of the block layer is in\r
+    use  for  that  drive.  If  the drive is using the legacy block\r
+    interface,  the policy is implemented by the cfq I/O scheduler.\r
+    Unfortunately,   cfq   fails   to   control   bandwidths   with\r
+    flash-based  storage,  especially  on  drives featuring command\r
+    queueing.  This  case  is  not  considered in these tests. With\r
+    drives  using  the  multiqueue interface, proportional share is\r
+    implemented  by  bfq. This is the combination considered in the\r
+    tests.\r
+    \r
+    To  benchmark  both  throttling  (low  limits) and proportional\r
+    share,  I  tested,  for  each workload, the combinations of I/O\r
+    policies  and  I/O  schedulers  reported in the table below. In\r
+    the  end,  there  are  three  test  cases for each workload. In\r
+    addition,  for some workloads, I considered two versions of bfq\r
+    for the proportional-share policy.\r
+    \r
+    Name                                                 low-none                    prop-bfq\r
+    I/O policy                                           Throttling with low limits  Proportional share\r
+    Scheduler                                            none                        bfq\r
+    Parameter for target                                 10MB/s                      300\r
+    Parameter for each of the four active interferers    10MB/s (tot: 40)            100 (tot: 400)\r
+    Parameter for each of the five inactive interferers  20MB/s (tot: 100)           200 (tot: 1000)\r
+    Sum of parameters                                    150MB/s                     1700\r
+    \r
+    For  low  limits,  I  report  results with only none as the I/O\r
+    scheduler,  because  the  results  are  the same with kyber and\r
+    mq-deadline.\r
+    \r
+    The  capabilities of the storage medium and of low limits drove\r
+    the policy configurations. In particular:\r
+    \r
+    The  configuration  of the target and of the active interferers\r
+    for  low-none  is  the one for which low-none provides its best\r
+    possible  minimum-bandwidth  guarantee  to  the target: 10MB/s,\r
+    guaranteed  if  all interferers are readers. Results remain the\r
+    same  regardless of the values used for target latency and idle\r
+    time;  I  set them to 100µs and 1000µs, respectively, for every\r
+    group.\r
+    \r
+    Low  limits  for  inactive  interferers  are  set  to twice the\r
+    limits  for active interferers, to pose greater difficulties to\r
+    the policy.\r
+    \r
+    I  chose weights for prop-bfq so as to guarantee about the same\r
+    minimum  bandwidth  as  low-none  to  the  target,  in the same\r
+    only-reader  worst  case  as  for  low-none  and  to  preserve,\r
+    between  the  weights  of  active and inactive interferers, the\r
+    same  ratio  as  between  the low limits of active and inactive\r
+    interferers.\r
+    \r
+    Full  details  on  configurations  can  be  found  in  the long\r
+    version of this article.\r
+    \r
+    Each  workload  was  run  ten  times  for each policy, plus ten\r
+    times   without  any  I/O  control,  i.e.,  with  none  as  I/O\r
+    scheduler  and  no  I/O policy in use. For each run, I measured\r
+    the  I/O  throughput of the target (which reveals the bandwidth\r
+    provided  to  the target), the cumulative I/O throughput of the\r
+    interferers,  and  the  total  I/O throughput. These quantities\r
+    fluctuated  very  little  during  each  run,  as well as across\r
+    different  runs. Thus in the graphs I report only averages over\r
+    per-run  average throughputs. In particular, for the case of no\r
+    I/O  control,  I  report only the total I/O throughput, to give\r
+    an  idea of the throughput that can be reached without imposing\r
+    any control.\r
+    \r
+    Results\r
+    \r
+    This  plot  shows  throughput results for the simplest group of\r
+    workloads: the static-sequential set.\r
+    \r
+    With  a  random reader as the target against sequential readers\r
+    as  interferers,  low-none  does  guarantee  the configured low\r
+    limit   to  the  target.  Yet  it  reaches  only  a  low  total\r
+    throughput.  The  throughput  of  the  random  reader evidently\r
+    oscillates  around 10MB/s during the test. This implies that it\r
+    is  at least slightly below 10MB/s for a significant percentage\r
+    of  the  time.  But  when this happens, the low-limit mechanism\r
+    limits  the  maximum bandwidth of every active group to the low\r
+    limit  set  for the group, i.e., to just 10MB/s. The end result\r
+    is  a total throughput lower than 10% of the throughput reached\r
+    without I/O control.\r
+    \r
+    That  said, the high throughput achieved without I/O control is\r
+    obtained  by  choking  the random I/O of the target in favor of\r
+    the  sequential  I/O  of  the interferers. Thus, it is probably\r
+    more  interesting  to  compare  low-none  throughput  with  the\r
+    throughput  reachable while actually guaranteeing 10MB/s to the\r
+    target.  The  target  is  a single, synchronous, random reader,\r
+    which  reaches  23MB/s while active. So, to guarantee 10MB/s to\r
+    the  target,  it  is  enough  to serve it for about half of the\r
+    time,  and the interferers for the other half. Since the device\r
+    reaches  ~500MB/s  with  the sequential I/O of the interferers,\r
+    the  resulting  throughput  with  this  service scheme would be\r
+    (500+23)/2,  or  about 260MB/s. low-none thus reaches less than\r
+    20%  of  the total throughput that could be reached while still\r
+    preserving the target bandwidth.\r
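+    \r
+    The arithmetic behind this estimate can be sketched as follows.\r
+    It uses only the rates quoted above (23MB/s for the target,\r
+    ~500MB/s for the interferers) and the article's approximation\r
+    of serving each side for half of the time; the function name\r
+    estimated_total is hypothetical.\r
+    \r
```python
def estimated_total(target_rate, interferer_rate, target_time_share):
    """Total throughput (MB/s) when the target is served for the given fraction of time."""
    if not 0.0 <= target_time_share <= 1.0:
        raise ValueError("time share must be between 0 and 1")
    return target_time_share * target_rate + (1.0 - target_time_share) * interferer_rate


# Target served for half of the time, interferers for the other half
total = estimated_total(23, 500, 0.5)  # same as (500 + 23) / 2
target_bw = 0.5 * 23                   # bandwidth the target receives under this scheme
print(f"total: {total} MB/s, target: {target_bw} MB/s")
```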
+    \r
+    prop-bfq  provides the target with a slightly higher throughput\r
+    than  low-none.  This  makes  it harder for prop-bfq to reach a\r
+    high  total throughput, because prop-bfq serves more random I/O\r
+    (from  the target) than low-none. Nevertheless, prop-bfq gets a\r
+    much  higher  total  throughput than low-none. According to the\r
+    above  estimate,  this  throughput  is about 90% of the maximum\r
+    throughput  that  could  be reached, for this workload, without\r
+    violating  service  guarantees. The reason for this good result\r
+    is  that  bfq  provides  an  effective  implementation  of  the\r
+    proportional-share  service  policy.  At  any time, each active\r
+    group  is  granted  a fraction of the current total throughput,\r
+    and  the  sum  of  these  fractions  is  equal to one; so group\r
+    bandwidths  naturally  saturate  the available total throughput\r
+    at all times.\r
+    \r
+    Things  change  with  the  second  workload:  a  random  reader\r
+    against  sequential writers. Now low-none reaches a much higher\r
+    total  throughput  than  prop-bfq.  low-none  serves  much more\r
+    sequential  (write)  I/O  than  prop-bfq because writes somehow\r
+    break  the  low-limit  mechanisms and prevail over the reads of\r
+    the  target.  Conceivably,  this happens because writes tend to\r
+    both  starve  reads  in  the OS (mainly by eating all available\r
+    I/O  tags)  and to cheat on their completion time in the drive.\r
+    In  contrast,  bfq  is  intentionally  configured  to privilege\r
+    reads, to counter these issues.\r
+    \r
+    In  particular, low-none gets an even higher throughput than no\r
+    I/O  control  at all because it penalizes the random I/O of the\r
+    target even more than the no-controller configuration.\r
+    \r
+    Finally,  with  the  last  two workloads, prop-bfq reaches even\r
+    higher  total throughput than with the first two. This happens\r
+    because  the  target  also  does  sequential  I/O,  and serving\r
+    sequential  I/O  is  much  more  beneficial for throughput than\r
+    serving  random  I/O.  With  these  two  workloads,  the  total\r
+    throughput  is, respectively, close to or much higher than that\r
+    reached  without  I/O control. For the last workload, the total\r
+    throughput  is  much  higher  because,  unlike  none,  bfq\r
+    privileges  reads  over  asynchronous writes, and reads yield a\r
+    higher  throughput  than  writes.  In  contrast, low-none still\r
+    gets  lower  or much lower throughput than prop-bfq, because of\r
+    the  same issues that hinder low-none throughput with the first\r
+    two workloads.\r
+    \r
+    As  for  bandwidth  guarantees,  with  readers  as  interferers\r
+    (third  workload),  prop-bfq,  as  expected, gives the target a\r
+    fraction  of  the  total throughput proportional to its weight.\r
+    bfq    approximates    perfect   proportional-share   bandwidth\r
+    distribution  among groups doing I/O of the same type (reads or\r
+    writes)  and  with  the  same  locality (sequential or random).\r
+    With  the last workload, prop-bfq gives much more throughput to\r
+    the  reader  than  to  all the interferers, because interferers\r
+    are asynchronous writers, and bfq privileges reads.\r
+    \r
+    The  second  group  of  workloads  (static random) is the one,\r
+    among   all   the  workloads  considered,  for  which  prop-bfq\r
+    performs worst. Results are shown below:\r
+    \r
+    This  chart reports results not only for mainline bfq, but also\r
+    for  an improved version of bfq which is currently under public\r
+    testing.  As  can  be  seen, with only random readers, prop-bfq\r
+    reaches  a  much  lower  total  throughput  than low-none. This\r
+    happens  because of the Achilles heel of the bfq I/O scheduler.\r
+    If  the  process  in  service  does  synchronous  I/O and has a\r
+    higher  weight  than  some  other process, then, to give strong\r
+    bandwidth   guarantees   to   that   process,   bfq  plugs  I/O\r
+    dispatching  every  time  the process temporarily stops issuing\r
+    I/O   requests.   In  this  respect,  processes  actually  have\r
+    differentiated  weights and do synchronous I/O in the workloads\r
+    tested.  So  bfq systematically performs I/O plugging for them.\r
+    Unfortunately,  this  plugging  empties  the internal queues of\r
+    the  drive, which kills throughput with random I/O. And the I/O\r
+    of all processes in these workloads is also random.\r
+    \r
+    The  situation  reverses  with  a  sequential reader as target.\r
+    Yet,  the most interesting results come from the new version of\r
+    bfq,  containing  small  changes  to  counter exactly the above\r
+    weakness.  This  version  recovers  most of the throughput loss\r
+    with  the workload made of only random I/O; moreover, with the\r
+    second  workload,  where  the target is a sequential reader, it\r
+    reaches about 3.7 times the total throughput of low-none.\r
+    \r
+    When  the main concern is the latency of flows containing short\r
+    I/O,  Linux now seems to perform rather well, thanks to the bfq\r
+    I/O  scheduler  and  the  I/O  latency  controller.  But if the\r
+    requirement  is  to  provide  explicit bandwidth guarantees (or\r
+    just  fairness) to I/O flows, then one must be ready to give up\r
+    much  or most of the speed of the storage media. bfq helps with\r
+    some   workloads,   but  loses  most  of  the  throughput  with\r
+    workloads  consisting  of mostly random I/O. Fortunately, there\r
+    is  apparently  hope  for  much  better  performance  since  an\r
+    improvement,  still  under  development, seems to enable bfq to\r
+    reach a high throughput with all workloads tested so far.\r
+    \r
+    [  I  wish  to  thank  Vivek Goyal for enabling me to make this\r
+    article much more fair and sound.]\r
+    \r
+    [66]Comments (4 posted)\r
+    \r
+    [67]KDE's onboarding initiative, one year later\r
+    \r
+    August 24, 2018\r
+    \r
+    This article was contributed by Marta Rybczyńska\r
+    \r
+    [68]Akademy\r
+    \r
+    In  2017,  the  KDE  community  decided  on  [69]three goals to\r
+    concentrate  on  for  the  next  few  years.  One  of  them was\r
+    [70]streamlining   the  onboarding  of  new  contributors  (the\r
+    others  were  [71]improving usability and [72]privacy ). During\r
+    [73]Akademy  ,  the  yearly  KDE  conference  that  was held in\r
+    Vienna  in  August,  Neofytos Kolokotronis shared the status of\r
+    the  onboarding  goal,  the work done during the last year, and\r
+    further  plans.  While it is a complicated process in a project\r
+    as  big  and  diverse as KDE, numerous improvements have already\r
+    been made.\r
+    \r
+    Two  of the three KDE community goals were proposed by relative\r
+    newcomers.  Kolokotronis  was  one  of those, having joined the\r
+    [74]KDE  Promo  team  not  long  before  proposing the focus on\r
+    onboarding.  He  had  previously  been involved with [75]Chakra\r
+    Linux  ,  a  distribution  based on KDE software. The fact that\r
+    new  members of the community proposed strategic goals was also\r
+    noted in the [76]Sunday keynote by Claudia Garad .\r
+    \r
+    Proper  onboarding  adds excitement to the contribution process\r
+    and  increases retention, he explained. When we look at [77]the\r
+    definition  of  onboarding  ,  it is a process in which the new\r
+    contributors  acquire  knowledge, skills, and behaviors so that\r
+    they  can  contribute effectively. Kolokotronis proposed to see\r
+    it  also  as  socialization:  integration  into  the  project's\r
+    relationships, culture, structure, and procedures.\r
+    \r
+    The  gains  from  proper  onboarding  are many. The project can\r
+    grow   by  attracting  new  blood  with  new  perspectives  and\r
+    solutions.   The  community  maintains  its  health  and  stays\r
+    vibrant.  Another  important  advantage of efficient onboarding\r
+    is  that  replacing  current  contributors  becomes easier when\r
+    they  change interests, jobs, or leave the project for whatever\r
+    reason.  Finally,  successful  onboarding adds new advocates to\r
+    the project.\r
+    \r
+    Achievements so far and future plans\r
+    \r
+    The  team  started  with  ideas  for  a  centralized onboarding\r
+    process  for the whole of KDE. They found out quickly that this\r
+    would  not  work  because KDE is "very decentralized", so it is\r
+    hard  to  provide  tools  and procedures that are going to work\r
+    for   the  whole  project.  According  to  Kolokotronis,  other\r
+    characteristics   of   KDE  that  impact  onboarding  are  high\r
+    diversity,   remote   and   online   teams,   and  hundreds  of\r
+    contributors  in dozens of projects and teams. In addition, new\r
+    contributors  already know in which area they want to take part\r
+    and  they  prefer  specific  information  that will be directly\r
+    useful for them.\r
+    \r
+    So  the  team  changed its approach; several changes have since\r
+    been  proposed  and  implemented.  The  [78]Get  Involved page,\r
+    which  is  expected to be one of the resources new contributors\r
+    read  first, has been rewritten. For the [79]Junior Jobs page ,\r
+    the  team  is  [80] [81]discussing what the generic content for\r
+    KDE  as  a whole should be. The team simplified [82]Phabricator\r
+    registration  ,  which  resulted  in  documenting  the  process\r
+    better.  Another part of the work involves the [83]KDE Bugzilla\r
+    ;  it includes, for example, initiatives to limit the number of\r
+    states of a ticket or remove obsolete products.\r
+    \r
+    The   [84]Plasma   Mobile  team  is  heavily  involved  in  the\r
+    onboarding  goal.  The Plasma Mobile developers have simplified\r
+    their    development   environment   setup   and   created   an\r
+    [85]interactive  "Get  Involved"  page. In addition, the Plasma\r
+    team  changed  the  way task descriptions are written; they now\r
+    contain  more detail, so that it is easier to get involved. The\r
+    basic  description  should  be  short  and clear, and it should\r
+    include  details  of  the  problem  and possible solutions. The\r
+    developers  try  to  share  the  list  of  skills  necessary to\r
+    fulfill  the  tasks  and  include  clear links to the technical\r
+    resources needed.\r
+    \r
+    Kolokotronis  and  team  also identified a new potential source\r
+    of  contributors  for  KDE:  distributions using KDE. They have\r
+    the  advantage  of  already knowing and using the software. The\r
+    next  idea  the team is working on is to make sure that setting\r
+    up  a  development  environment is easy. The team plans to work\r
+    on this during a dedicated sprint this autumn.\r
+    \r
+    Searching for new contributors\r
+    \r
+    Kolokotronis  plans  to  search  for  new  contributors  at the\r
+    periphery  of  the  project,  among  the "skilled enthusiasts":\r
+    loyal  users  who  actually  care  about the project. They "can\r
+    make  wonders",  he  said.  Those  individuals may also be less\r
+    confident  or  shy,  have  trouble  making  the first step, and\r
+    need  guidance.  The  project  leaders  should  take  that into\r
+    account.\r
+    \r
+    In   addition,   newcomers   are  all  different.  Kolokotronis\r
+    provided  a  long  list  of  how contributors differ, including\r
+    skills  and  knowledge,  motives  and  interests,  and time and\r
+    dedication.  His  advice  is to "try to find their superpower",\r
+    the  skills  they  have  that  are  missing  in the team. Those\r
+    "superpowers" can then be used for the benefit of the project.\r
+    \r
+    If  a project does nothing else, he said, it can start with its\r
+    documentation.   However,   this   does   not  only  mean  code\r
+    documentation.  Writing  down  the  procedures  or  information\r
+    about  the internal work of the project, like who is working on\r
+    what,  is  an  important  part of a project's documentation and\r
+    helps  newcomers.  There  should  also  be guidelines on how to\r
+    start, especially setting up the development environment.\r
+    \r
+    The  first  thing  the  project leaders should do, according to\r
+    Kolokotronis,  is to spend time on introducing newcomers to the\r
+    project.  Ideally  every  new  contributor  should  be assigned\r
+    mentors  —  more  experienced  members  who  can help them when\r
+    needed.  The mentors and project leaders should find tasks that\r
+    are   interesting   for  each  person.  Answering  an  audience\r
+    question   on   suggestions   for   shy  new  contributors,  he\r
+    recommended  even  more  mentoring.  It is also very helpful to\r
+    make  sure  that  newcomers  have  enough  to  read, but "avoid\r
+    RTFM",  he  highlighted.  It is also easy for a new contributor\r
+    "to  fly  away",  he  said.  The solution is to keep requesting\r
+    things and be proactive.\r
+    \r
+    What can the project do?\r
+    \r
+    Kolokotronis  suggested  a number of actions for a project when\r
+    it   wants  to  improve  its  onboarding.  The  first  step  is\r
+    preparation:  the  project  leaders  should know the team's and\r
+    the  project's  needs. Long-term planning is important, too. It\r
+    is  not  enough  to wait for contributors to come — the project\r
+    should  be  proactive,  which means reaching out to candidates,\r
+    suggesting   appropriate  tasks  and,  finally,  making  people\r
+    available for the newcomers if they need help.\r
+    \r
+    This  leads  to  the  next step: to be a mentor. Kolokotronis\r
+    suggests being a "great host", but also trying to phase out the\r
+    dependency  on  the  mentor  rapidly.  "We  have  been  all\r
+    newcomers", he said. It can be intimidating to join an existing\r
+    group.  Onboarding creates a sense of belonging which, in turn,\r
+    increases retention.\r
+    \r
+    The  last  step  proposed  was  to  be strategic. This includes\r
+    thinking  about  the  emotions  you  want  newcomers  to  feel.\r
+    Kolokotronis  explained the strategic part with an example. The\r
+    overall  goal  is  (surprise!)  to  improve  onboarding  of new\r
+    contributors.  An  intermediate  objective might be to keep the\r
+    newcomers  after  they  have  made  their first commit. If your\r
+    strategy  is  to  keep  them  confident  and proud, you can use\r
+    different  tactics  like  praise and acknowledgment of the work\r
+    in  public.  Another  useful  tactic  may  be  assigning simple\r
+    tasks, according to the skill of the contributor.\r
+    \r
+    To   summarize,   the   most   important  thing,  according  to\r
+    Kolokotronis,  is  to  respond  quickly and spend time with new\r
+    contributors.  This  time should be used to explain procedures,\r
+    and  to  introduce the people and culture. It is also essential\r
+    to  guide  first  contributions  and praise contributors' skill\r
+    and  effort. Increase the difficulty of tasks over time to keep\r
+    contributors  motivated  and  challenged. And finally, he said,\r
+    "turn them into mentors".\r
+    \r
+    Kolokotronis  acknowledges  that  onboarding  "takes  time" and\r
+    "everyone  complains"  about  it. However, he is convinced that\r
+    it  is  beneficial  in  the  long  term  and  that it decreases\r
+    developer turnover.\r
+    \r
+    Advice to newcomers\r
+    \r
+    Kolokotronis  concluded  with some suggestions for newcomers to\r
+    a  project.  They  should  try  to be persistent and to not get\r
+    discouraged  when  something  goes  wrong. Building connections\r
+    from   the  very  beginning  is  helpful.  He  suggests  asking\r
+    questions  as  if you were already a member "and things will be\r
+    fine". However, accept criticism if it happens.\r
+    \r
+    One  of  the  next  actions  of  the onboarding team will be to\r
+    collect  feedback  from  newcomers and experienced contributors\r
+    to  see  if they agree on the ideas and processes introduced so\r
+    far.\r
+    \r
+    [86]Comments (none posted)\r
+    \r
+    [87]Sharing and archiving data sets with Dat\r
+    \r
+    August 27, 2018\r
+    \r
+    This article was contributed by Antoine Beaupré\r
+    \r
+    [88]Dat  is  a  new peer-to-peer protocol that uses some of the\r
+    concepts  of  [89]BitTorrent  and  Git.  Dat  primarily targets\r
+    researchers  and  open-data activists as it is a great tool for\r
+    sharing,  archiving, and cataloging large data sets. But it can\r
+    also  be  used to implement decentralized web applications in a\r
+    novel way.\r
+    \r
+    Dat quick primer\r
+    \r
+    Dat  is  written in JavaScript, so it can be installed with npm\r
+    ,  but there are [90]standalone binary builds and a [91]desktop\r
+    application  (as an AppImage). An [92]online viewer can be used\r
+    to  inspect data for those who do not want to install arbitrary\r
+    binaries on their computers.\r
+    \r
+    The  command-line  application  allows  basic  operations  like\r
+    downloading  existing  data sets and sharing your own. Dat uses\r
+    a  32-byte hex string that is an [93]ed25519 public key , which\r
+    is  used  to  discover  and  find  content  on  the  net.  For\r
+    example, this will download some sample data:\r
+    \r
+    $ dat clone \\r
+    dat://778f8d955175c92e4ced5e4f5563f69bfec0c86cc6f670352c457943666fe639 \\r
+    ~/Downloads/dat-demo\r
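+    \r
+    To  make the address format concrete, here is a minimal Python\r
+    sketch  that  checks  a dat:// address and decodes it into the\r
+    32-byte  key.  The  function  name  and the accepted forms (an\r
+    optional  dat:// prefix and a trailing slash) are assumptions\r
+    for illustration; they are not part of the Dat tooling.\r
+    \r

```python
import re

# 32 bytes are written as 64 hexadecimal characters.
# Accepting a bare key and a trailing slash is an assumption of this sketch.
DAT_ADDRESS = re.compile(r"^(?:dat://)?([0-9a-fA-F]{64})/?$")


def parse_dat_address(address: str) -> bytes:
    """Return the 32-byte key encoded in a dat:// address."""
    match = DAT_ADDRESS.match(address.strip())
    if match is None:
        raise ValueError(f"not a Dat address: {address!r}")
    return bytes.fromhex(match.group(1))


if __name__ == "__main__":
    sample = (
        "dat://778f8d955175c92e4ced5e4f5563f69b"
        "fec0c86cc6f670352c457943666fe639"
    )
    print(len(parse_dat_address(sample)), "bytes")
```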
+    \r
+    Similarly,  the  share  command  is  used  to share content. It\r
+    indexes  the  files  in  a  given  directory  and creates a new\r
+    unique  address  like the one above. The share command starts a\r
+    server  that uses multiple discovery mechanisms (currently, the\r
+    [94]Mainline  Distributed  Hash  Table  (DHT), a [95]custom DNS\r
+    server  ,  and  multicast  DNS)  to announce the content to its\r
+    peers.  This  is  how another user, armed with that public key,\r
+    can  download  that  content with dat clone or mirror the files\r
+    continuously with dat sync .\r
+    \r
+    So  far,  this  looks  a  lot  like BitTorrent [96]magnet links\r
+    updated  with 21st century cryptography. But Dat adds revisions\r
+    on  top  of  that,  so  modifications  are automatically shared\r
+    through  the  swarm.  That is important for public data sets as\r
+    those  are  often  dynamic  in  nature.  Revisions also make it\r
+    possible  to  use [97]Dat as a backup system by saving the data\r
+    incrementally using an [98]archiver .\r
+    \r
+    While  Dat  is designed to work on larger data sets, processing\r
+    them  for  sharing  may  take a while. For example, sharing the\r
+    Linux  kernel  source  code  required about five minutes as Dat\r
+    worked  on indexing all of the files. This is comparable to the\r
+    performance  offered by [99]IPFS and BitTorrent. Data sets with\r
+    more or larger files may take quite a bit more time.\r
+    \r
+    One  advantage  that  Dat  has  over  IPFS  is  that it doesn't\r
+    duplicate  the  data. When IPFS imports new data, it duplicates\r
+    the  files  into  ~/.ipfs . For collections of small files like\r
+    the  kernel,  this  is not a huge problem, but for larger files\r
+    like  videos  or  music,  it's  a  significant limitation. IPFS\r
+    eventually  implemented  a solution to this [100]problem in the\r
+    form  of the experimental [101]filestore feature , but it's not\r
+    enabled  by  default.  Even  with that feature enabled, though,\r
+    changes   to  data  sets  are  not  automatically  tracked.  In\r
+    comparison,  Dat  operation on dynamic data feels much lighter.\r
+    The downside is that each set needs its own dat share process.\r
+    \r
+    Like  any  peer-to-peer  system, Dat needs at least one peer to\r
+    stay  online  to  offer  the  content, which is impractical for\r
+    mobile  devices. Hosting providers like [102]Hashbase (which is\r
+    a  [103]pinning  service  in  Dat  jargon)  can help users keep\r
+    content  online  without  running  their  own [104]server . The\r
+    closest   parallel  in  the  traditional  web  ecosystem  would\r
+    probably   be  content  distribution  networks  (CDN)  although\r
+    pinning    services    are   not   necessarily   geographically\r
+    distributed  and  a  CDN does not necessarily retain a complete\r
+    copy of a website.  [105]\r
+    \r
+    A  web  browser called [106]Beaker , based on the [107]Electron\r
+    framework,  can  access  Dat  content  natively  without  going\r
+    through  a pinning service. Furthermore, Beaker is essential to\r
+    get   any   of  the  [108]Dat  applications  working,  as  they\r
+    fundamentally  rely  on  dat://  URLs  to  do their magic. This\r
+    means  that  Dat  applications won't work for most users unless\r
+    they  install that special web browser. There is a [109]Firefox\r
+    extension  called " [110]dat-fox " for people who don't want to\r
+    install  yet  another  browser,  but  it  requires installing a\r
+    [111]helper  program  .  The  extension  will  be  able to load\r
+    dat://  URLs  but  many  applications  will still not work. For\r
+    example,  the  [112]photo  gallery application completely fails\r
+    with dat-fox.\r
+    \r
+    Dat-based  applications  look promising from a privacy point of\r
+    view.  Because of its peer-to-peer nature, users regain control\r
+    over  where their data is stored: either on their own computer,\r
+    an  online server, or by a trusted third party. But considering\r
+    the  protocol  is not well established in current web browsers,\r
+    I  foresee  difficulties  in adoption of that aspect of the Dat\r
+    ecosystem.  Beyond  that,  it  is rather disappointing that Dat\r
+    applications  cannot  run  natively in a web browser given that\r
+    JavaScript is designed exactly for that.\r
+    \r
+    Dat privacy\r
+    \r
+    An  advantage  Dat  has  over other peer-to-peer protocols like\r
+    BitTorrent   is   end-to-end   encryption.   I  was  originally\r
+    concerned   by   the   encryption   design   when  reading  the\r
+    [113]academic paper [PDF] :\r
+    \r
+    It  is  up  to  client programs to make design decisions around\r
+    which  discovery  networks  they  trust.  For  example if a Dat\r
+    client  decides  to  use  the BitTorrent DHT to discover peers,\r
+    and  they  are  searching for a publicly shared Dat key (e.g. a\r
+    key  cited publicly in a published scientific paper) with known\r
+    contents,  then because of the privacy design of the BitTorrent\r
+    DHT  it  becomes  public  knowledge  what  key  that  client is\r
+    searching for.\r
+    \r
+    So  in  other  words, to share a secret file with another user,\r
+    the  public key is transmitted over a secure side-channel, only\r
+    to  then  leak  during  the discovery process. Fortunately, the\r
+    public  Dat  key is not directly used during discovery as it is\r
+    [114]hashed  with  BLAKE2B  .  Still, the security model of Dat\r
+    assumes   the   public  key  is  private,  which  is  a  rather\r
+    counterintuitive  concept  that  might upset cryptographers and\r
+    confuse  users  who  are  frequently  encouraged  to  type such\r
+    strings  in  address bars and search engines as part of the Dat\r
+    experience.  There  is a [115]security & privacy FAQ in the Dat\r
+    documentation warning about this problem:\r
+    \r
+    One  of  the key elements of Dat privacy is that the public key\r
+    is  never  used  in  any  discovery  network. The public key is\r
+    hashed,  creating  the discovery key. Whenever peers attempt to\r
+    connect to each other, they use the discovery key.\r
+    \r
+    Data  is  encrypted  using  the  public key, so it is important\r
+    that this key stays secure.\r
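+    \r
+    The  hashing  step  described  in the FAQ can be sketched with\r
+    Python's  standard hashlib module. The details are assumptions\r
+    of  this  sketch  rather than something the FAQ spells out: it\r
+    assumes a 32-byte BLAKE2b digest computed over the constant\r
+    "hypercore"  and  keyed  with  the  public  key.  Check the Dat\r
+    peer-discovery proposal before relying on the exact inputs.\r
+    \r

```python
import hashlib

# Assumption: the constant hashed to derive the discovery key.
DISCOVERY_CONSTANT = b"hypercore"


def discovery_key(public_key: bytes) -> bytes:
    """Derive a 32-byte discovery key from a 32-byte public key.

    Keyed BLAKE2b is one-way, so a peer that only sees the discovery
    key on a discovery network cannot recover the public key from it.
    """
    if len(public_key) != 32:
        raise ValueError("expected a 32-byte public key")
    return hashlib.blake2b(
        DISCOVERY_CONSTANT, digest_size=32, key=public_key
    ).digest()


if __name__ == "__main__":
    public_key = bytes.fromhex(
        "778f8d955175c92e4ced5e4f5563f69b"
        "fec0c86cc6f670352c457943666fe639"
    )
    print(discovery_key(public_key).hex())
```

+    \r
+    Peers  would announce and look up the derived value, while the\r
+    public key itself is only passed between people who are meant\r
+    to read the data.\r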
+    \r
+    There  are  other  privacy  issues outlined in the document; it\r
+    states that " Dat faces similar privacy risks as BitTorrent ":\r
+    \r
+    When  you download a dataset, your IP address is exposed to the\r
+    users  sharing  that dataset. This may lead to honeypot servers\r
+    collecting  IP addresses, as we've seen in Bittorrent. However,\r
+    with  dataset  sharing we can create a web of trust model where\r
+    specific  institutions  are  trusted  as  primary  sources  for\r
+    datasets, diminishing the sharing of IP addresses.\r
+    \r
+    A  Dat  blog  post  refers to this issue as [116]reader privacy\r
+    and  it is, indeed, a sensitive issue in peer-to-peer networks.\r
+    It  is  how  BitTorrent  users  are discovered and served scary\r
+    verbiage  from  lawyers, after all. But Dat makes this a little\r
+    better  because,  to  join  a swarm, you must know what you are\r
+    looking  for  already,  which means peers who can look at swarm\r
+    activity  only  include  users  who know the secret public key.\r
+    This  works  well  for  secret  content, but for larger, public\r
+    data  sets, it is a real problem; it is why the Dat project has\r
+    [117]avoided creating a Wikipedia mirror so far.\r
+    \r
+    I  found  another  privacy  issue that is not documented in the\r
+    security  FAQ  during  my  review of the protocol. As mentioned\r
+    earlier,  the [118]Dat discovery protocol routinely phones home\r
+    to  DNS  servers operated by the Dat project. This implies that\r
+    the  default  discovery  servers (and an attacker watching over\r
+    their  traffic)  know  who is publishing or seeking content, in\r
+    essence  discovering  the  "social  network"  behind  Dat. This\r
+    discovery  mechanism  can be disabled in clients, but a similar\r
+    privacy  issue  applies  to  the  DHT as well, although that is\r
+    distributed  so  it  doesn't  require  trust of the Dat project\r
+    itself.\r
+    \r
+    Considering  those  aspects  of the protocol, privacy-conscious\r
+    users  will  probably  want  to  use Tor or other anonymization\r
+    techniques to work around those concerns.\r
+    \r
+    The future of Dat\r
+    \r
+    [119]Dat  2.0  was  released  in  June  2017  with  performance\r
+    improvements   and   protocol   changes.  [120]Dat  Enhancement\r
+    Proposals  (DEPs)  guide the project's future development; most\r
+    work  is  currently  geared  toward  implementing  the  draft "\r
+    [121]multi-writer   proposal   "   in  [122]HyperDB  .  Without\r
+    multi-writer  support, only the original publisher of a Dat can\r
+    modify  it.  According  to  Joe  Hand, co-executive-director of\r
+    [123]Code  for  Science & Society (CSS) and Dat core developer,\r
+    in  an  IRC  chat, "supporting multiwriter is a big requirement\r
+    for  lots  of  folks". For example, while Dat might allow Alice\r
+    to  share  her  research  results with Bob, he cannot modify or\r
+    contribute  back  to  those results. The multi-writer extension\r
+    allows  for  Alice  to assign trust to Bob so he can have write\r
+    access to the data.\r
+    \r
+    Unfortunately,  the  current  proposal doesn't solve the " hard\r
+    problems  " of " conflict merges and secure key distribution ".\r
+    The  former  will  be worked out through user interface tweaks,\r
+    but  the  latter  is  a  classic problem that security projects\r
+    typically  have trouble finding solutions for, and Dat is no\r
+    exception.  How  will Alice securely trust Bob? The OpenPGP web\r
+    of  trust?  Hexadecimal  fingerprints  read over the phone? Dat\r
+    doesn't provide a magic solution to this problem.\r
+    \r
+    Another  thing limiting adoption is that Dat is not packaged in\r
+    any  distribution  that I could find (although I [124]requested\r
+    it  in  Debian  )  and,  considering the speed of change of the\r
+    JavaScript  ecosystem,  this  is  unlikely  to  change any time\r
+    soon.  A  [125]Rust  implementation  of  the  Dat  protocol has\r
+    started,  however,  which  might  be easier to package than the\r
+    multitude  of  [126]Node.js  modules. In terms of mobile device\r
+    support,  there is an experimental Android web browser with Dat\r
+    support  called  [127]Bunsen  , which somehow doesn't run on my\r
+    phone.  Some  adventurous  users  have  successfully run Dat in\r
+    [128]Termux  .  I  haven't  found an app running on iOS at this\r
+    point.\r
+    \r
+    Even  beyond  platform  support, distributed protocols like Dat\r
+    have  a  tough  slope  to climb against the virtual monopoly of\r
+    more  centralized  protocols,  so  it  remains  to  be seen how\r
+    popular  those  tools  will  be.  Hand says Dat is supported by\r
+    multiple  non-profit  organizations. Beyond CSS, [129]Blue Link\r
+    Labs  is working on the Beaker Browser as a self-funded startup\r
+    and  a  grass-roots  organization, [130]Digital Democracy , has\r
+    contributed  to  the  project.  The  [131]Internet  Archive has\r
+    [132]announced  a  collaboration  between  itself, CSS, and the\r
+    California  Digital  Library to launch a pilot project to see "\r
+    how   members  of  a  cooperative,  decentralized  network  can\r
+    leverage  shared  services  to  ensure  data preservation while\r
+    reducing storage costs and increasing replication counts ".\r
+    \r
+    Hand  said  adoption in academia has been "slow but steady" and\r
+    that  the [133]Dat in the Lab project has helped identify areas\r
+    that  could  help researchers adopt the project. Unfortunately,\r
+    as  is  the case with many free-software projects, he said that\r
+    "our  team is definitely a bit limited on bandwidth to push for\r
+    bigger  adoption".  Hand said that the project received a grant\r
+    from   [134]Mozilla   Open   Source   Support  to  improve  its\r
+    documentation, which will be a big help.\r
+    \r
+    Ultimately,   Dat   suffers   from  a  problem  common  to  all\r
+    peer-to-peer  applications,  which is naming. Dat addresses are\r
+    not  exactly  intuitive:  humans  do not remember strings of 64\r
+    hexadecimal  characters well. For this, Dat took a [135]similar\r
+    approach  to IPFS by using DNS TXT records and /.well-known URL\r
+    paths   to  bridge  existing,  human-readable  names  with  Dat\r
+    hashes.  So  this sacrifices a part of the decentralized nature\r
+    of the project in favor of usability.\r
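+    \r
+    As  a  rough  sketch of that bridging, the Python helpers below\r
+    extract  a  Dat key from DNS TXT record strings or from the body\r
+    of  a  /.well-known file. The exact layouts (a TXT record of the\r
+    form  datkey=<key>,  and  a  /.well-known/dat  file  whose first\r
+    line  is  the  dat://  address) are assumed here, based on the\r
+    Dat DNS proposal, and should be checked against it. No network\r
+    access is performed; both function names are hypothetical.\r
+    \r

```python
import re
from typing import Iterable, Optional

HEX_KEY = r"[0-9a-fA-F]{64}"


def key_from_txt_records(records: Iterable[str]) -> Optional[str]:
    """Return the first key found in TXT records shaped like datkey=<key>."""
    for record in records:
        match = re.fullmatch(rf"datkey=({HEX_KEY})", record.strip())
        if match:
            return match.group(1).lower()
    return None


def key_from_well_known(body: str) -> Optional[str]:
    """Return the key from a well-known file whose first line is dat://<key>."""
    lines = body.strip().splitlines()
    if not lines:
        return None
    match = re.fullmatch(rf"dat://({HEX_KEY})/?", lines[0].strip())
    return match.group(1).lower() if match else None


if __name__ == "__main__":
    key = "778f8d955175c92e4ced5e4f5563f69bfec0c86cc6f670352c457943666fe639"
    print(key_from_txt_records(["v=spf1 -all", f"datkey={key}"]))
    print(key_from_well_known(f"dat://{key}\nTTL=3600\n"))
```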
+    \r
+    I  have  tested  a lot of distributed protocols like Dat in the\r
+    past  and I am not sure Dat is a clear winner. It certainly has\r
+    advantages  over IPFS in terms of usability and resource usage,\r
+    but  the  lack  of packages on most platforms is a big limit to\r
+    adoption  for  most  people. This means it will be difficult to\r
+    share  content  with  my  friends  and  family with Dat anytime\r
+    soon,  which  would  probably  be  my  primary use case for the\r
+    project.  Until  the  protocol  reaches the wider adoption that\r
+    BitTorrent  has  seen  in  terms  of  platform  support, I will\r
+    probably   wait   before  switching  everything  over  to  this\r
+    promising project.\r
+    \r
+    [136]Comments (11 posted)\r
+    \r
+    Page editor : Jonathan Corbet\r
+    \r
+    Inside this week's LWN.net Weekly Edition\r
+    \r
+    [137]Briefs  :  OpenSSH  7.8;  4.19-rc1;  Which stable?; Netdev\r
+    0x12; Bison 3.1; Quotes; ...\r
+    \r
+    [138]Announcements  :  Newsletters;  events;  security updates;\r
+    kernel patches; ...  Next page : [139]Brief items>>\r
+    \r
+    \r
+    \r
+    [1] https://lwn.net/Articles/763743/\r
+    \r
+    [2] https://lwn.net/Articles/763626/\r
+    \r
+    [3] https://lwn.net/Articles/763641/\r
+    \r
+    [4] https://lwn.net/Articles/763106/\r
+    \r
+    [5] https://lwn.net/Articles/763603/\r
+    \r
+    [6] https://lwn.net/Articles/763175/\r
+    \r
+    [7] https://lwn.net/Articles/763492/\r
+    \r
+    [8] https://lwn.net/Articles/763254/\r
+    \r
+    [9] https://lwn.net/Articles/763255/\r
+    \r
+    [10] https://lwn.net/Articles/763743/#Comments\r
+    \r
+    [11] https://lwn.net/Articles/763626/\r
+    \r
+    [12] http://julialang.org/\r
+    \r
+    [13] https://julialang.org/blog/2018/08/one-point-zero\r
+    \r
+    [14] https://julialang.org/benchmarks/\r
+    \r
+    [15] https://juliacomputing.com/\r
+    \r
+    [16] https://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93p-\r
+    rint_loop\r
+    \r
+    [17] http://llvm.org/\r
+    \r
+    [18] http://www.3blue1brown.com/essence-of-linear-algebra-page/\r
+    \r
+    [19] http://www.netlib.org/lapack/\r
+    \r
+    [20] https://lwn.net/Articles/657157/\r
+    \r
+    [21] https://julialang.org/publications/julia-fresh-approach-B-\r
+    EKS.pdf\r
+    \r
+    [22] https://lwn.net/Articles/738915/\r
+    \r
+    [23] https://pypy.org/\r
+    \r
+    [24] https://github.com/JuliaPy/PyCall.jl\r
+    \r
+    [25] https://github.com/JuliaInterop/RCall.jl\r
+    \r
+    [26] https://docs.julialang.org/en/stable/\r
+    \r
+    [27] https://julialang.org/learning/\r
+    \r
+    [28] http://bogumilkaminski.pl/files/julia_express.pdf\r
+    \r
+    [29] https://docs.julialang.org/en/stable/manual/noteworthy-di-\r
+    fferences/#Noteworthy-differences-from-Python-1\r
+    \r
+    [30] https://lwn.net/Articles/746386/\r
+    \r
+    [31] https://github.com/JuliaLang/IJulia.jl\r
+    \r
+    [32] https://lwn.net/Articles/764001/\r
+    \r
+    [33] https://lwn.net/Articles/763626/#Comments\r
+    \r
+    [34] https://lwn.net/Articles/763641/\r
+    \r
+    [35] https://lwn.net/Archives/ConferenceByYear/#2018-Linux_Sec-\r
+    urity_Summit_NA\r
+    \r
+    [36]  https://events.linuxfoundation.org/events/linux-security-\r
+    summit-north-america-2018/\r
+    \r
+    [37] https://kernsec.org/wiki/index.php/Kernel_Self_Protection-\r
+    _Project\r
+    \r
+    [38] https://lwn.net/Articles/763644/\r
+    \r
+    [39] https://raphlinus.github.io/programming/rust/2018/08/17/u-\r
+    ndefined-behavior.html\r
+    \r
+    [40] https://lwn.net/Articles/749064/\r
+    \r
+    [41] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/-\r
+    linux.git/commit/?id=02361bc77888\r
+    \r
+    [42] https://lore.kernel.org/lkml/CA+55aFzCG-zNmZwX4A2FQpadafL-\r
+    fEzK6CC=qPXydAacU1RqZWA@mail.gmail.com/T/#u\r
+    \r
+    [43] https://lwn.net/Articles/758245/\r
+    \r
+    [44] https://lwn.net/Articles/718888/\r
+    \r
+    [45] https://lwn.net/Articles/744507/\r
+    \r
+    [46] https://outflux.net/slides/2018/lss/danger.pdf\r
+    \r
+    [47] https://lwn.net/Articles/763641/#Comments\r
+    \r
+    [48] https://lwn.net/Articles/763106/\r
+    \r
+    [49] https://lwn.net/Articles/763497/\r
+    \r
+    [50] https://lwn.net/Articles/762566/\r
+    \r
+    [51] https://lwn.net/Articles/761118/\r
+    \r
+    [52] https://git.kernel.org/linus/d5791044d2e5749ef4de84161cec-\r
+    5532e2111540\r
+    \r
+    [53] https://lwn.net/ml/linux-kernel/20180630000253.70103-1-sq-\r
+    ue@chromium.org/\r
+    \r
+    [54] https://git.kernel.org/linus/771c035372a036f83353eef46dbb-\r
+    829780330234\r
+    \r
+    [55] https://lwn.net/Articles/745073/\r
+    \r
+    [56] https://lwn.net/ml/linux-kernel/CA+55aFxFjAmrFpwQmEHCthHO-\r
+    zgidCKnod+cNDEE+3Spu9o1s3w@mail.gmail.com/\r
+    \r
+    [57] https://lwn.net/Articles/759499/\r
+    \r
+    [58] https://lwn.net/Articles/762355/\r
+    \r
+    [59] https://lwn.net/ml/linux-fsdevel/20180823223145.GK6515@Ze-\r
+    nIV.linux.org.uk/\r
+    \r
+    [60] https://lwn.net/Articles/763106/#Comments\r
+    \r
+    [61] https://lwn.net/Articles/763603/\r
+    \r
+    [62] https://lwn.net/Articles/601799/\r
+    \r
+    [63] https://lwn.net/Articles/552904\r
+    \r
+    [64] https://lwn.net/Articles/758963/\r
+    \r
+    [65] http://algogroup.unimore.it/people/paolo/pub-docs/extende-\r
+    d-lat-bw-throughput.pdf\r
+    \r
+    [66] https://lwn.net/Articles/763603/#Comments\r
+    \r
+    [67] https://lwn.net/Articles/763175/\r
+    \r
+    [68] https://lwn.net/Archives/ConferenceByYear/#2018-Akademy\r
+    \r
+    [69] https://dot.kde.org/2017/11/30/kdes-goals-2018-and-beyond\r
+    \r
+    [70] https://phabricator.kde.org/T7116\r
+    \r
+    [71] https://phabricator.kde.org/T6831\r
+    \r
+    [72] https://phabricator.kde.org/T7050\r
+    \r
+    [73] https://akademy.kde.org/\r
+    \r
+    [74] https://community.kde.org/Promo\r
+    \r
+    [75] https://www.chakralinux.org/\r
+    \r
+    [76] https://conf.kde.org/en/Akademy2018/public/events/79\r
+    \r
+    [77] https://en.wikipedia.org/wiki/Onboarding\r
+    \r
+    [78] https://community.kde.org/Get_Involved\r
+    \r
+    [79] https://community.kde.org/KDE/Junior_Jobs\r
+    \r
+    [80] https://lwn.net/Articles/763189/\r
+    \r
+    [81] https://phabricator.kde.org/T8686\r
+    \r
+    [82] https://phabricator.kde.org/T7646\r
+    \r
+    [83] https://bugs.kde.org/\r
+    \r
+    [84] https://www.plasma-mobile.org/index.html\r
+    \r
+    [85] https://www.plasma-mobile.org/findyourway\r
+    \r
+    [86] https://lwn.net/Articles/763175/#Comments\r
+    \r
+    [87] https://lwn.net/Articles/763492/\r
+    \r
+    [88] https://datproject.org\r
+    \r
+    [89] https://www.bittorrent.com/\r
+    \r
+    [90] https://github.com/datproject/dat/releases\r
+    \r
+    [91] https://docs.datproject.org/install\r
+    \r
+    [92] https://datbase.org/\r
+    \r
+    [93] https://ed25519.cr.yp.to/\r
+    \r
+    [94] https://en.wikipedia.org/wiki/Mainline_DHT\r
+    \r
+    [95] https://github.com/mafintosh/dns-discovery\r
+    \r
+    [96] https://en.wikipedia.org/wiki/Magnet_URI_scheme\r
+    \r
+    [97] https://blog.datproject.org/2017/10/13/using-dat-for-auto-\r
+    matic-file-backups/\r
+    \r
+    [98] https://github.com/mafintosh/hypercore-archiver\r
+    \r
+    [99] https://ipfs.io/\r
+    \r
+    [100] https://github.com/ipfs/go-ipfs/issues/875\r
+    \r
+    [101] https://github.com/ipfs/go-ipfs/blob/master/docs/experim-\r
+    ental-features.md#ipfs-filestore\r
+    \r
+    [102] https://hashbase.io/\r
+    \r
+    [103] https://github.com/datprotocol/DEPs/blob/master/proposal-\r
+    s/0003-http-pinning-service-api.md\r
+    \r
+    [104] https://docs.datproject.org/server\r
+    \r
+    [105] https://lwn.net/Articles/763544/\r
+    \r
+    [106] https://beakerbrowser.com/\r
+    \r
+    [107] https://electronjs.org/\r
+    \r
+    [108] https://github.com/beakerbrowser/explore\r
+    \r
+    [109] https://addons.mozilla.org/en-US/firefox/addon/dat-p2p-p-\r
+    rotocol/\r
+    \r
+    [110] https://github.com/sammacbeth/dat-fox\r
+    \r
+    [111] https://github.com/sammacbeth/dat-fox-helper\r
+    \r
+    [112] https://github.com/beakerbrowser/dat-photos-app\r
+    \r
+    [113] https://github.com/datproject/docs/raw/master/papers/dat-\r
+    paper.pdf\r
+    \r
+    [114] https://github.com/datprotocol/DEPs/blob/653e0cf40233b5d-\r
+    474cddc04235577d9d55b2934/proposals/0000-peer-discovery.md#dis-\r
+    covery-keys\r
+    \r
+    [115] https://docs.datproject.org/security\r
+    \r
+    [116] https://blog.datproject.org/2016/12/12/reader-privacy-on-\r
+    the-p2p-web/\r
+    \r
+    [117] https://blog.datproject.org/2017/12/10/dont-ship/\r
+    \r
+    [118] https://github.com/datprotocol/DEPs/pull/7\r
+    \r
+    [119] https://blog.datproject.org/2017/06/01/dat-sleep-release/\r
+    \r
+    [120] https://github.com/datprotocol/DEPs\r
+    \r
+    [121] https://github.com/datprotocol/DEPs/blob/master/proposal-\r
+    s/0008-multiwriter.md\r
+    \r
+    [122] https://github.com/mafintosh/hyperdb\r
+    \r
+    [123] https://codeforscience.org/\r
+    \r
+    [124] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=890565\r
+    \r
+    [125] https://github.com/datrs\r
+    \r
+    [126] https://nodejs.org/en/\r
+    \r
+    [127] https://bunsenbrowser.github.io/#!index.md\r
+    \r
+    [128] https://termux.com/\r
+    \r
+    [129] https://bluelinklabs.com/\r
+    \r
+    [130] https://www.digital-democracy.org/\r
+    \r
+    [131] https://archive.org\r
+    \r
+    [132] https://blog.archive.org/2018/06/05/internet-archive-cod-\r
+    e-for-science-and-society-and-california-digital-library-to-pa-\r
+    rtner-on-a-data-sharing-and-preservation-pilot-project/\r
+    \r
+    [133] https://github.com/codeforscience/Dat-in-the-Lab\r
+    \r
+    [134] https://www.mozilla.org/en-US/moss/\r
+    \r
+    [135] https://github.com/datprotocol/DEPs/blob/master/proposal-\r
+    s/0005-dns.md\r
+    \r
+    [136] https://lwn.net/Articles/763492/#Comments\r
+    \r
+    [137] https://lwn.net/Articles/763254/\r
+    \r
+    [138] https://lwn.net/Articles/763255/\r
+    \r
+    [139] https://lwn.net/Articles/763254/\r
+\r
+\r
+\r