From ea67b53df08320008468ad723cbe23a2797547ab Mon Sep 17 00:00:00 2001
From: Pavel 'LEdoian' Turinsky <ledoian@matfyz.cz>
Date: Tue, 18 Jul 2023 16:12:04 +0200
Subject: [PATCH] Write some ch3 lol

---
 en/chap03.tex | 167 ++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 162 insertions(+), 5 deletions(-)

diff --git a/en/chap03.tex b/en/chap03.tex
index 2e01327..34c1756 100644
--- a/en/chap03.tex
+++ b/en/chap03.tex
@@ -24,18 +24,175 @@ topology on the screen, highlighting the relevant information.
 
 \section{Recurring and general patterns}
 
-\XXX{dictionaries everywhere, hashable recipes. ospffile with comments as a reusable format. Format of VertexID}
+Birdvisu's data structures make heavy use of dictionaries and sets, because we
+do not handle much data that would need to be processed in any particular
+order. While this allows us to perform set operations quickly, it requires us
+to provide hashable keys.
+
+We have decided to embrace this requirement and use rather complex frozen
+dataclasses, which can hold as much of the required data as possible, as long
+as we can re-create that data.
+
+This can be illustrated by our use of vertices in a topology. There are two
+objects: a VertexID, and the Vertex itself. VertexID is the hashable part and
+Vertex provides additional information, like incident edges, which are not
+hashable. The topology then has a dictionary from the VertexIDs to Vertices,
+providing complete data.
+
+However, the VertexID already contains information like which version of IP it
+belongs to, whether it represents a router and all the possible IP addresses
+and identifiers related to the vertex. It is sufficient for Vertex objects to
+only contain sets of edges and references to the related topology and VertexID.
+(In the next section, we will see that the type of the vertex is also stored in
+Vertex, but that is really everything.)
+
+The other thing we decided to reuse was the format of BIRD's topology output.
+We call the format \uv{ospffile} and have extended it by allowing comments (after
+an octothorpe, i.e. \verb|#|). Also, empty lines do not seem to be relevant.
+These are quality-of-life improvements for cases when ospffiles are edited by
+hand.
+
+Apart from storing topologies, we intend to use ospffiles to describe
+basic styles. Therefore, our implementation in \verb|birdvisu.ospffile| only
+constructs the tree of strings and does not try to understand it. Our module
+provides an API similar to that of the \verb|json| or \verb|marshal| modules,
+even though it cannot represent arbitrary types.
 
 \section{Data collection: providers and parsing}
 
-\XXX{sub-parts, why a topology is not a graph, stacking topologies with
-multiedges, fake-freezing, why is everything static. Graph representation, selection of BIRD's
-instance}
+This part of the project deals with processing topologies. The core object of
+this part is a TopologyV3\footnote{The \uv{V3} suffix is sometimes impractical
+to keep, so we will sometimes shorten the class name only to \uv{Topology}. It
+denotes the same object.}.  While Topologies can be created manually by
+adding the vertices and edges, we expect them to be mostly retrieved from other
+sources like saved ospffiles or running BIRD processes. This is made possible
+by implementing a TopologyProvider.
+
+Representing a topology turns out to be a somewhat complicated problem for the following reasons:
+\begin{itemize}
+	\item The topology edges need to be directed. OSPF allows a shortest path
+		from A to B to differ from the one in the opposite direction.
+	\item It can have a very general shape, so we cannot rely on common
+		patterns. For example, routers can be connected to other routers using
+		point-to-point or virtual links, not just networks.
+	\item The objects are shape-shifting. A transit network may become a stub or
+		change the designated router and we want to be able to understand the
+		change as best as possible.
+	\item The topology is not necessarily a graph, because multiple links may
+		lead from a single router to the same network. However, we strongly
+		believe that the maximum number of parallel edges is quite low, so most
+		of the theory for simple graphs is still applicable.
+	\item For completeness, we note here again that the shortest paths from a
+		single vertex form a DAG, even though the OSPF specifications speak of
+		them as trees. (Negative edge costs are, fortunately, not permitted.)
+\end{itemize}
+
+Given the above requirements and lessons learned in
+section~\ref{s:net-unusual}, we need to find a representation of vertices that
+is both powerful enough to uniquely describe a particular vertex, and flexible
+enough to let us easily detect its metamorphoses. Table~\ref{tab:vertexid}
+shows which information we can use for each type of object. We see that
+networks in particular are hard to represent, because the ID of the DR may
+change and it might be the only distinguishing property in case of a split
+network.
+
+\bgroup
+\def\yes{\checkmark}
+\def\chg{$\bullet$}
+\begin{table}[h]
+	\centering
+	\begin{tabular}{cccccc}\hline
+		Object & Address & RID & DR ID & IF ID & Notes \\\hline
+		\verb|router| & -- & \yes & -- & -- &\\
+		\verb|xrouter| & -- & \yes & -- & -- &\\
+		\verb|vlink| & -- & \yes & -- & -- & Peer is a \verb|router|\\
+		\verb|network| & v2:\yes,v3:$*$ & -- & \chg & v3:\chg &\\
+		\verb|external| & \yes & -- & -- & -- &\\
+		\verb|xnetwork| & \yes & -- & -- & -- &\\
+		\verb|stubnet| & \yes & \yes & -- & -- &\\
+	\end{tabular}
+	\caption{Information determining each object of a topology. $*$ means it
+	may or may not be known, \chg\  denotes an attribute that may change. Columns in
+	order: whether it has an address assigned, relevant router ID, ID of the designated router, interface number of the DR.}
+	\label{tab:vertexid}
+\end{table}
+\egroup
+
+We decided to aim for correctness, so whenever any of the attributes of an object
+changes, we consider it to be a separate object. This may create some false
+positives, but we think that is better than potential false negatives,
+which could hide some issues. Also, when the infrastructure works correctly,
+the designated router should only change in case of an outage. Therefore, it might
+actually be useful to notify the user when a network has an unexpected
+designated router even when it is otherwise healthy. However, we provide a way
+to find objects by partial information, using the VertexFinder objects, which
+allows heuristics to match different objects.
+
+The information mentioned in table~\ref{tab:vertexid} serves as the main part
+of the VertexID. However, we want the VertexID to identify the same object even
+after it transforms to another kind of object, so instead of using the object
+type, we only note whether the object is a router or a network, since this
+property stays the same even for changed objects. The code is also oblivious to
+the fact that the interface ID is a number and what it means -- we use it as an
+opaque \uv{discriminator} and do not even bother with parsing it from a string.
+
+The VertexIDs are supposed to be descriptors of objective vertex state, so they
+do not belong to any particular TopologyV3. Instead, they can be used to track
+actual Vertices across multiple Topologies.
+
+Apart from VertexIDs, the TopologyV3 also consists of the additional data in
+Vertex objects and Edges. The Vertex objects, as noted above, contain only a
+set of incoming and outgoing edges, references to their TopologyV3 and VertexID
+objects and the actual type of the object the vertex represents (i.e. the first
+column of the table).
+
+An Edge knows the source and target VertexID, its cost and the number of
+parallel edges with these properties. If the Edge was determined by a virtual
+link, it is marked as virtual. This is needed because both Vertices are
+regular routers, so the information about the virtual link cannot be stored in
+them. Note that an Edge does not need to belong to any Topology, since it only
+contains factual data. The information whether an Edge is in the topology is
+stored only in the incident Vertices.
+
+A Topology can be marked as \uv{frozen}. This denotes an intent that it really
+should not be modified, because other code might rely on the particular shape
+of the Topology. However, making the Topology truly immutable would be
+impractical in Python, so we opted for this approach. In case our solution
+turns out to be prone to accidental modification of the Topology, we will
+deploy additional countermeasures against that.
+
+Frozen Topologies also allow us to stack them, creating a union of the original
+Topologies. This way, a single Topology can be used in the visualisation, while
+keeping the original information. This mechanism is fully generic, but was
+mainly invented to allow merging the reference (expected) topology with the
+actual one (i.e. the current state of the system). The ancestors are stored under
+a string label in a dictionary of the Topology. Subclassing TopologyV3
+into a StackedTopology would probably be a cleaner design, but since the only
+difference is the state of one dictionary, we did not employ this approach.
+
+The TopologyProviders are not very interesting, but are important nevertheless.
+There are a few caveats with parsing topologies from the ospffile format.
+First, the edges from routers to networks can only be resolved after the
+networks are known, since a network's level-2 block contains information not
+present in the level-3 directive for the router (namely, the designated router
+for OSPFv2 networks and the set of addresses for OSPFv3).
+
+Since BIRD may be running more than one instance of OSPF, the
+BirdSocketTopologyProvider contains an ad-hoc parser of the response to the
+\texttt{show protocols} command, which seems to be a reliable way to list
+running instances.
+
+Moreover, BIRD does not seem to expose any way to determine the version of
+OSPF. So far, we think it is sufficient to guess from the \texttt{network}
+directives, since they seem to contain a hyphen if and only if the dump is from
+an OSPFv3 instance. (The source code of BIRD suggests that under some
+circumstances, brackets can appear even in an OSPFv2 dump, so they cannot be
+used to tell the versions apart.)
 
 \section{Annotations}
 
 \XXX{scoping, annotator creation, advantages of storing data in annotations and
-not vertices. Annotator protocol and posibility of export. Various uses of annotators: enhancing, analysis, visualisation}
+not vertices. Annotator protocol and possibility of export. Various uses of annotators: enhancing, analysis, visualisation. heapq}
 
 \section{Visualisation}