\chapter{Design}\label{ch:design}

We now explain the design of Birdvisu in depth. First, we explain some important decisions and present the overall structure of the project; then we look into the individual parts of the program.

Birdvisu is implemented in Python, using PySide6, the official bindings for Qt6, for drawing on screen. We decided to use Qt because it provides many pre-made widgets and tools, and since it is widely used, it is easy to find help for it on the Internet. The decision to use Python was not hard either. Not only does Qt have official bindings for it, but we also use the language very often and are thus comfortable writing in it. We do not expect the potential slowness of Python to be an issue, because for handling graphics we are using Qt, which is written in C++. Also, as we have analysed in section~\ref{s:areas}, we expect the topologies to be quite small.

The project comprises three main parts: data collection, annotation, and presentation. The data collection part is tasked with finding out the current topology and creating a usable representation of such topologies and their combinations. In the annotation part, we add further information to the topologies, such as the difference from the expectation or graph properties of the topology. Finally, when we have all the needed information, we draw the topology on the screen, highlighting the relevant information.

\section{Recurring and general patterns}

Birdvisu's data structures make heavy use of dictionaries and sets, because we do not handle much data that would need to be processed in any particular order. While this allows us to perform set operations quickly, it requires us to provide hashable keys. We have decided to embrace this requirement and use rather complex frozen dataclasses, which can hold as much of the required data as possible, as long as we can re-create that data. This can be illustrated by our use of vertices in a topology.
There are two objects: a VertexID and the Vertex itself. The VertexID is the hashable part, and the Vertex provides additional information, like incident edges, which is not hashable. The topology then has a dictionary from VertexIDs to Vertices, providing complete data. However, the VertexID already contains information such as which version of IP it belongs to, whether it represents a router, and all the possible IP addresses and identifiers related to the vertex. It is therefore sufficient for Vertex objects to contain only sets of edges and references to the related topology and VertexID. (In the next section, we will see that the type of the vertex is also stored in the Vertex, but that is really everything.)

The other thing we decided to reuse was the format of BIRD's topology output. We call the format \uv{ospffile} and have extended it by allowing comments (after an octothorpe, i.e. \verb|#|). Empty lines also do not seem to be significant. These are quality-of-life improvements for cases when ospffiles are edited by hand. Apart from storing topologies, we intend to use ospffiles for describing basic styles. Therefore, our implementation in \verb|birdvisu.ospffile| only constructs the tree of strings and does not try to understand it. Our module provides an API similar to that of the \verb|json| or \verb|marshal| modules, even though it cannot represent arbitrary types.

\section{Data collection: providers and parsing}

This part of the project deals with processing topologies. The core object of this part is a TopologyV3\footnote{The \uv{V3} suffix is sometimes impractical to keep, so we will sometimes shorten the class name to just \uv{Topology}. It denotes the same object.}. While Topologies can be created manually by adding the vertices and edges, we expect that they will usually be retrieved from other sources, such as saved ospffiles or running BIRD processes. This is made possible by implementing a TopologyProvider.
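Before a provider can interpret a saved ospffile, the text has to be turned into the tree of strings mentioned above. The following is a minimal sketch of such a step, not the actual \verb|birdvisu.ospffile| code: it assumes that nesting is expressed purely by leading whitespace, and the function name \verb|loads| and the tuple-based node representation are our own illustrative choices.

```python
from typing import List, Tuple

# Hypothetical node type: a directive string plus its nested child directives.
Node = Tuple[str, list]


def loads(text: str) -> List[Node]:
    """Parse an indentation-based document into a tree of strings."""
    root: List[Node] = []
    # Stack of (indentation depth, children list); the root sits at depth -1.
    stack = [(-1, root)]
    for raw in text.splitlines():
        # Drop everything after an octothorpe, then trailing whitespace.
        line = raw.split("#", 1)[0].rstrip()
        if not line.strip():
            continue  # empty and comment-only lines carry no information
        stripped = line.lstrip()
        depth = len(line) - len(stripped)
        # Close every block that is at the same depth as this line or deeper.
        while stack[-1][0] >= depth:
            stack.pop()
        node: Node = (stripped, [])
        stack[-1][1].append(node)
        stack.append((depth, node[1]))
    return root


sample = "area 0.0.0.0\n\n\trouter 10.0.0.1 # core\n\t\tdistance 0\n\trouter 10.0.0.2\n"
tree = loads(sample)
```

Because the parser never inspects the directive strings themselves, the same tree could carry either a topology or a style description; interpreting the strings is left to the caller.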
Representing a topology turns out to be a somewhat complicated problem for the following reasons:
\begin{itemize}
\item The topology edges need to be directed. OSPF allows the shortest path from A to B to differ from the shortest path in the opposite direction.
\item It can have a very general shape, so we cannot rely on common patterns. For example, routers can be connected to other routers using point-to-point or virtual links, not just networks.
\item The objects are shape-shifting. A transit network may become a stub network or change its designated router, and we want to be able to understand the change as well as possible.
\item The topology is not necessarily a simple graph, because multiple links may lead from a single router to the same network. However, we strongly believe that the maximum number of parallel edges is quite low, so most of the theory for simple graphs is still applicable.
\item For completeness, we note here again that the shortest paths from a single vertex form a DAG, even though the OSPF specifications speak of them as trees. (Negative edge costs are, fortunately, not permitted.)
\end{itemize}

Given the above requirements and the lessons learned in section~\ref{s:net-unusual}, we need to find a representation of vertices that is both powerful enough to uniquely describe a particular vertex and flexible enough to let us easily detect its metamorphoses. Table~\ref{tab:vertexid} shows which information we can use for each type of object. We see that networks in particular are hard to represent, because the ID of the DR may change and it might be the only distinguishing property in the case of a split network.
\bgroup
\def\yes{\checkmark}
\def\chg{$\bullet$}
\begin{table}[h]
\centering
\begin{tabular}{cccccc}\hline
Object & Address & RID & DR ID & IF ID & Notes \\\hline
\verb|router| & -- & \yes & -- & -- &\\
\verb|xrouter| & -- & \yes & -- & -- &\\
\verb|vlink| & -- & \yes & -- & -- & Peer is a \verb|router|\\
\verb|network| & v2:\yes,v3:$*$ & -- & \chg & v3:\chg &\\
\verb|external| & \yes & -- & -- & -- &\\
\verb|xnetwork| & \yes & -- & -- & -- &\\
\verb|stubnet| & \yes & \yes & -- & -- &\\
\end{tabular}
\caption{Information determining each object of a topology. $*$ means it may or may not be known, \chg\ denotes an attribute that may change. Columns in order: whether the object has an assigned address, relevant router ID, ID of the designated router, interface number of the DR.}
\label{tab:vertexid}
\end{table}
\egroup

We decided to aim for correctness, so whenever any of the attributes of an object changes, we consider it to be a separate object. This may create some false positives, but we consider that preferable to potential false negatives, which could hide some issues. Also, when the infrastructure works correctly, the designated router should only change in the case of an outage. Therefore, it might actually be useful to notify the user when a network has an unexpected designated router even when it is otherwise healthy. However, we provide a way to find objects by partial information using VertexFinder objects, which allows heuristics to match different objects.

The information mentioned in table~\ref{tab:vertexid} serves as the main part of the VertexID. However, we want the VertexID to identify the same object even after it transforms into another kind of object, so instead of using the object type, we only note whether the object is a router or a network, since this property stays the same even for changed objects.
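The identity rules above can be sketched with a frozen dataclass. This is only an illustration under our own assumptions: the field names are hypothetical, and \verb|find_partial| is a simplified stand-in for the idea behind VertexFinder, not its real interface.

```python
from dataclasses import dataclass, fields
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class VertexID:
    """Hashable identity of a vertex (field names are illustrative)."""
    is_router: bool                      # router or network, never the exact type
    address: Optional[str] = None
    router_id: Optional[str] = None
    dr_id: Optional[str] = None
    discriminator: Optional[str] = None  # e.g. the interface ID, kept as a string


def find_partial(candidates: Iterable[VertexID], **known) -> List[VertexID]:
    """Return the IDs whose attributes equal every given keyword argument."""
    valid = {f.name for f in fields(VertexID)}
    unknown = set(known) - valid
    if unknown:
        raise TypeError(f"unknown VertexID attributes: {sorted(unknown)}")
    return [v for v in candidates
            if all(getattr(v, name) == value for name, value in known.items())]


# The same network before and after a change of designated router.
before = VertexID(False, address="10.0.1.0/24", dr_id="10.0.0.1")
after = VertexID(False, address="10.0.1.0/24", dr_id="10.0.0.2")
router = VertexID(True, router_id="10.0.0.1")

# Strict identity treats them as two objects; a partial lookup finds both.
same_address = find_partial([before, after, router], address="10.0.1.0/24")
```

In this sketch, strict equality reports the DR change as a new object, while the partial lookup gives a heuristic the chance to pair the two IDs again.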
The code is also oblivious to the fact that the interface ID is a number and to what it means -- we use it as an opaque \uv{discriminator} and do not even bother with parsing it from a string.

The VertexIDs are supposed to be descriptors of objective vertex state, so they do not belong to any particular TopologyV3. Instead, they can be used to track actual Vertices across multiple Topologies.

Apart from VertexIDs, the TopologyV3 also consists of the additional data in Vertex objects and Edges. The Vertex objects, as noted above, contain only sets of incoming and outgoing edges, references to their TopologyV3 and VertexID objects, and the actual type of the object the vertex represents (i.e. the first column of the table).

An Edge knows the source and target VertexID, its cost, and the number of parallel edges with these properties. If the Edge was determined by a virtual link, it is marked as virtual. This is needed because both Vertices are regular routers, so the information about the virtual link cannot be stored in them. Note that an Edge does not need to belong to any Topology, since it only contains factual data. Whether an Edge is in a Topology is recorded only in the incident Vertices.

A Topology can be marked as \uv{frozen}. This denotes an intent that it really should not be modified, because other code might rely on the particular shape of the Topology. However, making the Topology truly immutable would be impractical in Python, so we opted for this approach. Should our solution turn out to be prone to accidental modification of the Topology, we will deploy additional countermeasures against that.

Frozen Topologies also allow us to stack them, creating a union of the original Topologies. This way, a single Topology can be used in the visualisation, while keeping the original information. This mechanism is fully generic, but was mainly invented to allow merging the reference (expected) topology with the actual one (i.e.
the current state of the system). The ancestors are stored in a dictionary of the Topology, keyed by a string label. Subclassing TopologyV3 into a StackedTopology would probably be a cleaner design, but since the only difference is the state of one dictionary, we did not employ this approach.

The TopologyProviders are not very interesting, but are important nevertheless. There are a few caveats with parsing topologies from the ospffile format. First, the edges from routers to networks can only be resolved after the networks are known, since the network's level-2 block contains information not present in the level-3 directive for the router (namely, the designated router for OSPFv2 networks and the set of addresses for OSPFv3).

Since BIRD may be running more than one instance of OSPF, the BirdSocketTopologyProvider contains an ad-hoc parser of the response to the \texttt{show protocols} command, which seems to be a reliable way to list running instances. Moreover, BIRD does not seem to expose any way to determine the version of OSPF. So far, we think it is sufficient to guess from the \texttt{network} directives, since they seem to contain a hyphen if and only if the dump is from an OSPFv3 instance. (The source code of BIRD suggests that under some circumstances, brackets can appear even in an OSPFv2 dump, so brackets cannot be used to tell the versions apart.)

\section{Annotations}

Once a TopologyV3 is obtained, it may be annotated. An Annotator may create an Annotation, which is then stored as a part of an AnnotatedTopology. We now explore the design of these objects in detail.

An Annotation is essentially only a holder for any \uv{tags} that are to be attached to the topology. These are represented by a dictionary holding annotations for Vertices, another dictionary for annotating Edges, and a single field allowing the attachment of a tag to the entire Topology. The keys of the dictionaries are VertexIDs and Edges, respectively.
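The shape of such a holder can be sketched as a small dataclass. The attribute names below are hypothetical, and plain tuples stand in for the real VertexID and Edge objects, so this is an illustration of the layout rather than the actual class.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Hashable, Optional


@dataclass
class Annotation:
    """Holder for the tags produced by one Annotator (names are illustrative)."""
    for_vertex: Dict[Hashable, Any] = field(default_factory=dict)  # VertexID -> tag
    for_edge: Dict[Hashable, Any] = field(default_factory=dict)    # Edge -> tag
    for_topology: Optional[Any] = None                             # whole-topology tag


# Plain tuples stand in for the real VertexID and Edge objects.
router = ("router", "10.0.0.1")
network = ("network", "10.0.1.0/24")
edge = (router, network, 10)  # source, target, cost

ann = Annotation(for_topology="2 objects differ")
ann.for_vertex[network] = "missing in reference"
ann.for_edge[edge] = "missing in reference"
```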
The Annotation can only attach one tag to each vertex and edge, but there are few restrictions on what the tag is allowed to be. The intention is to allow Annotators to provide any useful data they can collect. However, we think that our AnnotatedTopologies could be utilised in other projects, so the Annotation objects ought to be easy to serialise into JSON or other formats. Annotations do not need to take other Annotations into account, because the AnnotatedTopology stores Annotations from different Annotators separately.

The Annotators are a tiny bit more interesting. While these objects are basically a wrapper around the \verb|annotate()| method, which takes an AnnotatedTopology and returns an Annotation, there are a few twists to it. First, an Annotator object is intended to be created by the respective AnnotatedTopology, so that the AnnotatedTopology can keep track of all the Annotators. To describe an Annotator, an AnnotatorID is used, which is a re-creatable and hashable recipe for creating that Annotator. It is also used as a handle to reference and scope the resulting Annotation. The AnnotatorID is a pair of the type object of the particular Annotator and an optional hashable parameter, which is passed to the Annotator's initialiser.

Second, an Annotator might require another Annotator to have already run. We make this possible by allowing Annotators to request that another Annotator be run by the AnnotatedTopology (given its AnnotatorID), as long as there is no dependency cycle. This is the recommended method of implementing dependencies between Annotators.

Furthermore, an Annotator can be declared to be idempotent. This affects what happens when the same Annotator is invoked on the same Topology in the same way (that is, using the same AnnotatorID) multiple times. For idempotent Annotators, we know that their output will not change, so the Annotator is not actually run again. For non-idempotent Annotators, the previous Annotation is removed and the Annotator is run again.
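The interplay of AnnotatorIDs, dependencies, and idempotence can be sketched as follows. This is a simplified illustration under our own assumptions, not the real implementation: the method name \verb|run_annotator|, the \verb|idempotent| class attribute, and the two example Annotators are hypothetical, and Annotations are reduced to plain dictionaries.

```python
from typing import Any, Dict, Hashable, List, Optional, Tuple, Type


class Annotator:
    """Base class; subclasses override annotate()."""
    idempotent = True

    def __init__(self, param: Optional[Hashable] = None) -> None:
        self.param = param

    def annotate(self, topology: "AnnotatedTopology") -> Any:
        raise NotImplementedError


# An AnnotatorID is a hashable recipe: (Annotator type, optional parameter).
AnnotatorID = Tuple[Type[Annotator], Optional[Hashable]]


class AnnotatedTopology:
    def __init__(self) -> None:
        self.annotations: Dict[AnnotatorID, Any] = {}
        self._running: List[AnnotatorID] = []  # Annotators currently in progress

    def run_annotator(self, ann_id: AnnotatorID) -> Any:
        cls, param = ann_id
        if ann_id in self._running:
            raise RuntimeError(f"dependency cycle involving {cls.__name__}")
        if cls.idempotent and ann_id in self.annotations:
            return self.annotations[ann_id]  # output cannot change, skip the run
        self.annotations.pop(ann_id, None)   # discard a previous Annotation
        self._running.append(ann_id)
        try:
            result = cls(param).annotate(self)
        finally:
            self._running.pop()
        self.annotations[ann_id] = result
        return result


class Degree(Annotator):
    """Idempotent example Annotator that counts how often it really runs."""
    calls = 0

    def annotate(self, topology: AnnotatedTopology) -> Any:
        Degree.calls += 1
        return {"tag": "degree"}


class NeedsDegree(Annotator):
    """Non-idempotent example Annotator that depends on Degree."""
    idempotent = False

    def annotate(self, topology: AnnotatedTopology) -> Any:
        base = topology.run_annotator((Degree, None))
        return {"based_on": base["tag"], "param": self.param}


topo = AnnotatedTopology()
first = topo.run_annotator((NeedsDegree, "x"))
second = topo.run_annotator((NeedsDegree, "x"))  # re-runs, but Degree is reused
```

Tracking the Annotators that are currently in progress is enough to detect a cycle in this sketch: a request for an AnnotatorID that is already on the list can only come from one of its own dependencies.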
Annotators may not alter the AnnotatedTopology in any way. They are only allowed to return an Annotation, which will be added to the AnnotatedTopology. As with frozen Topologies, this is not enforced by the code.

Annotators may be used for various tasks, including but not limited to performing analysis of the Topology, enhancing it with additional data (e.g. ping response times from other systems), or specifying parameters for visualisation. As a part of Birdvisu itself, we ship several Annotators: TopologyDifference outputs the differences between the reference and current Topology, and ShortestPathTree marks the edges of the shortest path DAG.

The last important object related to annotation is the AnnotatedTopology. It serves as a coordinator for running Annotators. It does two main things: it detects dependency cycles between Annotators, and it keeps the Annotations.

The Annotations in the AnnotatedTopology are stored in a dictionary indexed by the respective AnnotatorID. For vertices and edges, only sets of AnnotatorIDs are stored. This way, both iterating over the Annotations for a Vertex or Edge and examining individual Annotations are fast. Also, our approach isolates unrelated Annotations by putting them into different scopes by AnnotatorID.

However, by using the Annotator's type in the AnnotatorID, this design enforces a rather tight coupling between Annotators and users of Annotations, because the consumers of Annotations need to understand the precise format of the particular Annotation. This could be solved by implementing support for \uv{interface-annotators}, so that various Annotators may provide Annotations in a commonly understood format.\footnote{Preliminary work on implementing this approach is present in the \texttt{ann\_interfaces} branch, but the interaction of implementers of the same interface is not decided yet.}

AnnotatedTopology does not expose a way to delete old Annotations.
While we do not expect this to cause large memory leaks, if it does, an LRU-like strategy might be employed to tame the memory usage. Also, the Annotators could be run dynamically when the Annotation is requested, but our current approach does not need this functionality, so it is not implemented at the moment.

\section{Visualisation}

\XXX{Layouting (nonexistent), why not graphviz, why not consensual metrics, how we are re-using annotations internally. Saving layouts}