\chapter{Design}\label{ch:design} We now explain the design of Birdvisu in depth. First, we explain some important decisions and present the overall structure of the project, then we look into individual parts of the program. Birdvisu is implemented in Python, using PySide6, the official bindings for Qt6, for drawing on screen. We decided to use Qt, because it provides a lot of pre-made widgets and tools and since it is widely used, it is easy to find help for it on the Internet. The decision to use Python was not hard either. Not only Qt has official bindings for it, but we use the language very often and thus are comfortable writing in it. We do not expect the potential slowness of Python to be an issue, because for handling graphics we are using Qt, which is written in C++. Also, as we have analysed in section~\ref{s:areas}, we expect the topologies to be quite small. The project comprises of three main parts: data collection, annotation and presentation part. The data collection part is tasked with finding out the current topology and creating a usable representation of such topologies and their combinations. In the annotation part, we add additional information to the topologies like the difference from the expectation or graph properties of the topology. Finally, when we have all the needed information, we draw the topology on the screen, highlighting the relevant information. \section{Recurring and general patterns} Birdvisu's data structures make heavy use of dictionaries and sets, because we do not handle much data that would need to be processed in any particular order. While this allows us to perform set operations quickly, it requires us to provide hashable keys. We have decided to embrace this requirement and use rather complex frozen dataclasses, which can hold as much of the required data as possible, as long as we can re-create that data. This can be illustrated on our usage of vertices in topology. There are two objects: a VertexID, and the Vertex itself. VertexID is the hashable part and Vertex provides additional information, like incident edges, which are not hashable. The topology then has a dictionary from the VertexIDs to Vertices, providing complete data. However, the VertexID already contains information like what version of IP it belongs in, whether it represents a router and all the possible IP addresses and identifiers related to the vertex. It is sufficient for Vertex objects to only contain sets of edges and references to the related topology and VertexID. (In the next section, we will see that a type of the vertex is also stored in Vertex, but that is really everything.) The other thing we decided to reuse, was the format of BIRD's topology output. We call the format \uv{ospffile} and extended it by allowing comments (after an octothorpe, i.e. \verb|#|). Also, empty lines do not seem to be of relevance. These are quality-of-life improvements for cases when ospffiles are edited by hand. Apart from storing topologies, we intend to use ospffiles for description of basic styles. Therefore, our implementation in \verb|birdvisu.ospffile| only constructs the tree of strings and does not try to understand it. Our module provides API similar to the one of \verb|json| or \verb|marshall| module, even though it cannot represent arbitrary types. \section{Data collection: providers and parsing} This part of the project deals with processing topologies. The core object of this part is a TopologyV3\footnote{The \uv{V3} suffix is sometimes impractical to keep, so we will sometimes shorten the class name only to \uv{Topology}. It denotes the same object.}. While the Topologies can be created manually by adding the vertices and edges, we expect that retrieving topologies from other sources like saved ospffiles or running BIRD processes. This is made possible by implementing a TopologyProvider. Representing a topology turns out to be a bit complicated problem for the following reasons: \begin{itemize} \item The topology edges need to be directed. OSPF allows a shortest path from A to B to be different to the other direction. \item It can have a very general shape, so we cannot rely on common patterns. For example, routers can be connected to other routers using point-to-point or virtual links, not just networks. \item The objects are shape-shifting. A transit network may become stub or change the designated router and we want to be able to understand the change as best as possible. \item The topology is not necessarily a graph, because multiple links may lead from a single router to the same network. However, we strongly believe that the maximum number of parallel edges is quite low, so most of the theory for simple graphs is still applicable. \item For completeness, we note here again that the shortest paths from a single vertex form a DAG, even though the OSPF specifications speak of them as of trees. (Negative edges are, fortunately, not permitted.) \end{itemize} Given the above requirements and lessons learned in section~\ref{s:net-unusual}, we need to find a representation of vertices, that is both powerful enough to uniquely describe a particular vertex, and flexible to allow us easily detect its metamorphoses. The table~\ref{tab:vertexid} shows, which information we can use for each type of object. We see that networks in particular are hard to represent, because the ID of the DR may change and it might be the only distinguishing property in case of a split network. \bgroup \def\yes{\checkmark} \def\chg{$\bullet$} \begin{table}[h] \centering \begin{tabular}{cccccc}\hline Object & Address & RID & DR ID & IF ID & Notes \\\hline \verb|router| & -- & \yes & -- & -- &\\ \verb|xrouter| & -- & \yes & -- & -- &\\ \verb|vlink| & -- & \yes & -- & -- & Peer is a \verb|router|\\ \verb|network| & v2:\yes,v3:$*$ & -- & \chg & v3:\chg &\\ \verb|external| & \yes & -- & -- & -- &\\ \verb|xnetwork| & \yes & -- & -- & -- &\\ \verb|stubnet| & \yes & \yes & -- & -- &\\ \end{tabular} \caption{Information determining each object of a topology. $*$ means it may or may not be known, \chg\ denotes an attribute that may change. Columns in order: whether it has assigned a address, relevant router ID, ID of designated router, interface number of the DR.} \label{tab:vertexid} \end{table} \egroup We decided to aim for correctness, so whenever any of the attributes of an object change, we consider it to be a a separate object. This may create some false positives, but we think that is the better case than potential false negatives, which could hide some issues. Also, when the infrastructure works correctly, the designated router should only change in case of outage. Therefore, it might actually be useful to notice the user when a network has an unexpected designated router even when it is otherwise healthy. However, we provide a way to find objects by partial information, using the VertexFinder objects, so this allows heuristics to match different objects. The information mentioned in table~\ref{tab:vertexid} serves as the main part of the VertexID. However, we want the VertexID to identify the same object even after it transforms to another kind of object, so instead of using the object type, we only note whether the object is a router or a network, since this property stays the same even for changed objects. The code is also oblivious to the fact that the interface ID is a number and what it means -- we use it as an opaque \uv{discriminator} and do not even bother with parsing it from a string. The VertexIDs are supposed to be descriptors of objective vertex state, so they do not belong to any particular TopologyV3. Instead, they can be used to track actual Vertices across multiple Topologies. Apart from VertexIDs, the TopologyV3 also consists of the additional data in Vertex objects and Edges. The Vertex objects, as noted above, contain only a set of incoming and outgoing edges, references to their TopologyV3 and VertexID objects and the actual type of the object the vertex represents (i.e. the first column of the table). An Edge knows the source and target VertexID, its cost and the number of parallel edges with these properties. If the Edge was determined by a virtual link, it is marked as virtual. This is needed, because the both Vertices are regular routers, so the information about the virtual link cannot be stored in them. Note that an Edge does not need to belong to any Topology, since it only contains factual data. The information, whether an Edge is in the topology, is stored only in the incident Vertices. A Topology can be marked as \uv{frozen}. This denotes an intent that it really should not be modified, because other code might rely on the particular shape of the Topology. However, making the Topology trully immutable would be impractical in Python, so we opted for this approach. In case our solution turns out to be prone to accidental modification of the Topology, we will deploy additional countermeasures against that. Frozen Topologies also allow us to stack them, creating a union of the original Topologies. This way, a single Topology can be used in the visualisation, while keeping the original information. This mechanism is fully generic, but was mainly invented to allow merging the reference (expected) topology with the actual one (i.e. the current state of the system). The ancestors are stored by a string label in a dictionary of the Topology. While subclassing TopologyV3 into a StackedTopology would probably be a cleaner design, since the only difference is a state of one dictionary, we did not employ this approach. The TopologyProviders are not very interesting, but are important nevertheless. There are a few caveats with parsing topologies from the ospffile format. First, the edges from routers to networks can only be resolved after the networks are known, since network's level-2 block contains information not present in the level-3 directive for the router (namely, the designated router for OSPFv2 natworks and the set of addresses for OSPFv3). Since BIRD may be running more than one instance of OSPF, the BirdSocketTopologyProvider contains an ad-hoc parser of the response to the \texttt{show protocols} command, which seems to be a reliable way to list running instances. Moreover, BIRD does not seem to expose any way to determine the version of OSPF. So far, we think it is sufficient to guess from the \texttt{network} directives, since they seem to contain a hyphen if and only if the dump is from an OSPFv3 instance. (The source code of BIRD suggests that under some circumstances, brackets can appear even in OSPFv2 dump, so that is not a possibility.) \section{Annotations} Once a TopologyV3 is obtained, it may be annotated. An Annotator may create an Annotation, which is then stored as a part of an AnnotatedTopology. We now explore design of these objects in detail. An Annotation is essentially only a holder for any \uv{tags} that are to be attached to the topology. These are represented by a dictionary holding annotations for Vertices, another dictionary for annotating Edges, and a single field allowing the attachment of a tag to the entire Topology. The keys of the dictionaries are VertexIDs and Edges, respectively. The Annotation can only attach one tag to each vertex and edge, but there are little restrictions of what the tag is allowed to be. The intention is to allow Annotators to provide any useful data they can collect. However, we think that our AnnotatedTopologies could be utilised in other projects, so the Annotation objects ought to be easy to serialise into JSON or other formats. Annotations do not need to take other Annotations into account, because AnnotatedTopology stores Annotations from different Annotators separately. The Annotators are a tiny bit more interesting. While these objects are basically a wrapper around the \verb|annotate()| method, which takes an AnnotatedTopology and returns an Annotation, there are few twists to it. First, an Annotator object is intended to be created by the respective AnnotatedTopology in order for it to keep track of all the Annotators. To describe an Annotator, an AnnotatorID is used, which is a re-creatable and hashable recipe for creating that Annotator. It is also used as a handle to reference and scope the resulting Annotation. The AnnotatorID is a pair of the type object of the particular Annotator, and an optional hashable parameter, which is passed to the Annotator's initialiser. Second, an Annotator might require another Annotator to have already run. We make this possible by allowing Annotators to request another Annotator to be run by the AnnotatedTopology (provided the AnnotatorID), as long as there is not a dependency cycle. This is the recommended method of implementing dependencies of Annotators. Furthermore, an Annotator can be declared to be idempotent. This affects what happens when the same Annotator is invoked on the same Topology in the same way (that is, using the same AnnotatorID) multiple times. For idempotent Annotators, we know that their output will not change, so the Annotator is not really run. For non-idempotent Annotators, the previous Annotation is removed and the Annotator is run again. Annotators may not alter the AnnotatedTopology in any way. They are only allowed to return an Annotation, which will be added to the AnnotatedTopology. As with frozen Topologies, this is not enforced by the code. Annotators may be used for various tasks, including but not limited to performing analysis of the Topology, enhancing it with additional data (e.g. ping response times from other system), or specifying parameters for visualisation. As a part of Birdvisu itself, we ship several anotators: TopologyDifference outputs the differences between the reference and current Topology, and ShortestPathTree marks the edges of the shortest path DAG. The next section describes how Annotators aid visualising the data. The last important object related to annotation is the AnnotatedTopology. It serves as a coordinator for running Annotators. It does two main things: detects dependency cycles between Annotators, and keeps the Annotations. The Annotations in the AnnotatedTopology are stored in a dictionary indexed by the respective AnnotatorID. For vertices and edges, only sets of AnnotatorIDs are stored. This way, both iterating Annotations for a Vertex or Edge and examining individual Annotations is fast. Also, our approach isolates unrelated Annotations by putting them into different scopes by AnnotatorID. However, by using the Annotator's type in AnnotatorID, this design enforces a rather tight coupling between Annotators and users of Annotations, because the consumers of Annotations need to understand the precise format of the particular Annotation. This could be solved by implementing support for \uv{interface-annotators}, so that various Annotators may provide Annotations in a commonly understood format.\footnote{Preliminary work on implementing this approach is present in the \texttt{ann\_interfaces} branch, but the interaction of implementers of the same interface is not decided yet.} AnnotatedTopology does not expose a way to delete old Annotations. While we do not expect this to cause big memory leaks, in case it does, an LRU-like strategy might be employed to tame the memory usage. Also, the Annotators could be run dynamically when the Annotation is requested, but our current approach does not need this functionality, so it is not implemented at the moment. \section{Visualisation} The visualisation is split into two parts: computing the appearance and actually showing the result. For the former we reuse the Annotator infrastructure. The latter is handled by Qt's Graphics view framework. The appearance is described by a styling dictionary. For vertices, it contains a position and a highlighting colour. Edges can have a colour, line width and a highlighting colour. However, more styling properties can be defined in the future. To provide those styling dictionaries, a subclass of Annotators is created, StyleAnnotator. StyleAnnotators only differ from regular Annotators in that they only tag vertices and edges with styling dictionaries. This provides something similar to an interface, helping to uncouple the style from the specific Annotator that provided the respective data. Each Annotator which provides data worth showing has a companion StyleAnnotator to provide the respective style. When drawing, we pick one StyleAnnotator and highlight data according to it. The current approach avoids mixing styles from multiple Annotators, which might be required for more advanced use in the future. We considered using stylesheets similar to CSS, but we think that approach is too heavy-weight. Rather, assigning priorities to the StyleAnnotators could allow a flexible order of applying styles, but at this point this also seems like a unnecessary complication of the project. We let the user decide, where the vertices should be placed, because they might have some idea or understanding of the system that is not present in the topology. For this reason, we also ignore classical metrics of graph drawing, like the crossing number of the layout. This can be demonstrated on the default Gennet topology: while it forms a planar graph, it makes more sense to let the edges cross as on figure~\ref{fig:gennet}, because the layered structure is more important. To store the placement, we reuse the ospffile format. An example is shown in listing~\ref{lst:visufile}. The top-level contains a \verb|visualisation| directive, so that other information may be stored in the same file in the future. Level-2 contains vertex specification in the same format as in dumps from BIRD. On level-3 there is a \verb|position| directive with the coordinates, but for transit networks, additional details (DR or address) can be provided to specify the correct network. Similarly, we allow a \verb|router| level-3 directive to be used in the \verb|stubnet| block. This format allows using BIRD's output as the basis for the placement file and could be extended by other directives if needed in the future. \begin{lstlisting}[float=h,label=lst:visufile,caption=Vertex placement description] visualisation router 192.0.2.14 position 200 200 network 192.0.2.0/28 position 0 1500 dr 192.0.2.14 \end{lstlisting} We try to place vertices without known position in proximity to already placed neighbours, so that the user can easily locate them. Since the neighbours can also have unknown position, BFS is used: we place the vertices of known positions, then their neighbours in their proximity, then the neighbours' neighbours and so on. When there is a completely unplaced component, we place one of its vertices at random. However, disconnected topologies are of little interest to us. We tried using Graphviz\cite{graphviz} for laying out the vertices, but we were not satisfied with its result. To demonstrate, the listing~\ref{lst:graphviz} describes the topology of our home network with Gennet attached. Figure~\ref{fig:graphviz} then shows how each of Graphviz's layout engines draws the topology. While it could be possible to tweak the engine setings, we believe the user still knows better, so we did not continue exploring this idea. \lstinputlisting[float=h,label=lst:graphviz,caption=Author's home topology]{../img/graphviz-fail/source.dot} \begin{figure}[h] \centering \begin{subfigure}[b]{0.8\textwidth} \centering \includegraphics[width=\textwidth]{../img/graphviz-fail/dot.pdf} \caption*{dot} \end{subfigure} \begin{subfigure}[b]{0.8\textwidth} \centering \includegraphics[width=\textwidth]{../img/graphviz-fail/circo.pdf} \caption*{circo} \end{subfigure} \begin{subfigure}[b]{0.45\textwidth} \centering \includegraphics[height=\textwidth,width=9cm,keepaspectratio,angle=90]{../img/graphviz-fail/sfdp.pdf} \caption*{sfdp (rotated for clarity)} \end{subfigure} \begin{subfigure}[b]{0.45\textwidth} \centering \includegraphics[width=\textwidth]{../img/graphviz-fail/neato.pdf} \caption*{neato} \end{subfigure} \begin{subfigure}[b]{0.45\textwidth} \centering \includegraphics[width=\textwidth]{../img/graphviz-fail/fdp.pdf} \caption*{fdp} \end{subfigure} \begin{subfigure}[b]{0.45\textwidth} \centering \includegraphics[width=\textwidth]{../img/graphviz-fail/twopi.pdf} \caption*{twopi} \end{subfigure} \caption{The unpleasant results of Graphviz's layout engines} \label{fig:graphviz} \end{figure} In order to display the topology, we convert it to a simple undirected graph. The set of vertices stays the same, the edges are only reduced to a pair of VertexID without having any kind of cost associated with it. The only tricky part is deciding the style of the new edge when the StyleAnnotator returned multiple styles for the unified edge. We decided to pick the style of the lightest positive-cost edge, since it is the most relevant in most cases: zero-cost edges are almost always network-to-router edges\footnote{We are not sure whether other zero-cost edges are permitted.} and heavier edges are not used, because it is always cheaper to use the light edge. (We are aware that this is not true for asymmetrically configured point-to-point links, but we do not think they are commonly deployed.) Since we will be able to use edges in sets, we need a canonical hashable representation. For that, we implement a total ordering on VertexIDs, which allows us to use pairs of the VertexIDs in ascending order to reference the edge. There are currently no specific requirements for the ordering to satisfy. The vertices and edges of the simple graph have a one-to-one mapping to the actual graphics items shown to the user. The Graphics view framework allows us to create nested items, which we use for highlighting: a highlight is just a bigger rectangle in a lower layer than the displayed object. Moreover, the framework has built-in support for dragging and right-clicking objects, which simplifies creating the UI. Currently, the MainWindow object acts as a coordinator of everything that needs to happen, from loading topologies and annotating them to putting them on the scene and allowing the user to interact with them.