head 1.3;
access;
symbols;
locks; strict;
comment @# @;


1.3
date 2014.03.06.05.24.08; author dholland; state Exp;
branches;
next 1.2;
commitid 90Sbf6gnlCCG9Brx;

1.2
date 2013.05.24.08.25.11; author wiz; state Exp;
branches;
next 1.1;
commitid uO830S8w88aQlRQw;

1.1
date 2013.05.24.00.41.31; author dholland; state Exp;
branches;
next ;
commitid 1ggxzJuBByZCMOQw;


desc
@@


1.3
log
@a couple minor adjustments, sitting around since last july
@
text
@The material herein is grouped first by topic and then by priority.

------------------------------------------------------------

1. Operational model

 - Centralized operation with one master tree
 - Supports disconnected operation
 - No compare-by-hash
 - Native support for synced slave copies of the master tree (like
   anoncvs)
 - Transport-independent remote operation, supporting both http/https
   and ssh
 - Checkouts can cache arbitrary amounts of history locally but are
   not obliged to clone everything
 - Non-committers with readonly checkouts should be able to package
   changesets for review and commit by committers.

Rationale:

Centralized operation is posed as a design requirement because it's a
prerequisite for other things... and because this whole project is
predicated on the assumption that centralized operation is
acceptable. If someone comes up with a clever way to support
distributed operation without compromising other requirements, well
and good; otherwise one may as well use one of the modern distributed
version control systems. Disconnected operation, meanwhile, covers
most of the use cases people cite in favor of distributed version
control.

Compare-by-hash is bad not because it's slightly sleazy, or because
the statistical assumptions about the probability of collisions are
wrong (although in some contexts they're questionable) -- it's
because cryptographic hash functions don't age well and the standard
DVCS scheme for hashing chains of versions doesn't provide any decent
way to migrate an existing repository to a new hash function.

Native support for synced slave copies is needed in order to be able
to provide anonymous access (like anoncvs) without needing access to
the master tree. This is also meant to satisfy the use cases where
currently people rsync the whole CVS repository locally.

Transport-independent remote operation should be a no-brainer, but
even many recent systems have felt the need to make up their own
protocols and network-level constructs.

Easier collaboration with non-committers is an often-requested
feature and a de facto property of distributed version control.

Unanswered questions:

 - Do we need support for disconnected operation by more than one
   user at a time (or perhaps more than one tree at a time) so that
   uncommitted changesets can be shared? The non-committer changeset
   support might cover this territory adequately, or not, depending
   on how it ends up working.

2. Schema

 - Supports arbitrary (smallish) metadata attached to changesets, and
   also to files and directories
 - Metadata (including on old versions) is mutable and changes are
   kept in history (this includes commit message text)
 - Provides provenance tracking for changesets/commits
 - Commits/changesets are atomic
 - Version numbers (for projects, files, and subtrees if any) are
   sequential.
 - Supports rename (of files or dirs) properly, and file history
   crosses renames transparently
 - Supports copy/duplicate (of files or dirs) properly
 - Has a coherent semantic model of tree history
 - Supports local-only changes that are not pushed back to the master
   tree

Rationale:

While arbitrary metadata is a nuisance to support (compared to a small
fixed metadata schema) and in many cases using this metadata facility
(as opposed to storing information in an ordinary file in the
repository) would be a mistake, it is nonetheless useful for various
purposes. One of these is preserving old version numbers from a
repository conversion; given the large number of references to NetBSD
CVS file version numbers, including in places like security advisories
that count as "important", preserving this information and making it
searchable is highly desirable.

Metadata should be mutable because sometimes it contains errors. One
of the big weaknesses of current distributed version control is that
effectively all metadata is immutable once committed; this means any
botch not immediately detected is graven in stone for all time, unless
someone does a complete repository rebuild updating all subsequent
versions. Meanwhile, keeping the history should be a no-brainer.

Provenance tracking for commits is important for two reasons:
maintaining proper credit/attribution (which can involve legalities
via copyright as well as propriety) and also making sure that bogus
changesets cannot be introduced. In distributed version control
systems this becomes complicated and either requires an elaborate
solution (e.g. in monotone) or giving up on the problem entirely
(e.g. in mercurial). For a centralized system it is much easier but
still important, especially given tools for applying changesets that
originate from non-developers.

Changesets need to be atomic. Non-atomic changesets are a stupid
design flaw of CVS that we should certainly not perpetuate.

Version numbers need to be sequential so it's possible to tell easily
if a version you have contains a particular change or fix... and in
particular, tell easily without having to cut and paste a hash code
and go ask the version control system. You should be able to tell at
a glance from running ident on a binary whether it needs to be
replaced with a fixed version or not.

CVS doesn't support rename. We desperately need rename support
because large sections of the NetBSD source tree are in serious need
of organizational cleanup.

File history should cross renames because if you're looking at the
history of a particular file, you shouldn't have to stop and go search
something else just because someone moved the file around. This sounds
like a no-brainer but a lot of "modern" version control systems don't
really get it right.

Note that rename is not semantically equivalent to copy and delete.
Likewise, duplicating a file is an action that should be explicitly
recorded; the support for sideways change propagation (below) requires
this.

A coherent semantic model of tree history is required in order to do
merges of changesets that reorganize the tree. Many "modern" version
control systems don't really get this right.

Support for local-only changes is highly desirable if you're carrying
local modifications; it's effectively the same as keeping private
changes as uncommitted modifications in your working tree, except with
more structure, proper history, and a way to explicitly make sure the
changes don't get committed by accident.

3. Branches and branch management

 - Supports lightweight branches / multiple heads
 - Supports full/named branches
 - Supports something like hg bookmarks to keep git users happy
 - Distinguishes branches intended to diverge from those intended to
   be folded back in later
 - Allows enforcing a graph of branch relationships
 - Keeps track of which changesets from parallel branches have been
   pulled in/merged across (including instances of separate but
   equivalent changes)
 - Also keeps track of which changesets from parallel branches have
   been considered and rejected
 - Supports this same form of sideways change propagation for files
   that have been duplicated
 - Supports hyper-branches (preferably)
 - Supports local-only branches that are not pushed back to the
   master tree
 - Allows accumulating small local changes into a single upstream
   commit that neither loses the individual change history nor forces
   other users to wade through it except by choice
 - Maybe, support for local patch queues

Rationale:

Lightweight branches (that is, if you commit a change based on an
older version you just get another head) are necessary for
disconnected operation. These occur and get merged on short timescales
as a routine matter during development.

"Real" branches (branches with names that have metadata and tracking
information and so on) are also required, for releases and for
development of major features and so forth.

Mercurial was forced to add "bookmarks" to keep git users happy; a lot
of git users apparently don't understand anything besides git's insane
branch semantics and aren't interested in learning or understanding
what they're doing. We will need something like this too, in all
probability (and it's a useful feature) so it may as well get designed
in up front.

Branches that are intended to diverge (releases, for example, or
outright project forks) are fundamentally different from branches that
are expected to reconnect to their parent (e.g.
feature development branches) once the version control system has any
kind of branch management or tracking support.

If you have a lot of branches that are supposed to exist with certain
relationships to one another, it's fairly easy to accidentally break
this structure by merging with the wrong other branch; and if you do,
backing out of the resulting mess can be quite a nuisance. Therefore,
it should be possible to declare the intended structure and have the
system reject accidental attempts to violate that structure.

(Note: NetBSD may not need this. dholland specifically wants it and
will put in the work to get it.)

No existing version control system keeps track of which changesets
from branch A have and have not been pulled in to branch B, or is
capable of listing the ones that haven't been considered yet for
possible action. There is absolutely no reason, however, that the
version control system shouldn't be able to provide this information.
AIUI, for release branches releng currently has to maintain this
metadata by hand. (Update: "no existing ..." may actually be "no
existing free ...".)

If you duplicate a file, such as cloning a device driver template file
for a new driver, or starting a new pmap by copying an old one,
usually bug fixes applied to the original version should also be
propagated to the clone. The same kind of changeset tracking just
described for branches should be available for duplicated files, to
make sure this gets done and to allow easily keeping track of where it
has and hasn't been done.

By "hyper-branches", I mean a branch of the entire repository state,
including branches. (I have a vague recollection that somebody else
may be using the term "hyper-branches" for something else, in which
case we need new terminology.) This is, for example, something you
might want if you have two parallel versions of a project (e.g.
a free and pay version) and maintain those as branches, but then also
want to be able to take release branches of both at once. I have no
idea at the moment if there's a use case for hyper-hyper-branches
(that is, branches of hyper-branches) or not.

(Note: NetBSD does not need this. dholland specifically wants it for
another project and is willing to put in a good deal of work to get
it.)

Local-only branches have the same rationale as local-only changesets.

Merging cumulative local commits into a single upstream commit makes
it possible to commit very early and very often (which is very useful
if you ever need to bisect later) without deluging other developers on
the project with a flood of tiny commits they don't care about in
detail. However, because you want to maintain the individual changes
in the master repository (to support that bisecting) but don't want to
show them by default, there needs to be explicit support for dividing
changesets into subchangesets and an explicit way to expand them when
viewing history. No existing system can do this; many can do something
similar, but in all cases I know of this either throws away the
fine-grained history or makes everybody wade through it afterwards.

Local patch queues (like mq in mercurial) are a useful way of
maintaining private changes and/or preparing batch commits. It is
probable that most of the use cases are subsumed by other features
(local-only commits, cumulative commits, etc.) and we don't also need
patch queues. Given the branch graph structure feature described
above, even the use case of preparing patchkits for third-party trees
may be better done with branches, although it might be worthwhile to
arrange a way to do branch push/pop in a way akin to patch push/pop.

3. Implementation

 - Written in C
 - Doesn't depend on anything other than standard system libs
 - Decently fast
 - Scales to large trees with deep history
 - Supports inotify/kqueue/whatnot for monitoring large checkout trees
 - Install doesn't spew tons of crap all over everywhere
 - Has an interface for plugins and/or extensions

Rationale:

Writing in C (or perhaps C++ but C++ is not really a sane choice of
language) with no major deps is a requirement for importing into base,
where the tool used to manage the NetBSD source tree should be found.

Being decently fast is necessary to avoid driving users crazy.

Scaling is necessary for use on/in NetBSD.

The major performance bottleneck for most systems on large trees is
scanning the tree for files that have been modified. This inevitably
takes as long as doing find . -ls, and on a tree the size of NetBSD's
source tree that takes a while even when the whole tree fits in RAM.
Many recent tools have a gizmo that starts a daemon using inotify or
similar to monitor the working tree in the background; then the
explicit search can be avoided and things become much faster.

A tidy install is desirable for a number of reasons (integration into
base being one of them) and should not be a major problem.

We want some kind of plugin/extension interface because, at a minimum,
there are probably some graphic tools that should be available and
they can't be part of the base install of either this program or
NetBSD.

4. User interface

 - Clean, small command set
 - No weird semantics
 - Search support for metadata (including/also change messages) as
   well as searching file contents

Rationale:

All of this is pretty much obvious. By "no weird semantics" I mean
anything from oddities like mercurial's tags to core design mistakes
like git's branches...
or things in between like subversion's branches; anything that
violates the principle of least surprise or that requires lengthy
explanation/justification for why it doesn't behave the way a
reasonable person would expect.

5. Miscellaneous other features

 - Can remove/obsolete/blacklist unwanted changesets
 - Supports splicing of equivalent but technically unrelated versions
 - Can stash local changes temporarily
 - Can check out subtrees
 - Can explicitly revert files or whole subtrees in a checked-out tree
   to earlier versions
 - Supports configurable keyword expansion

Rationale:

Blacklisting or otherwise getting rid of unwanted changesets is a
non-negotiable requirement for legal reasons.

We want splicing so we can, at some point in the future and if we so
desire, pull in the CSRG version history and connect it up with our
own.

Stashing local changes is necessary if you can't have uncommitted
local changes while merging, and it really doesn't make sense to allow
that. People do gripe, but the best way is to stash your changes,
merge, and unstash them. Otherwise if you get a merge conflict in a
file you've also got local changes to, it becomes an awful mess. (Note
that in comparable situations CVS makes you check out a whole new
tree...)

Checking out subtrees is widely desired for working on single programs
or (in particular) checking out only the kernel.

Reverting portions of the tree locally is often necessary for one
reason or another in practice; lack of adequate support for this in
most of the "modern" version control systems has been and remains a
barrier to adoption in/for NetBSD.

We still need to be able to run ident on binaries and get useful
information out. Keyword expansion is not the only way to accomplish
this; but it's easier to deploy and use than any of the alternatives.
A reasonable implementation should not suffer from the persistent
aggravations that CVS keywords often cause.
(All expansions need to be invertible; all actions, particularly diffs
and merges, should be always done using the unexpanded form.)
@


1.2
log
@typo.
@
text
@d181 1
a181 1
are expected to reconnect to their parent (b.g. feature development
d200 2
a201 1
metadata by hand.
@


1.1
log
@Stuff distilled from my notes and previous arguments and bikeshed
sessions
@
text
@d47 1
a47 1
feature and a de facto property of distributed version contorl.
@