1.1
CONTENTS
DEFINITIONS
Unless otherwise noted, all definitions are taken from (or are paraphrased by
me based on the definition in):
- Tanenbaum, Andrew S., and Maarten Van Steen, Distributed Systems:
Principles and Paradigms, Second Edition. Upper Saddle River, NJ:
Pearson Prentice Hall, 2007. ISBN: 0-13-239227-5
DEFINITIONS -- Chapter 1: Introduction
DEFINITIONS -- Chapter 2: Architectures
- Software Architectures: describe how software components should be
organized, and how they should interact.
- System Architecture: the combination of software components with the
actual machines on which they are hosted.
- Autonomic Systems: A system that monitors and can adjust its behavior
- Architectural Styles:
- Layered: each layer can call the layer below, and be called by the layer
above
- Object-Based: Objects call each other via RPC
- Data-Centered: Processes communicate via data services.
- Event-Based: communication propagation of events (publish / subscribe).
- Referential Decoupling: Message producers and consumers need not know
about each other, such as in a publish/subscribe system where a central
repository manages messages.
- Shared Data Spaces: Communicating processes use the same data space;
they need not both be available when messages are sent.
- Server: a process implementing a specific service
- Client: a process that requests a service from a server
- Servant: can act as a client and a server at the same time.
- Idempotent: property of an operation in which it can be repeated multiple
times without causing harm.
- Typical Application Layers: User Interface, Processing, Data
- Data Persistence: Data is stored and available, even when no instance of
the application is running.
- Two-Tiered Architecture: Multiple Clients, One Server.
- Fat Clients: More functionality on the client computer
- Thin Clients: Less software on client computer
- Three-Tiered Architecture: Typically UI, application server,
database server.
- Component: a modular unit with well-defined interfaces that is replaceable
within its environment.
- Vertical Distribution: division of applications into tiers corresponding
to logical organization.
- Vertical Fragmentation: in a relational database, splitting tables
column-wise and distributing across multiple machines.
- Horizontal Distribution: distributing load-balancing servers,
providing the same functionality, across multiple hosts.
- Peer-to-Peer Systems support horizontal distribution.
- Overlay Network: network in which the nodes are the processes, and links
are the communication channels.
- Distributed Hash Table (DHT): data items are assigned a random unique
identifier, creating a map of keys to their associated resources around the
network. DHT is then used by routing algorithms.
- Membership Management: methods by which nodes can organize themselves into
an overlay network.
- Content-Addressable Network (CAN): Two-dimensional DHT system.
- Unstructured Peer-to-Peer Network: Each node maintains a list of
neighbors only.
- Random Graph: group of neighbors is a random subset of nodes
- Partial View: each node sees only part of the graph
- Semantic Proximity: Improve search efficiency by tracking how closely
neighboring data items are related.
- Semantic Overlay Networks: based on semantic proximity results; can be
used for highly efficient searching.
- Content-Delivery Network (CDN): Various nodes hold copies of web pages, to
allow clients to retrieve page from a nearby node.
- Superpeers: Peers on P2P network holding an index or acting as a broker.
Local peers may have relationship with nearby superpeer.
- Edge-Server Systems: form the boundary nodes between internet and external
nodes, such as Internet Service Providers (ISP).
- Tracker: In BitTorrent, a tracker server records active nodes holding
parts of the requested file.
- Broker: responsible for registering servers, and making them known to
others
- Interceptor: breaks the control flow, and allows other
application-specific code to execute.
- Request-Level Interceptor: interceptor at higher level; knows about
replicated services, for example.
- Message-Level Interceptor: local operating systems may fragment
messages for better performance or reliability
- Aspect-Oriented Software Development: concerned with "cross-cutting"
concerns like security, fault-tolerance, and other concerns that relate to all
tiers of the software.
- Self-Star Systems / Autonomic Computing: self-managing, self-healing,
self-configuring, self-optimizing, etc.
- Feedback-Control Loops: part of larger feedback systems
- Feedback-Control Systems: system receives feedback, and adapts based on
it.
- Feedback-Analysis Component: allows system to evaluate feedback and
respond accordingly.
- Metric Estimation Component: assists in measurement of factors on which
system adaptation may be based.
- Jade: Java framework for auto-detection and replacement of component
failures
- Jade Server Interface: used to call methods implemented by a component
- Jade Client Interface: used by a component to call other components
- Binding Interfaces: components bound to each other by means of shared
interface.
- Jade Repair Management Domain: consists of servers, their components to
execute, a node manager that can add / remove components, and failure
detectors.
DEFINITIONS -- Chapter 3: Processes
- Process Table: maintained by the OS for each running process.
Consists of CPU register values and all other information needed to swap
process into and out of memory.
- PROCESS: a program in execution (IMPORTANT!!)
- Thread Context: minimum information required about each thread, for CPU to
be able to manage the running of multiple concurrent threads.
- Lightweight Process (LWP): Less expensive than having completely separate
processes, as several LWP's can run within a single process, but LWP's run
withing the Kernel space rather than user space.
- Scheduler Activations: Like LWP, but when thread blocks, the Kernel
directly calls the thread package via "upcall."
- Dispatcher: In multithreaded server, a "master thread" which reads
incoming requests.
- Worker Thread: the dispatcher gives requests to a worker thread after it
comes in.
- Resource Virtualization: Allowing programs to behave as if multiple
instances of a resource exist.
- Four Levels of Interface in Computer Systems:
- Machine Instructions available to any program
- Privileged Instructions available to OS and specific programs only
- System Calls offered by operating system
- Application Programming Interface (API) available to calling programs.
- Process Virtual Machine: VM exists only for a certain single
process.
- Virtual Machine Monitor (VMM): a layer over the machine hardware,
providing the hardware interface but possibly adding security, reliability,
etc. Can provide interface to multiple callers simultaneously.
- X-Windows: GUI for bit-mapped terminals
- X Kernel: Core of the system; has terminal-specific drivers
- Xlib: allows applications to capture mouse and keyboard events.
Resides on application server.
- X Protocol: application-level protocol that allows Xlib on application
server to communicate with X kernel on client.
- Window Manager: provides "look and feel" to user.
- Compound Document: a collection of documents of various types, which
appear to the user as a single document, as if they were simply different
sections on the same page.
- Iterative Server: server gets request, then responds, then waits for next
request
- Concurrent Server: server gets request, passes to worker thread, then
waits for next request.
- End Point / Port: communication logical end-point for client/server
communication.
- Superserver: a process which can listen to many ports simultaneously, and
delegates each request to the appropriate handling process. Example:
"inetd" in Unix.
- Out-of-Band Data: data that must be processed before any other data from
this client (highest priority). For example, allows client to override
upload request with cancel command.
- Servers and State:
- Stateless Server: does not keep information on state of clients (web
servers)
- Soft State: server maintains some client state information for a limited
time
- Stateful Server: maintains information on client state.
- Session State: state maintained for only the current session
- Cookie: small data file with client-specific information, stored on
client and sent back to server with each subsequent request.
- Server Clusters: collection of machines offering various services.
- Transport-Layer Switches: accepts incoming TCP connections, and
hands off request to one server in the cluster.
- TCP Handoff: the process of handing control from switch to server.
- Domain Name System (DNS): can provide multiple access points to a
cluster for fault tolerance.
- Distributed Server: appears to outside world as fixed access point, but
can be reconfigured internally as needed.
- Home Network: Normal access point
- Home Address (HoA)
- Home Agent: Special router which forwards traffic when server
migrates.
- Care-Of Address (CoA): temporary address to which home agent will
forward requests.
- Contact Address: Stable address that can be used by the outside world.
- PlanetLab: Cluster Server Example, where different organizations
contribute computers to the cluster
- Vserver: a separate Linux environment where a group of processes run,
but cannot share resources across vservers. Allows isolation of
software environments to support "test-labs."
- Slice: a set of vservers.
- Node Manager: a separate vserver which creates other vservers on its
node, and controls resource allocation.
- Service Provider: resource provider which owns the slice.
- Slice-Creation Service (SCS): can be run by node to create slice
- Slice Authority: can request creation of slice
- Process Migration: moving an entire process from one machine to another
- Mobile Agent: small mobile program that can replicate and run on several
sites at once, combining the results from each instance, and returning those
results to the user.
- Weak Mobility: the code and initialization data can be transferred, but
the program must be restarted from a prescribed starting point.
- Strong Mobility: the process can be stopped, moved to another machine,
and restarted from where it left off.
- Sender-Initiated Migration: sending machine initiates action.
Example: uploading data to compute server
- Receiver-Initiated Migration: receiving machine initiates action.
Example: Java applet.
- Process-to-Resource Bindings
- Binding by Identifier: Strongest. Binding to specific URL.
- Binding by Value: using standard libraries which are locally
available, but exact location may differ from site to site.
- Binding by Type: Weakest. Binding to local device like monitor,
printer, etc.
- Resource-to-Machine Bindings:
- Unattached Resources: can be freely moved between different machines
- Fastened Resources: not dependent on local machine, but higher cost to
move. Example: local database or web site stored on host.
- Fixed Resources: closely bound to specific machine. Example:
local devices
DEFINITIONS -- Chapter 4: Communication
- Open Systems Interconnection (OSI) Reference Model:
- Application Layer
- Presentation Layer
- Session Layer
- Transport Layer
- Network Layer
- Data Link Layer
- Physical Layer
- Protocol: rules governing the format, contents, and meaning of
messages
- Connection-Oriented: Sender / receiver first establish a
connection before exchanging data, then explicitly disconnect
- Connectionless: no advance setup before communication.
- Header: Each layer adds to the message with information specific to that
layer.
- Protocol Suite / Stack: Collection of protocols used in a system.
- Frames: units of bits sent by lower-level protocols
- Checksum: bit pattern to detect if error occurred in transmission
- Routing: the task of choosing the best path from sender to receiver
- Well-known protocols:
- Internet Protocol (IP): connectionless protocol; most widely used on
Internet.
- Transmission Control Protocol (TCP): connection-oriented protocol, but
connections are not used in TCP/IP.
- Universal Datagram Protocol (UDP): connectionless; IP with minor
additions.
- Real-Time Transport Protocol (RTP): specifies packet formats but doesn't
guarantee delivery
- File Transfer Protocol (FTP): protocol to transfer files between client
and server
- HyperText Transfer Protocol (HTTP): handle transfer of web pages; is
also used in other applications
- Types of Communication:
- Persistent Communication: message is stored by middleware until delivery
to receiver. Receiver need not be up when message was originally sent.
- Transient Communication: message is stored only while sending and
receiving programs are executing
- Asynchronous Communication: sender doesn't block after sending message
- Synchronous Communication: sender blocks after sending message; waits
for response
- Remote Procedure Call (RPC):
- Call-by-Value: copied directly to the stack
- Call-by-Reference: only address is copied to the stack
- Call by Copy/Restore: (not in C) copied to the stack, and then copied
back ("in/out" parameters)
- Client Stub: version of the procedure in the client library.
Packages parameters into a message, and requests transmission to server.
- Server Stub: Receives requests from client stubs; unpacks parameters;
calls server procedure; returns result to caller.
- RPC Steps:
- Client calls stub
- Client stub builds message and calls local OS
- Client OS send message to remote OS
- Remote OS passes message to server stub.
- Server stub unpacks and calls server.
- Server executes the procedure, returns results to server stub
- Stub build return message, calls its local OS
- Server OS passes return message to client OS
- Client OS passes message to client stub
- Client stub unpacks and returns result to client
- Parameter Marshaling: Process of packing parameters into a message
- Little Endian: Intel word format.
- Big Endian: SPARC word format
- Asynchronous RPCs: client continues after issuing RPC call; server only
acknowledges receipt of request at first.
- Deferred Synchronous RPC: combination of two asynchronous RPC calls
- One-Way RPCs: client doesn't even wait for server acknowledgement of
receipt before continuing, Less reliable, since not known that server
received request.
- Distributed Computing Environment (DCE): one middleware system
facilitating RPC
- Distributed File Service: worldwide file system; access any file in
the same way
- Directory Service: track location of all resources
- Security Service: protects all resources uniformly
- Distributed Time Service: synchronize clocks throughout the system.
- DCE Daemon: process that maintains servers and end-points for the host
on which it resides.
- At-Most-Once Operation: guarantees no operation is carried out more
than once.
- Message Oriented Communication
- Berkeley Sockets Interface
- Socket: communication endpoint for read/write.
- Socket Operations:
- bind (attach local address / port to socket)
- listen (willing to accept connection)
- accept (block waiting for incoming request)
- connect (establish a connection)
- send
- receive
- close
- Message Passing Interface (MPI): designed for parallel applications,
more flexible to handle different forms of buffering, additional protocols.
- Message-Queuing Systems / Message-Oriented Middleware (MOM): store
messages intermediately until receiver is available to receive.
- Source Queue: messages added to this queue
- Destination Queue: messages transferred to this queue, where they'll
be removed for processing
- Queue Manager: interacts with the application sending or receiving a
queue message
- Relay: forward incoming messages to other queue managers
- Message Broker: converts incoming messages to the format required by
the destination application.
- IBM WebSphere MQ:
- Message Channels: One-way, reliable connection between sending and
receiving queue manager.
- Message Channel Agent (MCA): one MCA manages each end of a message
channel.
- Local Alias: alternate logical name for queue manager
- Message Queue Interface (MQI): MQopen, MQclose, MQput, MQget
- Channel Control Function: manages two MCAs at the endpoints of a
channel
- Stream-Oriented Communication
- Continuous Representation Media: Time relationships between data
elements is important (streams of video, audio)
- Discrete Representation Media: Time relationships are not important
(text, images)
- Data Streams: sequence of data units
- Asynchronous Transmission Mode: data items are transmitted one after
another, no timing constraints. Normal transmission mode for discrete
data.
- Synchronous Transmission Mode: has maximum end-to-end delay
- Isochronous Transmission Mode: Data units must be transferred on time,
so there is maximum and minimum end-to-end delay.
- Simple Stream: single sequence of data
- Complex Stream: several related streams (substreams)
- Quality-of-Service (QoS) Requirements: may include...
- Required bit rate for transport
- Maximum delay to setup of session, for starting to send data
- Maximum end-to-end delay
- Maximum delay variance ("jitter")
- Maximum round-trip delay
- Differentiated Services: Sender can tag outgoing packets with different
priority:
- Expedited Forwarding: tells routers to give higher priority
- Assured Forwarding: defines range of priorities
- Forward Error Correction (FEC): Attempt to correct transmission errors
without requesting retransmission.
- Motion Picture Experts Group (MPEG): formed standards for compression of
video and audio.
- Multicasting
- Forwarder: If a node receives a multicast join request for the first
time, it becomes a forwarder and continues to forward the join request.
- Link Stress: measures how often a packet crosses the same link during a
multicast.
- Stretch / Relative Delay Penalty (RDP): radio in delay between two nodes
in overlay for multicast, vs. the actual delay between those same nodes in
the underlying network.
- Tree Cost: aggregate of all link costs for the entire tree
- Rendezvous Node: A node that tracks which other nodes have joined the
multicast tree.
- Switch-Trees: As optimization, node can switch the parent of its tree.
- Gossip-Based Data Dissemination
- Epidemic Protocols: Rapidly spread information through many nodes
using only local information.
- Infected: a node that is willing to spread its data to other nodes
- Susceptible: A node that has not yet received this data
- Removed: A note that is unable to spread data.
- Anti-Entropy: pick another node at random, either (i) push updates
only, (ii) pull updates only, or (iii) push and pull updates
- Rumor Spreading / Gossiping: use anti-entropy approach to spread
information around the tree
- Directional Gossiping: take the network topology into account when
building the gossip network.
- Death Certificates: attempt to solve the problem of deleting data in the
epidemic-algorithm approach, where it is difficult to distinguish between
removed data vs. not-yet-propagated data.
DEFINITIONS -- Chapter 5: Naming
- Location-Independent: a name of an entity is distinct from its location or
address.
- Identifiers:
- ...refer to at most one entity
- ...uniquely identify an entity (only one identifier per entity)
- ...always refer to the same entity
- Address-Resolution Protocol (ARP): finds the data-link address of a
machine, given its IP address.
- SSP Chain: During RMI, use pairs (client stub, server stub) and forward to
the actual object
- Finger Table: In the Chord DHT system, a lookup table maintained on each
node which provides shortcuts to a number of existing nodes, for faster access
to those nodes.
- Topology-Based Assignment of Node Identifiers: Assign identifiers so that
nodes with true network proximity also have identifiers which are close to
each other.
- Proximity Routing: Nodes maintain a list of multiple alternatives
for forwarding requests. More efficiency and better failure handing.
- Proximity Neighbor Selection: Choose the physically nearest node as
neighbor in the DHT.
- Iterative Lookup: Node looks up key, returns the next node found, and
original process forwards the request to that new node.
- Recursive Lookup: Node looks up key, then it forwards request to the next
node, which may continue the forwarding. The original process only
submitted the request once, to a single node.
- Domains: divisions of network hierarchy
- Leaf Domain: Lowest level domain, a LAN in a computer network or local
cell in a mobile phone network.
- Root Directory Node: top-level node.
- Location Record: Represents address of a single entity.
- Name Space: organization of names into labeled graph
- Leaf Node: named entity in the namespace. Lowest level structure on
the name tree / graph.
- Directory Node: Intermediate structure / node in the tree. Has leaf
node(s) below it, and possibly additional levels of directory nodes below it
as well.
- Directory Table: stores list of a directory node's outgoing names (names
on the next level down on the tree from this directory node)
- Path Name: list of names referring to the sequence of nodes required to
reach the name of our resources
- Absolute Path Name: first node in the path represents the root
- Relative Path Name: Path is relative to another location. First
path is not the root.
- Global Name: denotes the same entity, regardless of where that name is
used system-wide.
- Local Name: meaning depends on context of the system in which that name
is used.
- Name Resolution: the process of looking up a name
- Closure Mechanism: means for interpreting / resolving names
- Alias: an alternate name for the same entity
- Hard Link: Allow two absolute path names to refer to the same entity
- Symbolic Link: A leaf node stores the absolute path to a resource.
- Mount Point: directory node stores the identifier (and link) to a
different name space (different disk in the file system, for example)
- Mounting Point: The "target" location; the root of the destination name
space
- Name Space Distribution:
- Global Layer: highest-level nodes; rarely changed.
- Administrational Layer: organized by department or organization, for
example
- Managerial Layer: can change frequently. Hosts in the local
network, shared files
- Name resolver: ensures name resolution process is carried out.
- Iterative Name Resolution: Contact root name server for top level path,
return to requestor, who then requests next level, etc.
- Recursive Name Resolution: Contact root name server, which then passes
top-level intermediate result to next level. Only final result of name
resolution is returned to requestor.
- Domain Name Service (DNS)
- Domain: subtree
- Domain Name: path to the root
- Resource Records: contents of a node in DNS
- Canonical Name: primary name of a resource, as opposed to aliases.
- Zone Transfer: process of transferring updates from primary to secondary
DNS servers
- Zipf-like: refers to a property of DNS queries, in which the
frequency of the n-th most popular query is proportional to 1/nx
where x is close to 1. Named after Harvard linguist George Zipf who
found this same property with word-use frequencies.
- Directory Service / Attribute-Based Naming: Naming system returns only
specific entities that match the user's requested attribute (name, value)
criteria.
- Resource Description Framework (RDF): Resources are described using
(subject, predicate, object) triplets.
- Lightweight Directory Access Protocol (LDAP): combines structured naming
with attribute based naming. used by Microsoft Active Directory
- Directory Information Base (DIB): Collection of all entries. Each
record is uniquely named.
- Relative Distinguished Name (RDN): Refers to one element in the sequence
of naming attributes.
- Directory Information Tree (DIT): The hierarchy created by the
collection of unique names composed of all sequences of RDN's.
Determines the naming graph for LDAP.
- Directory Service Agent (DSA): One of many partitions of a large
DIT, which become distributed across several servers. Each DSA is a
zone of DNS.
- Directory User Agent (DUA): Client interface to directory.
Exchanges information with a DSA per established protocol.
- Universal Directory and Discovery Integration (UDDI): Attribute-based
naming to support web services.
DEFINITIONS -- Chapter 6: Synchronization
- Clock Synchronization:
- Timer: quartz crystal under tension, oscillates at well-defined frequency
- Counter: each oscillation decrements the counter
- Holding Register: when counter reaches zero, counter is reset from holding
register, and interrupt is generated.
- Clock Tick: each interrupt generated in this way.
- Clock Skew: inconsistencies between clocks on different CPU's.
- Measuring Time
- Transit of the Sun: The sun's reaching of its highest point in the sky
each day.
- Solar Day: the elapsed time from one transit of the sun to the next
- Solar Second: 1/86400th of a solar day
- Mean Solar Second: Average length of solar second, since the solar day can
vary over time, and is becoming longer over large periods of time.
- International Atomic Time (TAI): Cesium-133 atom's transitions are
counted. Maintained in Paris. January 1, 1958 was the first day of
atomic time tracked in this way.
- Leap Seconds: are introduced whenever the discrepancy between TAI and the
solar time reaches 800ms.
- Universal Coordinated Time (UTC): basis of all civil timekeeping.
- Global Positioning System (GPS):
- satellite-based system launched in 1978;
- has 29 satellites circulating the earth, each with up to four atomic
clocks calibrated from earth.
- Each ground receiver can compute its position using signals from three
satellites.
- Assumes that clocks are synchronized between the GPS satellites.
- Network Time Protocol (NTP): one computer probes another to gain the
current time; applies an estimation of the transmission delay.
- Reference Clock: fully-trusted accurate clock
- Stratum-1 Server: gets time directly from a reference clock
- Stratum-N Server: gets time from a Stratum (N-1) Server
- Berkeley Algorithm:
- periodically, time server asks all other machines for the time.
- computes the average time from all machines
- instructs other machines to adjust their system clocks to match this
average time.
- provides no guarantee that the clock will match an external standard, but
will provide consistent time within the system.
- Reference Broadcast Synchronization (RBS):
- useful in wireless / sensor networks where resources are constrained and
power consumption is an issue
- sender broadcasts message with time
- assume no contention to access network, and no multi-hop routing
- thus, almost constant transmission time
- receiver sends its clock value back to server, which stores offset.
Possible that no clocks are adjusted, but offsets are recorded.
- Logical Clocks: agreement on the order in which events have occurred
within a system, without necessarily knowing the exact time that events
occurred
- Lamport's "Happens-Before" Relationship: "a --> b" means that a is
known to have happened before b.
- Concurrent: nothing can be said about the order in which events occurred.
- Totally-Ordered Multicast: multicast in which all messages are delivered
in the same order to each receiver.
- Vector Clocks: stronger than logical clock. "Implies causality."
If event a precedes b on the vector clock, then the occurrence of b somehow
depended on event a.
- Causally-Ordered Multicasting: Weaker than "totally ordered".
In this case, only causally-related events must be properly ordered.
Unrelated events can be delivered in any order.
- Mutual Exclusion:
- Token-Based Solutions: There is one token available; if you have the
token, you can access the shared resource. If you don't need the token,
pass it on.
- Avoid Starvation: a condition where some processes never get a chance to
access the resource.
- Avoid Deadlock: where two or more processes are waiting for each other
- Permission-Based Approach: to gain access, you must get permission of
other processes
- Centralized: One process is coordinator. Others request permission
from it. (single point of failure)
- Decentralized: Voting approach. Request from all others; need
approval from majority. (if a peer fails and restarts during voting, it
could approve more than one requestor)
- Distributed: Request includes the current time. For competing
requests, accept the one with the earliest time. (Need approval from all
others; so multiple points of failure.)
- Token Ring: token is passed among nodes. Access the resource only
when you have the token. (Lost token when process crashes?)
- Elections:
- Bully Algorithm: When process coordinator is no longer available, elect a
new coordinator
- Processes are numbered sequentially
- P sends ELECTION message to all higher-numbered processes
- If no higher-numbered process responds, then P becomes the coordinator.
- If any higher-numbered process responds, P backs off and will not be the
coordinator.
- Highest-numbered responder is the "bully" and unilaterally declares itself
the coordinator.
- Ring Algorithm:
- Nodes are ordered. Each knows its successor
- Send ELECTION message to successor. Include your own process number.
- When receiving ELECTION message, add your process number to the list and
pass it on.
- If we receive ELECTION message with our own process number, we declare
ourselves the COORDINATOR -- send COORDINATOR message with the same list of
process IDs and showing our process ID as the coordinator.
- Wireless Environment: want to choose the "best" candidate.
- Nodes know their "neighbors" in proximity.
- Send ELECTION message to neighbors. Wait for their response.
- If receiving ELECTION message, do I have a parent? If not, the
sender becomes my parent, and I forward to all my neighbors, waiting for a
response before I respond to sender.
- If I already have a parent and receive a message, I acknowledge sender
immediately, and it knows I already have a parent.
- Response includes information about the resource, battery lifetime, etc.,
to help parent choose the best candidate to be coordinator.
- Superpeers in peer-to-peer networks: many nodes which can be selected as
coordinators
- Normal nodes should have low latency to superpeers
- Evenly distribute throughout network
- Predefined portion of superpeers relative to total number of nodes.
- Each should not server more than a fixed number of normal nodes.
DEFINITIONS -- Chapter 7: Consistency and
Replication
- Replicate for reliability and performance
- Cache: local stored copy of previously fetched data
- Cache hit: finding data in the local cache
- Data Store: any data storage through memory, database, file system, etc.
- Consistency Model: a contract between processes and the data store.
If processes follow the rules, the data store will remain consistent.
Applies to data set.
- Coherence Model: a contract similar to consistency model, but applies to a
single data item.
- Continuous Consistency Ranges: Three ways to define consistency between
replicas of data:
- Differences in numerical values
- Differences in staleness between replicas
- Differences in the ordering of update operations.
- Sequential Consistency: All concurrent processes see reads / writes of
data happening in the same order (possibly on different replicas of data)
- Causal Consistency: Reads / writes which are related will happen in the
same order. (Weaker version of sequential consistency)
- Concurrent operations: operations that are not causally related
- Conit: The unit over which consistency will be measured.
- Synchronization Variables: acquire them when entering critical section;
release when leaving
- Entry Consistency: Acquire variable's lock before writing to it.
- Write-Write Conflicts: Two operations want to write the same data.
- Read-Write Conflicts: One process wants to write while another is reading.
Often acceptable for reader to see old data while write happens.
- Eventual Consistency: There may be lag between different replicas, but
they eventually become consistent if updates cease.
- Client-Centric Consistency: Consistency guarantees for a single client,
making no guarantees between clients.
- Monotonic-Read Consistency: If a process reads a data item, subsequent
reads are guaranteed to get the same result or a more recent result (never an
older result)
- Monotonic-Write Consistency: Each write of a data item is completed before
the next write by the same process takes place. (Write operations are
complete, and consecutive.)
- Read-Your-Writes Consistency: A write will always be seen by a following
read by the same process.
- Writes-Follow-Reads Consistency: Write is guaranteed to happen on a
version of the data at least as recent as the latest read.
- Autonomous System (AS): network under a single administration and running
the same routing protocol
- Flash Crowds: a burst of requests to a specific site
- Mirroring: a web site copied to a number of servers ("Mirror Sites")
- Invalidation Protocols: Notify other replicas that an update has taken
place, so current data is no longer valid.
- Push-Based Approach / Server-Based Protocol: servers receive updates
without asking
- Pull-Based Approach / Client-Based Protocol: clients must request updates
- Lease: Server's promise to push updates to the client for a fixed
duration, after which client will need to request further updates or renew
lease.
DEFINITIONS -- Chapter 8: Fault Tolerance
- Availability: system is ready to be used immediately
- Reliability: system can run continuously without failure
- Safety: if system fails, nothing catastrophic happens
- Maintainability: failed system can be repaired easily
- Failure: system cannot meet its requirements
- Error: part of system's state that may lead to failure
- Fault: cause of an error
- Fault Tolerance: a system can provide services even in the presence of
faults
- Transient Faults: occur once and then disappear (can be difficult to
reproduce)
- Intermittent Fault: occurs, vanishes "on its own," then reappears (e.g.
bad connection)
- Permanent Fault: exists until fixed
- Crash Failure: server halts, but was working correctly until the crash
- Omission Failure: server doesn't respond to requests. Can be
"receive omission" or "send omission".
- Timing Failure: server responds but outside time interval (too soon or too
late)
- Response Failure: server's response is incorrect
- Value Failure: server replies with bad value
- State Transition Failure: server responds with incorrect actions
following a request
- Arbitrary Failure / Byzantine Failures: Arbitrary responses
- Fail-Stop Failures: Server stops producing output. Clients can
detect the failure. Maybe even gives advance warning.
- Fail-Silent: No warning or notification given. Clients must detect
that server is failing.
- Fail-Safe: Server gives arbitrary failures, but clients can detect that
responses are bad.
- Triple Modular Redundancy (TMR): signals are replicated three times.
Need two identical ones to propagate.
- Group Server: Process join groups, managed by this central server.
- k Fault Tolerant: System can survive faults in k components and still meet
requirements.
- Byzantine agreement problem: In the presence of faulty servers providing
faulty values, need to get agreement by majority on the values passed.
- Reliable RPC:
- At-Least-Once Semantics: If server crashes during RPC, retry to this or
a different server until one reply is received. (but server may have
executed multiple times)
- At Most Once: give up immediately and report failure (guarantees server
has executed no more than once)
- Exactly-Once: what we want, but very difficult or impossible to achieve
- Idempotent: operation can be executed multiple times safely
- Orphan: server process / computation to be returned to the client, but the
client has crashed and is no longer available
- Orphan Extermination: when client reboots, it first checks log file to see
if any open orphans are present, and kills them off if so.
- Grandorphans: Orphans which do additional RPC's. When they are
killed, their RPC's are left hanging as grandorphans.
- Reincarnation: rebooting client broadcasts beginning of new
client-specific "epoch;" all requests from that client with past epochs should
be killed by all servers.
- Gentle Reincarnation: Server checks for the client's old requests, but
tries to locate owner before killing them.
- Expiration: Each RPC is given standard amount of time to do the job.
(...but what's a reasonable time limit?)
- Feedback Suppression: reduce feedback to clients, to support scalability
- Scalable Reliable Multicasting (SRM) protocol: receivers never acknowledge
successful delivery of multicast, but only when they're missing a message.
(not always easy to detect this!) These requests-for-resent happen with
random delay, so another receiver might request first so my request is
unnecessary and saves bandwidth.
- Group View: the processes participating in a multicast group, which may
need to be an atomic-multicast group.
- View Change: A multicast message announcing a process joining or leaving
a group.
- Virtually Synchronous: If sender crashes during multicast, then message is
delivered to all in group, or ignored by all.
- Message Ordering:
- Reliable Unordered Multicast: No guarantees about ordered delivery of
messages to senders.
- Reliable FIFO-Ordered Multicast: Must deliver messages from the same
sender in order in which they were sent.
- Reliable Causally-Ordered Multicast: must preserve order only for
related messages.
- Total-Ordered Delivery: Messages are delivered in the same order to all
group members (not distinct from the above three)
- Stable: property of a message that has definitely been received by all
members of the group.
- Flush Message: indication that there should be no outstanding unstable
messages.
- Distributed Commit:
- One-Phase Commit: coordinator tells participants to perform the
operation (no way to tell the coordinator if operation can't be performed)
- Two-Phase Commit:
- Coordinator sends VOTE_REQUEST to all
- Participant returns VOTE_COMMIT for yes, or VOTE_ABORT if it can't do
the operation.
- Coordinator needs to see all VOTE_COMMIT, then sends GLOBAL_COMMIT,
else GLOBAL_ABORT.
- Participants act when they see GLOBAL_COMMIT, otherwise they abort.
- This is Blocking Commit protocol: If coordinator crashes, then
participants cannot decide what to do.
- Three-Phase Commit:
- Coordinator sends VOTE_REQUEST
- Participants returns VOTE_COMMIT or VOTE_ABORT.
- Coordinator sends PREPARE_COMMIT or GLOBAL_ABORT, gets
acknowledgement.
- Participants can contact each other if coordinator goes down.
- Recovery:
- Backward Recovery: Bring system from present state back to previously
correct state.
- Checkpoint: a previous correct state that was saved
- Message Logging: After a checkpoint is restored, you can replay events
from your event log to get back to a later correct state. (Sender-based
logging or receiver-based logging)
- Forward Recovery: attempt to bring system to correct errors and reach a
new correct state (but how to be sure you can correct any errors?)
- Erasure Correction: Missing packet constructed from previous
successfully delivered packets, if there is some redundancy in the
transmission.
- Stable Storage: designed to survive most events, except major physical
damage. Redundancy in disks
- Distributed Snapshot: Checkpointing across a distributed system.
- Recovery Line: Most recent distributed snapshot.
- Domino Effect: may be necessary to roll back to long-ago snapshot,
because snapshots of each independent system may not form a consistent
distributed system when considered as a whole.
- Independent Checkpointing: Processes take independent checkpoints.
- Coordinated Checkpointing: All processes jointly write their state, so
we know the checkpoints will be consistent.
- Incremental Snapshot: Coordinator only needs new checkpoints taken by
those processes who have received messages since the last checkpoint.
- Piecewise Deterministic Model: Time intervals defined by sequences of
deterministic events. From time of message receipt to time that
outgoing message is sent might determine a time slice.
- Stable message: can no longer be lost; it has been written to permanent
storage through logging, etc.
- Pessimistic Logging Protocol: non-stable messages can be delivered to at
most one process, which will immediately log it (but could go down before
logging). Process cannot retransmit message until after logging.
- Optimistic Logging Protocol: If process crashes after receiving a
message that hasn't been saved, it must recover to an earlier state prior to
its receipt of that message.
- Recovery-Oriented Computing: Reboot the system (or just the part of the
system that is failing, if you can identify that part, and if it's safe to
just reboot that part). Cheaper to just handle failure cleanly and
quickly, than to prevent failure in the first place.
DEFINITIONS -- Chapter 9: Security
- Confidentiality: information is disclosed only to authorized parties
- Integrity: alterations to the system can only be made in authorized ways
- Security Threats:
- Interception: unauthorized parties get access to a service or data
- Interruption: file is corrupted or lost
- Modifications: unauthorized changing of data or tampering with a service
- Fabrication: additional data or activity are generated that should not
exist
- Security Policy: describes what actions are allowed and which are
prohibited.
- Security Mechanisms:
- Encryption: transforms data into a form that unauthorized persons cannot
understand.
- Authentication: verify identity of user or other entity
- Authorization: given identity of a user, determine what actions are
permitted for that user
- Auditing: trace which clients accessed what resources
- User Proxy: A process that can act on behalf of a user for a certain
period of time
- Resource Proxy: a process that can translate global operations into a
local operation for that resource, that complies with the local domain
security policy.
- Switched Multi-Megabit Data Service (SMDS): backbone to connect different
LANs at different sites.
- Secure Sockets Layer (SSL): can securely send messages across TCP
connection
- Trusted Computing Base (TCB): Set of all security mechanisms in a computer
needed to enforce a security policy. These mechanisms must be trusted,
since they're "above" the security policy.
- Reduced Interfaces for Secure System Components (RISSC): all
security-critical services are on separate machines, which are isolated from
user systems. Low-level secure network interfaces protect these
machines.
- Plaintext: unencrypted message
- Ciphertext: encrypted message
- Symmetric Cryptosystem: same key is used to encrypt and decrypt a message
- Asymmetric Cryptosystem / Public Key Systems: keys for encryption and
decryption are different, but the pair is unique. One key in the pair is
private, and the other is made public.
- I hold a private key, but publish my public key. Send a message to
me by encrypting with my public key. Only I can decrypt it using my
private key.
- ...or, encrypt a message with your private key and send to me. If
I can decrypt with your public key, then I know the message came from you.
- Hash Functions: produces bit string based on the message content.
- is a One-Way Function: can compute hash value for a string, but can't
easily go backward to get the string that produced the hash value.
- Weak Collision Resistance: given an input string m, the hash function
should make it very unlikely that a second string can generate the same hash
value as m.
- Strong Collision Resistance: it's very difficult to even find any two
possible strings that generate the same hash value.
- Data Encryption Standard (DES): operates on 64-bit blocks, encrypts in 16
rounds, each using 48 bits of the 56-bit master key.
- Triple DES: an enhancement to DES that strengthens its security.
- Rivest, Shamir, Adleman (RSA): based on prime factorization.
- MD5: hash function to compute 128-bit "message digest" from an
arbitrary-size message.
- Secure Channel: protects senders against interception, modification,
fabrication.
- Session Key: shared key to encrypt messages during a session. Secret
and known only to parties in communication.
- Challenge-Response Protocols: One party challenges the other, response can
only be given by party having the shared secret key.
- Reflection Attack: Unauthorized person can trick the sender into
responding to his own challenge.
- Key Distribution Center (KDC): centralized approach. Centralized
service can issue secret keys to pairs of communicating hosts on demand
- Ticket: KDC can issue key to one host, which can issue ticket to the host
with which they want to communicate.
- Needham-Schroeder Authentication Protocol: protocol that calls KDC, gets
key to other party
- Nonce: a random number included in request, to ensure response matches
the intended request.
- Kerberos: based on Needham-Schroeder Authentication.
- Authentication Server (AS): handles login requests, provides a key to
setup secure channels
- Ticket Granting Service (TGS): Gives tickets to prove client's
identity to services.
- Secret Sharing: Multiple users or processes share a secret, but none of
them knows the entire secret. They need each other to have a complete
view.
- (m, n)-Threshold Scheme:
- Message is divided into n pieces ("shadows")
- Any m shadows can be used to construct the original message.
- Single Sign-On: One sign-on authenticates to multiple resources.
- Access Control: verifying access rights
- Authorization: granting access rights
- Reference Monitor: records which subject can do particular actions or
operations.
- Access Control Matrix: one access is users (subjects); the other
access is resources or operations (objects). This matrix / grid lists
whether each subject can access each particular object and/or perform each
operation.
- Access Control List (ACL): Each object maintains list of the users that
are authorized to access it.
- Capabilities: Each user is supplied with a list of capabilities that it
has, and provides to services to gain access.
- Certificate: subjects carry certificates indicating the groups they
belong to. Digital Certificate has protections against tampering.
- Protection Domain: set of (object, access right) pairs.
- Roles: users have roles, and roles are granted access rights rather than
individual users.
- Firewall: disconnects part of a distributed system from the outside world.
Incoming and outgoing packets are inspected before they pass through.
Firewall must be heavily protected.
- Packet-Filtering Gateway: a router; looks at header of packets,
including source / destination address and decides whether or not to allow
the packet through
- Application-Level Gateway: Looks at content of messages. Might
discard email over a certain size, for example.
- Proxy Gateway: All requests go through the proxy and are filtered.
- Protection against malicious code:
- Sandbox: downloaded code operates in controlled environment, can be
stopped.
- Example: Java Virtual Machine (JVM)
- Class Loader: gets a class from server or file, and installs into JVM
runtime space.
- Byte-Code Verifier: checks that instructions follow security rules.
- Security Manager: performs checks at runtime.
- Playground: separate machine designated for running downloaded code.
Other resources are physically separated from the mobile code.
- Extended Stack Introspection: When a method is called, first check
whether caller is authorized; if so, grant one-time privilege to call
method. (Need to check again if method is called again.)
- Denial-of-Service Attack (DoS): Maliciously preventing processes from
accessing authorized resources.
- Distributed Denial-of-Service Attack (DDoS): large group of processes
work to bring down a networked service. (Bandwidth depletion in the
network, or resource depletion on the server)
- Diffie-Hellman Key Exchange: Public-key cryptosystem.
- A, B agree on two large public numbers n, g.
- A picks large random number x, and keeps private. Similarly, B
picks large random number y, keeps private.
- A sends (g^x mod y) to B (publicly!!)
- B sends (g^y mod x) to A (publicly!!)
- A and B can compute (g^(xy) mod n), but nobody else can!!
- Cool !! I never knew how this worked. This is one of my best
takeaways from this class!
- Public-Key Certificates: Public key combined with string identifier giving
the entity (host, user, etc.) to which that public key is associated.
- Certification Authority: signs a certificate. Trusted to verify
authenticity of certificates.
- Trust Model: depends on the highest-level certification authorities being
trustworthy by everyone.
- Privacy Enhanced Mail (PEM): three-level trust model:
- Policy Certification Authorities (PCA): authenticates low-level
certification authorities.
- Internet Policy Registration Authority (IPRA): authenticates PCAs.
- depends on user trusting IPRA !
- Certificate Revocation List (CRL): published regularly by certification
authorities. Clients must check CRL's to validate that a certificate has
not been revoked.
- Capability: Represents a set of access rights that a user may have with
respect to a resource. 128-bit identifier.
- Server port: First 48 bits. Machine-independent identifier of the
object's server
- Owner Capability: Right 48-bits of capability. Server sets this
randomly. Client requests must match this field to be accepted.
- Attribute Certificate: Holds (attribute, value) pairs that apply to the
certificate's holder on a particular resource.
- Attribute Certification Authority: trusted authorities for attribute
certifications.
- Delegation of Rights: If user is making use of another resource to
do action on his behalf, that other resource may need some of the user's
rights. User can delegate rights to that resource.
- Proxy: Token which allows one entity to temporarily pass access rights
to another entity.
DEFINITIONS -- Chapter 10: Distributed
Object-Based Systems
- Object: Encapsulates data ("state"), and operations ("methods")
- Interface: makes methods available
- Distributed Object: the object resides on one machine, but an interface to
that object may exist on one or more remote machines.
- Binding to a Distributed Object: loading a local implementation of the
object's interface ("proxy") into the client's local address space.
- Skeleton: Server-side stub; allows server middleware to access the
user-defined objects.
- Remote Object: interface to an object that may be available ton many
machines. However, state of object is maintained on just one machine.
- Class: Description of an abstract data type in terms of data and
operations.
- Object Adapter: wrapper that goes around library code (especially from
non-OO languages) to make it appear as an object. implements specific
activation policy
- Persistent Object: May not exist in memory, but stored on disk.
- Transient Object: Exists only while server process is up.
- Enterprise Java Beans (EJBs): special Java object which may provide
services, and is hosted by a special server.
- Stateless Session Bean: invoked once, does work, discards all data
- Stateful Session Bean: maintains state during session, but loses state
when session done (electronic shopping cart)
- Entity Bean: long-lived persistent object, stored in database (customer
records)
- Message-Driven Bean: Part of publish/subscribe mechanism. Reacts
to incoming messages. Bean is called when certain type of message is
received.
- Activation Policy: how to invoke an object
- Implicit Binding: Client can invoke methods with only a reference to the
object
- Explicit Binding: Client must first call function to bind to the object,
then invoke methods.
- Location Server: keep track of host where services are actually running.
Provides object references to its clients, which identify location of a
service resource, and a global identifier for the resource.
- Implementation Handle: a proxy that a client can load when binding to an
object
- Remote Method Invocation (RMI): means for invoking methods on remote
objects
- Static Invocation: calling via predefined interface definitions
- Dynamic Invocation: create an invocation at runtime
- Serializable: data type that can be marshaled
DEFINITIONS -- Chapter 12: Distributed Web-Based
Systems
- World Wide Web (WWW): huge distributed system for accessing linked
documents and services
- Uniform Resource Locater (URL): reference to a document in web-based
system. Combines DNS name of server with local file path and file name.
Note that URL's are location-dependent
- Uniform Resource Identifier (URI): naming system for web
- Uniform Resource Name (URN): globally unique, location independent name
for resource.
- Scheme: specifies content / syntax for URI.
- Browser: special application allowing client to interact with web services
and access web documents
- HyperText Transfer Protocol (HTTP): standard for communication between
browser / web clients and web servers.
- Markup Language: Language similar to that of word-processing systems,
showing the document text annotated with the document features to be applied
to that text.
- "Head" Command: Returns header of document
- "Get" Command: Returns document
- "Put" Command: Store document
- "Post" Command: Provide data to be added to a document (used for form
input)
- "Delete" Command: Request deletion of a document
- Tags: character strings that are searchable via HTTP.
- Request Line: mandatory part of HTTP request; specifies operation that
client wants
- Status Line: first line of server response. Could be 200 OK, 400
Bad Request, 403 Forbidden, 404 Not Found, etc.
- HyperText Markup Language (HTML): The most widely used markup language on
the web.
- Extensible Markup Language (XML): more flexible than HTML. Allows
customization of markup fields, and allows writer to define new elements in
the markup language (a "meta-markup language").
- Embedded Documents: HTML and XML references to other documents of various
types.
- Multipurpose Internet Mail Exchange (MIME) Type: specifies the document
type of the embedded documents found in web pages.
- Common Gateway Interface (CGI): web server can execute a program with user
data, pass that data as standard input parameter or command-line parameters,
and generates an HTML document back to the server as its output.
- Server-Side Script: Code that is part of a fetched document, which must be
executed by the server before that document is returned to the requesting
client.
- Servlets: Java programs that are run by the server to maintain state and
implement other functionality.
- Web Services: "traditional service" available over the internet.
Follows standards that allow it to be discovered and accessed in a
standards-based way.
- Universal Description, Discovery, and Integration (UDDI): directory
service storing service names and descriptions. Allows web clients to
find available web services.
- Web Services Definition Language (WDSL): formal language that specifies
interfaces of a web service.
- Simple Object Access Protocol (SOAP): framework for standardizing
communication between two web processes.
- SOAP Envelope: contains header and body of a SOAP message.
- Conversational Exchange Style: two parties exchange structured documents
- RPC-Style Exchange: request/response behavior
- Complex Service: a compound service that is composed of several more
basic, and simpler, services.
- Composite Service: services which may combine functionality from different
providers and possibly different organization domains, but which need to
appear as a single coherent service to the user.
- Coordination Protocols: used for coordination between web services.
Forces the different parties to adhere to rules and take the right steps at
the right times.
- Web Services Coordination: a separate service for handling coordination
efforts
- Web Browser: client application that makes requests to web servers and
displaying the resulting documents to the client user.
- Plug-In: program that integrates with the browser to handle one or more
specific MIME types.
- Apache Portable Runtime (APR): part of Apache web server. Library
provides platform-independent file handling, networking, locking, threads,
etc.
- Hook: Placeholder / stub for standard functionality, such as writing to
log, translating URL to local filename, checking access rights, etc.
- Modules: implementation to support functionality specified in hooks.
- Content-Aware Request Distribution: web server looks at incoming request,
and decides where to transfer the request based on the specific content being
request.
- Round Robin DNS: Single domain name associated with multiple IP addresses.
DNS server can evenly distribute requests so that equal numbers of requesters
receive each of the available IP addresses corresponding to the domain name.
- BIND: one specific DNS server that handles Round Robin DNS
- Pipelining: Issuing several requests in a row (over the same connection)
without waiting for a response from earlier requests.
- Web Distributed Authoring and Versioning (WebDAV): allows locking of
shared document, and delete / create / copy / move documents on remote web
servers.
- Hierarchical Caches: caches over a geographical region, to reduce network
traffic (but add latency)
- Cooperative Caching / Distributed Caching: if a cache / proxy doesn't have
document, it checks other nearby caches for the document before requesting
from the web server.
- Content Delivery Networks (CDNs): web hosting services to distribute and
replicate web documents for multiple web sites.
- Multi-Protocol Web Switching (MPLS): use virtual circuits to improve
packet forwarding, allowing packets to flow in different paths than what is
specified in network routing tables. This can make it difficult to
evaluate the network for performance, latency, etc.
- Flash Crowd: sudden burst in requests for a certain web site.
- Flash Crowd Predictor: predicts upcoming flash crowd; goal is to provide
time for web servers to create additional replicas of web documents that are
likely to be heavily requested during the upcoming flash crowd.
- Adaptive Redirection Policies: The ability to redirect requests
dynamically, especially including the ability to stay away from heavily loaded
servers.
- Query Containment Check: when web applications are distributed
across multiple servers, an "edge server" (handling a client request) can
determine whether that client request can be entirely handled with the data
available to that server, or whether additional data or functionality is
needed from other services.
- Content-Blind Caching: edge server stores recent client requests and
results, so subsequent requests can first check to see if they hit the cache.
Otherwise, the query is forwarded. Simple, but can waste space in
redundant caching.
RESEARCH LINKS:
- 2008/06/01 (2 hours): Academic Cluster
Computing Initiative:
YouTube overview video at
http://www.youtube.com/watch?v=UBrDPRlplyo
This initiative ties our SE 435 course to some core issues of CS education
which are very interesting to me. The video highlights how quickly we've
come to depend on massive-scale distributed systems, such as the huge numbers
of connected computers to support Google and other web services. Cluster
computing is a key capability that allows such systems to exist, and this
requires today's computer scientists to be able to design systems which can
effectively mobilize the combined computing power of these clusters.
However, today's university computer science graduates -- even the best and
brightest, and even those from the top schools -- are not sufficiently trained
to think and design in such large scale terms. Google and IBM are
leading this effort to install a cluster computing environment -- along with
course offerings -- for students and faculty at the University of Washington.
See "Ideas" below for some of my follow-up thoughts on this...
- 2008/05/27 (30 min): TIBCO Rendezvous
Messaging Software: background from their website at
http://www.tibco.com/software/messaging/rendezvous/
TIBCO Rendezvous is a messaging package that I've used in a number of jobs and
projects for messaging. It supports a few different approaches to
messaging architecture, but I've used in (a) a publish/subscribe fashion where
the message is published via multicast, and (b) in a request/reply
architecture. (The publish/subscribe use of TIBCO Rendezvous is
described on page 595-596 of the textbook, in a section we didn't cover in
this course.
Under this system, hosts on the network run a process called the Rendezvous
Daemon ("RVD"), which registers all message subscribers. Whenever a
message arrives which has at least one subscriber, the daemon forwards it to
all subscribers. The daemon typically will only run whenever there is at
least one message subscriber. If there are no subscribers for a certain
period of time, the RVD will shut down.
- 2008/05/12 (1 hour): Kerberos
Authentication and Security Software: background from their official
website at http://web.mit.edu/Kerberos/
Kerberos is an authentication system using private key cryptography which is
used in some systems that I've worked with during my career, so I was
interested to look into this further. Kerberos is free software
available from MIT through the GNU Public License, very similar to the
licensing for free versions of Linux like FreeBSD.
Kerberos is based on the concept of a Key Distribution Center (KDC), which was
discussed in the text. This is a centralized approach to the generation
and distribution of security keys. The name is derived from the
three-headed watchdog in Greek mythology, which guarded the entrance to the
underworld.
- 2008/04/16 (1 hour) C++ and
Multi-Threaded Programming
Article on movement to add threads to C++ Standard:
http://www.artima.com/cppsource/threads_meeting.html
POSIX Threads Tutorial from Lawrence Livermore National Laboratory:
https://computing.llnl.gov/tutorials/pthreads
I work with C++ in my workplace much more often than with Java. I
haven't yet needed to implement multithreaded applications in C++, but I
expect I'll need to do this soon. Based on what we've done with
multithreaded programming in this course, I wanted to learn more about how we
can implement the same types of programs in C++.
Surprisingly, the C++ standard is currently silent on the topic of
multithreaded programming, and so there are many non-standard implementations
of threads currently in use in C++. The first link above gives some good
high-level ideas of what C++ threading code may look like when threads are
implemented in the new standard (coming within the next few years, I was told
when I attended a recent lecture by Herb Sutter of the C++ standardization
committee).
The second link above talks about one of the better known, if not the most
recent, implementation of threads for the C language. This actually was
standardized under the IEEE specification of the C language, which provided a
standard "pthreads" API which defined three types of operations
(1) Thread management
(2) Mutexes ("MUTual EXclusion"), methods for handling synchronization
(3) Conditionals, especially relating to communications between threads
IDEAS
- TEACHING IDEAS based on what I've learned in
SE 435: I was previously a high school computer science teacher, and I
expect that at some point in the future I will again be teaching in the
computer science classroom at some level, whether K-12, or collegiate, or in
corporate training. Therefore, I often try to think of classroom
activities that relate to various topics that we're studying. Below are
a few possible teaching ideas:
- Public Key Cryptography: Chapter 9 of the textbook convinced me
for the first time that secure communication is possible, since I had never
been able to understand how two remote computers on a public channel could
setup a secure communication that could almost never be decrypted by an
eavesdropper.
A classroom activity would have students write programs that follow a
(simplified) Diffie-Hellman Key Exchange algorithm, where each would pick a
private, large, random number, and that would be used to setup a secure
communication. Depending on the level of student, they could maybe
just write encoded text files (with encoding based on Diffie-Hellman
algorithm), or send over non-encrypted socket.
This would be a good activity to teach the basics of how secure
communication works. I would maybe teach it at the same time I'm
teaching the use of File I/O, or socket programming, etc.
- MORE THOUGHTS ON THE ACADEMIC CLUSTER COMPUTING
INITIATIVE (follow-up to Research Links above):
Above, I listed a research link to the Academic Cluster Computing Initiative,
where technology companies like Google and IBM are driving the effort to
incorporate cluster computing into the computer science curriculum for CS
students at the University of Washington. One of the main motivations
seems to be that since technology companies (especially successful
Internet-based companies) have scaled up their operations enormously in a
short time, universities have not been able to keep pace with the level of
preparation being given to their students.
I see this as one of the first signs of a growing and long-term problem:
the demands and personnel requirements of technology companies are growing
quickly. Some universities are too bureaucratic to keep pace with this
rapid technology growth. More significantly, universities are not
recruiting sufficient numbers of people into computer science to fill the
industry's demand. And much of this problem originates back in K-12
education, where students are not pursuing mathematics and science in strong
numbers, and computer science is often overlooked completely at this level.
As a result, I see technology companies and professional computing
organizations taking a more active role in education in the near future, first
at the university level, but later even extending their reach down to K-12.
Since it appears that our education system isn't providing enough talented
individuals with technology education and the inspiration to pursue a
technology-related career, I see technology companies taking matters into
their own hands more and more in the near future. This Academic Cluster
Computing Initiative is one way in which Google and IBM are taking a more
active approach -- I see this as a growing trend.