Jun 30
Usenix 2004: Tuesday notes
Network Complexity: How Do I Manage All of This?
Eliot Lear (Cisco)
- Why is all this management stuff important?
- The entire world runs on IP now.
Status
- Decent fault / performance monitoring
- Accounting tools are immature
- Configuration: “Everyone is in complete agreement that we’ve never been able to standardize”
What’s changed lately?
- Focus on security has resulted in more attention on network management
- More network elements to be managed
- Networks are more complex (VoIP, VPNs, etc.)
- We rely on them more
- Discovery has been distrusted on security grounds
  - The bad guys can discover things anyway, so make the network manageable.
  - Most discovery protocols don’t scale to large-scale networks (>10^3 devices)
The Standards Mess
- “We are swimming in standards” - confusing vendors, customers, etc.
- We don’t have the right standards:
  - Sample mobile device needs config by user, VoIP provider, ISP, and corporate. Nothing standard supports this.
- Basic management has become widespread (SNMP, Syslog, etc.)
- Fault management
  - Ad-hoc approach. Nothing standard, users frequently confronted with inscrutable error messages, etc.
- Workflow management: fixing problems before stuff breaks (avoiding reactive management)
- Anomaly management: stop pretending that security management is unrelated - it’s not (spam, phishing, etc.)
- The more it touches your network, the more it costs you
- IDS
  - Requires IDS: Configuration: Audit feedback loop
    - Needs very mature, reliable tools (e.g. don’t accidentally disable the network) but what’s out there is still immature
- Monitoring
  - Faults may masquerade as other classes (e.g. performance faults may look like security faults, etc.)
    - “A violation of the laws of physics is a sign of a problem”
- Configuration management
  - We need to hide information / details (car analogy - casual users aren’t swamped with details, allowing you to switch cars trivially)
  - Unix has had 30 years and is far from perfect
    - POSIX works well if you stay away from hardware but networking doesn’t have the luxury
    - Consider the mess of simple problems like user accounts or config formats
Standardizing network management
- XML is not a panacea
- HTTP doesn’t work - we need locking, “phone home” ability
- NETCONF
  - XML
  - Multiple transports:
    - BEEP
    - SSH
  - Data-model agnostic (i.e. “we couldn’t standardize that”)
- We need help
Identity Management
- Marginal at corporate level, dismal for consumers
- Problems:
  - Maze of different systems
  - Identity theft, privacy concerns
- Goals
  - Standard interface
  - User-oriented control - you control your data, not some company with different goals
Conclusions
- Too much data is BAD
  - We need better filtering and aggregation tech
- Some linkage is still missing for automation
- Security and network management are very tightly related
- Expect more from your local router
- We’re awash in keys!
Q&A
- Sun’s DTrace: reprogrammable instrumentation is a big advance
Deploying Commercial DB on Linux (Harvard Room)
- 2.6 is good. Backports can take a lot of the pain out but it’s not the same
- IO latency is a big deal - maximize spindles rather than throughput
- Carefully monitor actual performance - big disk arrays are dicey just because it’s hard to tell where data actually lives.
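A rough back-of-the-envelope model shows why spindle count matters more than streaming throughput for random database I/O. The drive figures used here (5 ms average seek, 10,000 RPM) are illustrative assumptions, not numbers from the talk, and the model ignores caching and queueing effects:

```python
def random_iops_per_spindle(avg_seek_ms: float, rpm: int) -> float:
    """Estimate random IOPS for one disk: each I/O costs a seek plus half a rotation."""
    rotational_latency_ms = 0.5 * 60_000.0 / rpm  # half a revolution, in milliseconds
    service_time_ms = avg_seek_ms + rotational_latency_ms
    return 1000.0 / service_time_ms


def array_random_iops(spindles: int, avg_seek_ms: float, rpm: int) -> float:
    """Random I/O capacity grows roughly linearly with the number of spindles."""
    return spindles * random_iops_per_spindle(avg_seek_ms, rpm)


if __name__ == "__main__":
    # Hypothetical drive: 5 ms average seek, 10,000 RPM.
    for n in (4, 8, 16):
        print(n, "spindles:", round(array_random_iops(n, 5.0, 10_000)), "random IOPS")
```

In this simplified model, doubling the number of spindles doubles random I/O capacity, while a faster bus or controller changes nothing.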
NFS Deployment for High Performance
- NetApp has been pushing RDMA extensions for NFS (v4 standard?) - IETF lists have the details.
- Goals: make NFS as good as or better than a local FS
- What really matters for client performance:
  - Caching
  - Wire efficiency
  - Single mount point parallelism
  - Multi-NIC scalability (bonding up to NIC vendor)
  - Latency
  - CPU cost (copies, additional TCP/IP overhead)
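Latency and per-mount parallelism interact: with a fixed number of requests in flight, throughput cannot exceed (outstanding requests × transfer size) / round-trip time. A minimal sketch of that bound follows. The 16-request figure matches the Linux per-mount limit mentioned later in these notes; the transfer sizes and the 0.5 ms round trip are assumed values for illustration only:

```python
def max_throughput_mb_s(outstanding: int, transfer_bytes: int, rtt_ms: float) -> float:
    """Upper bound on throughput (MB/s) with a fixed number of requests in flight."""
    bytes_per_round_trip = outstanding * transfer_bytes
    round_trips_per_second = 1000.0 / rtt_ms
    return bytes_per_round_trip * round_trips_per_second / 1_000_000


if __name__ == "__main__":
    rtt_ms = 0.5  # assumed round-trip time per NFS operation
    for outstanding, size in ((16, 8 * 1024), (16, 32 * 1024), (128, 32 * 1024)):
        bound = max_throughput_mb_s(outstanding, size, rtt_ms)
        print(f"{outstanding:>3} requests x {size // 1024:>2} KB -> at most {bound:.0f} MB/s")
```

The bound scales linearly with both the request count and the transfer size, which is why small operation sizes and a low cap on outstanding requests hurt so much on a fast network.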
Tunings
- Interconnect
  - NIC: hardware checksums, large-send offload
  - Routing
  - Jumbo frames
- Client
  - Mount options
    - rsize/wsize
    - Attribute caching to ensure consistency:
      - Timeouts, noac (no attribute cache), nocto (no close-to-open consistency checking)
    - llock - use local locking semantics to avoid the non-cacheability implied by locking; useful for files which are only accessed from the current machine
  - Readahead - disable for random I/O
  - biod / RPC slot table (Linux) - per mount point, increase for busy clients
  - Network buffers
    - system/nfs default socket buffers
    - send/receive highwater limits - increase to something like 1/4 socket buffer
- Server
  - Use appliance (dedicated, tuned)
  - Volume / spindle tuning
    - Balance file distribution
  - Volume options
    - noatime updates
    - snapshots for backups
War stories
- App runs 50x slower on NFS than local: block-transfer verified that NFS files were bypassing host buffer cache because multithreaded client caused out-of-order writes
- File locks - client caching varies widely based on lock type (disable, don’t prefetch, etc.)
  - Most clients lack control over this
- Overzealous prefetch - random reads trigger tons of unnecessary requests (app reads much less than wire reads)
- Bad clients
  - Artificial operation size limits (e.g. 8KB per-write limit) - may require a sniffer to spot
  - Linux uses page-size chunks:
    - page size < r/wsize: I/O may be split
    - page size > r/wsize: operations will be split and serialized
    - NetApp released patches; 2.6 / NFSv4 client backports should be better
  - RPC slot limitation - max outstanding requests cripple throughput (Linux has 16-op-per-mount limit)
    - 2.6 kernel: increase /proc/sys/sunrpc/(tcp|udp)_slot_table_entries
  - Writers block readers
- Databases on NFS: http://www.netapp.com/tech_library/ftp/3322.pdf
Tools
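A small helper for checking the RPC slot limit discussed in the war stories: it reads the current `tcp_slot_table_entries` and `udp_slot_table_entries` values from `/proc/sys/sunrpc`. This is only a sketch, assuming a Linux client with the sunrpc module loaded; the function name and the `None` fallback are choices made for this example:

```python
from pathlib import Path
from typing import Dict, Optional

SUNRPC_DIR = Path("/proc/sys/sunrpc")


def read_slot_table_entries(base: Path = SUNRPC_DIR) -> Dict[str, Optional[int]]:
    """Return the current RPC slot table size per transport, or None if unavailable."""
    values: Dict[str, Optional[int]] = {}
    for transport in ("tcp", "udp"):
        path = base / f"{transport}_slot_table_entries"
        try:
            values[transport] = int(path.read_text().strip())
        except (OSError, ValueError):
            # File missing (non-Linux host, sunrpc not loaded) or not a number.
            values[transport] = None
    return values


if __name__ == "__main__":
    for transport, entries in read_slot_table_entries().items():
        print(f"{transport}: {entries if entries is not None else 'not available'}")
```

Raising the limit means writing a larger number to the same file as root; in practice this is usually done before the NFS mount is made.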
- NetApp I/O generator: SIO
Future
- RDMA: no host CPU use, special interconnect (InfiniBand, iWARP)
In the trenches: Enterprise Wireless LANs
- Connect Server (auth, user management, config, monitoring, etc)
- Edge controller (policy enforcement, packet routing, service proxies)
Network design
- RF coverage analysis or self-tuning APs
- QoS
- Mobility
Q&A
- U Utah uses Meeting House 802.1x client on Windows, OS X < 10.3
- Presenter, several others use FreeBSD + HiFN crypto to deliver >80 Mb/s IPsec on < 1 GHz systems
Linux on Opteron BoF
- Trivia: Opterons interleave memory. Fill every slot!
- AMD interested in tuning tools, helping programmers w/NUMA, etc.
LinuxBIOS BoF
- VIA, OctigaBay, Linux Networx all shipping LinuxBIOS
- AMD very supportive (K8 support is great), IBM also good but less so