Announcements
2024-12-02 to 06 oracle
oracle maintenance.
2024-11-25 to 29 general maintenance
system maintenance.
2024-11-11 to 22 core switch
Upgrade of core network infrastructure
First system maintenance.
2024-08-12 to 16 domain controller
Upgrade domain controller.
Has to be rescheduled.
2024-08-05 to 09 network
network maintenance
2024-07-29 to 08-02 general maintenance
system maintenance. Including clusters, acc9, consoles, network, storage and domain controllers.
Bugzilla (https://www-acc.gsi.de/bugzilla/) will be decommissioned.
complete.
2024-07-22 to 26 oracle
oracle maintenance. First Systems maintenances
2024-07-03 to 12 zks
zks vlan70 reconfiguration.
2024-07-01 to 05 acc6
acc6 maintenance complete.
NFS access has been removed. All home directories are local and empty.
If you want to copy files from acc9 to acc6 you need to configure
SshLegacy and copy them from acc9 using something like
scp -r ~/foldername asl735:~/
/common/home
/common/fesa
/common/fesadata
/common/tftp
/common/export
2024-01-08 to 01-12
maintenance week. closed.
2023-08-28 to 09-01
migration zks systems to rocky9
2023-07-31
The existing graylog logging will be replaced by opensearch. The technical background is license changes and features removed in graylog 4.
For command line users, acc9 provides
opensearch tail --help
, e.g.
opensearch tail --host fel0003
The web interface is available at
https://logging.acc.gsi.de
The web interface is a generic interface for opensearch. As such it is not quite as easy to use as graylog. A quick guide is available in
DiagnosticLogging
Currently there is log shipping from graylog to opensearch, i.e. messages that arrive in graylog are also forwarded to opensearch, so you can already have a look.
Starting in calendar week 31, diagnostic logging will be restructured and opensearch will go into production. The switchover will probably be somewhat bumpy overall, since very few things can be tested in advance and IP addresses, netmasks and DNS names change at the same time.
Some (Java) services and frontends will probably need a restart for this.
2023-07-24 to 28
virtualization acc7.
2023-07-24 to 28
maintenance week database.
2023-07-17 to 21
maintenance week. closed. Pending public tasks
2023 switch to Rocky Linux 9
Rocky Linux will be migrated to version 9
The rough roadmap for 2023
- in 2022
- Preparations by ACO-INN
- adapt eclipse plugin for websvcdev
- week 1/2023
- Migration of fsl00c
- Data migration of all nfs directories
- week 2/2023
- Regular maintenance week. Service interruption for all systems.
- Service interruption for frontend boot and for home directories on all systems. All user facing systems are complete.
- week 3/2023:
- Database maintenance
- weeks 4-5/2023
- Power off acc8dev
- Setup new acc9dev cluster. Planned completion week 5. No data migration. (Home directories are nfs based and will survive)
- week 5/2023
- week 6/2023 and subsequent
- replace integration environment FAIR.Intern.SystemInt (el7 to el9). No data migration.
- replace acc8pro virtual machines. (order as required, INT first). No data migration. Service interruption of production services.
- asl151, asl152, asl156, asl157, asl158, asl159
- Activities in parallel
- reinstall tcl1000 console environments
- fesa release for acc9
- fesa buildhost asl551 upgrade to el9
- fesa sdk for yocto@acc9
- adapt eclipse plugin for newer eclipse version (2022-12)
- reinstall zks
- reinstall other systems (artifacts, archiving, interlock, logging, etc)
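The roadmap above gives its dates as week numbers. As a minimal sketch, assuming these are ISO 8601 calendar weeks (the page does not say so explicitly), the Monday to Friday range of a given week can be computed like this. The function name `work_week` is only illustrative.

```python
from datetime import date, timedelta


def work_week(year: int, week: int) -> tuple[date, date]:
    """Return (Monday, Friday) of the given ISO 8601 calendar week.

    Assumption: the "week N/YYYY" entries in the roadmap are ISO weeks.
    """
    monday = date.fromisocalendar(year, week, 1)  # 1 = Monday
    return monday, monday + timedelta(days=4)


if __name__ == "__main__":
    # Print the date ranges for the weeks named in the roadmap.
    for week in (1, 2, 3, 4, 5, 6):
        start, end = work_week(2023, week)
        print(f"week {week}/2023: {start.isoformat()} to {end.isoformat()}")
```

Under the same assumption, calendar week 31 of 2023 starts on 2023-07-31, which matches the date of the opensearch announcement.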
2022-09-26 upgrade storage
firmware upgrade on storage systems nwsr06m and nwsr07m. This is an online upgrade, nothing noticeable should happen.
complete.
2022-09-19 websvcdev
websvcdev will be switched to a new server. Access to the new webserver via webdav, see
ClusterAcc9
user data will not be migrated.
2022-08-24 upgrade ovirt
upgrade ovirt. Might affect all virtual machines that are not on vmware. acc8, artifacts, git, jenkins.
In theory everything happens live and nobody will notice. If something goes wrong, everyone will notice.
2022-08-15 vmware migration
downtime for acc6, virtual frontends/scu, integration system, ...
2022-08-08 Oracle Maintenance
Update OS, Firmware, Patches RDBMS. Databases should be online on one node of the cluster, but complete outage is possible.
2022-07-18 maintenance week.
server updates are mostly complete. tcl1000 upgrades will happen over the next days.
completed
- nfs migration (note: no support for nfs v2 see NfsServerRocky8Migration)
- artifacts migration (note: new urls see ArtifactRepository)
- acc7dev
- acc7pro
- acc8dev
- clipboard migration
- websvcpro migration (note: webdav access only see ClusterAcc8)
- updates frontend ramdisk (note: current points to el7 based ramdisk)
- zks
- asl335, asl735 (note: systems are unsupported)
- upgrade gitea
- upgrade foswiki
2022-03
cluster acc8dev is affected by a gfs2 bug (/common/usr and /common/scratch). Processes will get stuck in uninterruptible state (D). Bugfix deployed 2022-07-18
2021-10-25
Maintenance window closed. Pending tcl1000 will be completed in the next days.
Update, Patches, Firmware on all systems
Status
Completed
- acc7dev, acc7pro, acc7file complete
- acc8dev-rc1 complete
- oracle acc database complete
- archiving complete
- graylog complete
- acc6 complete
- acc7int/vmla complete
- www-acc complete
- scu ramdisk updated
- zks (thursday, 28.Oct)
- oracle gsi database
- symantec
- tcl1000
2021-08-24
networks are reconfigured.
- acc network is now 140.181.128.0/19 (netmask 255.255.224.0)
- (most) devices are now in 10.248.0.0/19 (netmask 255.255.224.0)
- (most) devices are now vlan 2700
If your system has the wrong ip, the easiest fix is a full power cycle with all power sources fully unplugged. This way management modules (bmc, ipmi, ilo, drac, xport, ...) will also pick up the changes.
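To check whether a system already has an address in one of the new ranges, a small sketch using Python's standard `ipaddress` module may help. The two networks are taken from this page (the embedded network is written as 10.248.0.0/19, as in the 2021-08-17 entry). The labels "acc" and "embedded" and the function name `classify` are only illustrative.

```python
import ipaddress

# Networks as announced on this page; both use a /19 prefix.
ACC_NET = ipaddress.ip_network("140.181.128.0/19")
EMBEDDED_NET = ipaddress.ip_network("10.248.0.0/19")


def classify(address: str) -> str:
    """Return which of the new networks an address belongs to, if any."""
    ip = ipaddress.ip_address(address)
    if ip in ACC_NET:
        return "acc"
    if ip in EMBEDDED_NET:
        return "embedded"
    return "unknown"


if __name__ == "__main__":
    for net in (ACC_NET, EMBEDDED_NET):
        # A /19 prefix corresponds to netmask 255.255.224.0 (8192 addresses).
        print(net, "netmask", net.netmask, "range", net[0], "to", net[-1])
```

An address reported as "unknown" here most likely still carries the old configuration and needs the power cycle described above.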
2021-08-24
reconfiguration acc network. New network 140.181.128.0/19. Systems not picking up changes via dhcp require reconfiguration and reboot.
2021-08-17
reconfiguration embedded systems. New network 10.248.0.0/19. Reboot of all scu, supermicro, micrioc required.
2021-08-12
reconfiguration of all nwe4000 switches. Local network interruptions.
2021-08-10
changes to network connectivity core-it. Global network interruption.
2021-08-09
migrated git.acc.gsi.de to new server and new gitea version. complete.
2021-07-26
Maintenance window closed.
Status
- acc7dev, acc7pro, acc7file complete
- acc8dev-rc1 complete
- oracle database complete
- zks complete
- archiving complete
- graylog complete
- acc6 complete
- acc7int/vmla complete
- www-acc complete
- scu ramdisk updated
Some tasks are pending and will be completed in August
- git.acc.gsi.de
- tcl1000
- ...
2021-07-26
artifacts/nexus introduce new snapshot and release retention rules, see
ArtifactRepository#Retention
- delete all snapshots not requested for more than 180 days
- keep a maximum of 14 releases per group-artifact-id
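The two retention rules can be illustrated with a short sketch. This is not the actual nexus configuration or implementation. The `Artifact` record and the function `select_for_deletion` are hypothetical, and which releases survive the limit of 14 (here: the most recently published ones) is an assumption, since the announcement only states the maximum.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Artifact:
    """Hypothetical record describing one stored artifact version."""
    group_id: str
    artifact_id: str
    version: str
    is_snapshot: bool
    published: datetime
    last_requested: datetime


def select_for_deletion(artifacts, now, max_idle_days=180, max_releases=14):
    """Return the artifacts that the two retention rules would remove."""
    doomed = []
    releases = defaultdict(list)
    for a in artifacts:
        if a.is_snapshot:
            # Rule 1: snapshots not requested for more than 180 days.
            if now - a.last_requested > timedelta(days=max_idle_days):
                doomed.append(a)
        else:
            releases[(a.group_id, a.artifact_id)].append(a)
    for versions in releases.values():
        # Rule 2: at most 14 releases per group-artifact-id.
        # Assumption: the newest releases (by publish date) are kept.
        versions.sort(key=lambda a: a.published, reverse=True)
        doomed.extend(versions[max_releases:])
    return doomed
```

In this sketch a release is never removed because of its age alone. It only drops out once more than 14 newer releases exist for the same group and artifact id.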
2021-07-26
delete dns entry for bel.gsi.de
2021-01-19
- replace wsl007/wsl008 hardware
- downtime for wiki, bugzilla, webdav
2021-01-12
2020-11-11
completed.
Firmware updates on central acc switches. This will interrupt the complete acc network.
2020-11-09
Oracle patching completed
main maintenance complete. Oracle has some patches pending.
- maintenance week
- Update, Patches, Firmware all systems
completed
- acc7
- acc6
- logging/graylog
- archiving
- k8s
- ords
- winccoa
- ovirt hypervisors
- codimd
- id
- gitea
- jenkins master/slaves
- scu ramdisk refresh
- timingsystems tsl001/tsl019/tsl021/netdisco
- dal001/dal002
- wsl007/wsl008 = wiki, bugzilla
failed
- ovirt hosted-engine failed during upgrade. Recovery is in progress. Recovered
2020-11-02
- all remaining subversion repositories switched to readonly
2020-08-24
ten days
- replace acc6 dev with one virtual machine -> asl735
- replace acc6 pro with one virtual machine -> asl335
2020-08-10
Start upgrade of the ovirt cluster (vmlb).
During preparation virtual machines will be moved to different nodes (interruptions < 1sec).
During final migration machines have to be exported/imported. Expect service downtimes of 30 minutes. The wiki will be updated with better estimates of migration dates.
affected machines and services: vmlb*, jenkins, git, codi, k8s, spice, fec-dns, etc.
upgrade is complete.
2020-07-13
- maintenance week fibre channel
- should be transparent. Note "should". We can't test it.
2020-06-29
- maintenance week ethernet
- update ethernet switches
- rolling reboots of all switches
2020-06-24
We need to replace some Hardware damaged by the power outage. Replacement will happen on short notice.
Downtimes for
- done. wsl007/wsl008 = www-acc.gsi.de
- done. asl330-asl334 = acc6pro
- gsi oracle database. migrated to 19c except cdb
- done. psl003-psl007 = wincc
2020-06-22
- two to five days
- Migration oracle gsi database to 19c
- migration oracle acc database to 19c
2020-06-15
- Maintenance week
- Update, Patches, Firmware all systems
- regular maintenance complete but see power outage notes
- status power outage greencube
- on monday 14:00 a power outage in the greencube happened
- asl730-asl734 failed - fixed. asl730 won't be repaired
- dbl005 failed - won't be repaired. We migrated to dbl2xx
- nwsr04 controller failed - repaired
2020-01-14
2019-08-12
- Maintenance week
- Removal of java8 and eclipse-neon
2019-07-30
2019-05-06
Maintenance window.
Operating system updates on all servers. Including asl74x, asl34x, fileservers, tcl1000, databases, interlock, etc.
Non userfacing systems will be updated before the maintenance window starts.
Java 11 will be rolled out.
Java 8 will be removed from all systems except acc7dev.
For anyone asking why systems are down or if systems are up again, the piggy bank is located at sb1.3.119
2019-04-16
Operating system updates including reboots and java 11 on acc7dev (asl740 to asl744). For curious people: that means 1072 software packages including eclipse.
acc7dev has been updated.
2019-04-15
With the end of beamtime 2018/2019 the VMS systems (axp*, bel.gsi.de) and old domain controllers (dcw001/dcw002) will be decommissioned.
2019-03-19
git.acc.gsi.de is now in beta state. If nothing serious happens it will be declared productive post beamtime. See also
Git and
SshAgent
2018-10-15 to 2018-10-19
Maintenance window complete
2018-07-18 to 2018-07-19
Maintenance window complete.
2018-05-18
The new kernel did not help with the nfs problems. We reconfigured the nfs and restarted the systems.
2018-05-14
we still experience nfs problems. We try to patch the kernel. This requires a rolling restart of all acc7 servers.
2018-05-08
asl744 has been rebooted. Reason for nfs problems is unknown. Ticket with redhat is open.
system asl340 lost memory modules. System is currently unavailable. This means OxygenXML is not available
2018-04-10 to 2018-04-11
Maintenance window complete.
2018-03-06
OS upgrades on storage systems. This is transparent and no system should be affected.
2018-01-29 to 2018-02-02
Maintenance window complete
done - Oracle Bundle Patch (accdbp, accdbu, accdbt) on the server side. This affects all Database services including for example LSA. We will try to execute this with a rolling upgrade, node by node, keeping at least one node available. Database sessions require a reconnect on failover.
done - Upgrade Oracle Instantclient on el7 (acc7dev, acc7pro) to version 12.2
Meltdown patches on all user facing systems
- done - asl730-asl734 acc6dev
- done - asl330-asl334 acc6pro
- done - asl740-asl744 acc7dev
- done - asl340-asl344 acc7pro
- done - asl102 interlock
- done - asl103 fe monitor
- done - tsl001 timing
- done - dal001 dataacquisition
- done - dal002 dataacquisition
- done - psl003-psl005 wincc
- done partially - tcl1000-tcl10xx thinclient
- done - vml2x vmware based machines
- done - vmlax vmware based machines
- done - vmlbx ovirt based machines
- done - zkl001-zkl002 zks
- done - vml003 - vml004
- done - usl604-usl606
2017-12-04 to 2017-12-05
system maintenance
the acc7 chassis has some defects, requiring disassembly. For this reason expect that we will need the complete maintenance window and that core services like acc7 and nfs won't be available for a longer time.
2017-10-16 to 2017-10-20
Oracle Upgrade to 12c (accdbp, accdbu, accdbt)
2017-07-17 to 2017-07-21
Maintenance window complete.
system maintenance including NFS migration to new hardware. All user facing systems will be down. This includes
- acc6pro asl330-asl334
- acc6dev asl730-asl734
- acc6file asl430-asl432, fsl00c
- acc7pro asl340-asl344
- acc7dev asl740-asl744
- webservices websvcpro, websvcdev, packages, olog, www-acc, www.acc, artifacts, builder, ...
- virtual machines
- vmware server
- zks
- ...
We will stop all user facing services and start with the migration of fsl00c, which includes the home directories. Once migration of NFS is complete we will continue with general operating system upgrades. Next we will need to upgrade our storage system fabric.
2017-06-29
dns alias fcmw00a will be removed
2017-03-20 to 2017-03-21
system maintenance complete
2017-01-06 to 2017-01-20
Datacenter relocation.
All systems will be down.
Yes that is a time frame of two weeks. And yes you can expect issues once the systems are online again.
For anyone asking if systems are up again, the piggy bank is located at br2.2.152
Systems complete:
- gsi oracle databases
- gsi ords application servers (cdb)
- zks
- www-acc.gsi.de (wiki, subversion, bugzilla)
- acc6 (asl330-asl334, asl730-asl734)
- nfs server (fsl00c)
- acc7 (asl340-asl344, asl740-asl744)
- websites: websvcpro, websvcdev, packages, olog, ...
- vmware machines (vmla...)
- ovirt machines (jenkins, builder.acc.gsi.de)
- vms (axp, bel.gsi.de)
2016-11-28 network uplink
Change of ACC uplink. Network unavailable from 7:00 to 7:30.
2016-11-30 oracle migration
ACC Oracle Databases will be migrated to new storage systems. Databases will be unavailable.
2016-08-29 maintenance week
maintenance complete.
Known Issues:
- x-win32@acc6: x-win32 is incompatible with Red Hat Enterprise Linux 6.8.
If you need to connect to acc6 (asl73x, asl33x) please use a workaround
- use cygwin
- connect to acc7 using x-win32 and connect to acc6 via ssh and x forwarding.
- a newer x-win32 version will fix this, but is not (yet) available at gsi
- new version of x-win32 available at softwarecenter
2016-06 jenkins
the buildserver should be migrated to new hardware at the end of June (date tentative).
2016-03-15 reset scuxl
all scus will be reset
2016-03-14 maintenance week
maintenance complete.
Starting Monday 14. march to Friday 18.
Software updates and reboots of all services.
status 2016-03-15:
- acc6 done; asl73x, asl33x, asl43x
- webserver done, wsl00x
- logstash done; usl30x
- timing done; tsl001
- usl60x done
- vml00x done
- vml200x done
- wincc done; psl004
- anything zkl related
2016-03-07 fsl00t retirement
NFS Server fsl00t will be retired.
2016-01-25 maintenance week
Starting Monday 25. Jan to Friday 29
System maintenance complete
- fibre channel core switches relocated
- 12 servers moved
- over 150 cables laid
- over 50 servers updated
2016-01-04 subversion structure
subversion will be restructured.
The repository bel will be frozen.
Commit all changes before and prepare to create new workspaces.
See also
SubversionStructure
status 2016-01-04:
- new repositories created
- bel renamed to bel-archiv
- bel-archiv is read only
- some repository data migrated (including history)
- please migrate other data (excluding history) using
svn export
and svn import
2015-12-07 maintenance week
Starting Monday 07. Dec to Friday 11
System maintenance. Expect major service interruptions.
Status:
- wsl00x done -> www-acc, wiki, subversion
- asl43x done -> NFS, home,
- asl33x done -> acc6-PRO, artifacts
- asl73x done -> lsa server
- usl30x done -> logstash
- asl102 done -> interlock
- usl602 done -> buildserver
- usl603 done -> virtual scu
- tsl001 done -> timing
- zkl00x done -> zks
- vml00x done -> fesa build, rpmbuild, etc
pending:
- zks terminals reboot (updates are done)
maintenance week closed.
2015-11-30 webserver migration
Starting 10:00, expecting 4h of downtime.
migration of www-acc.gsi.de and www.acc.gsi.de to new hardware.
failure of wiki, subversion, bugzilla and other webservices using these domains.
Migration is complete. https certificate and kerberos tickets are working.
2015-07-23 Upgrade Artifacts
Upgrade artifacts.acc.gsi.de. Starting 10:00 expecting 2h downtime
2015-05-04
Starting Monday 04. May to Friday 08
System maintenance. Expect major service interruptions.
System updates. Network Firmware updates. Will shutdown acc6 and acc5 cluster for a few hours.
Status update 2015-05-04: network and storage switches are patched, acc5 and acc6 are patched. Database systems will be patched on Tuesday
Status update 2015-05-05: maintenance complete
2015-04-07
Default java will switch to java 8. Typing "java" will result in a java 8 runtime.
Default eclipse version will be luna. Start it with "eclipse-luna". The alias "eclipse", currently pointing to kepler, will be removed.
2015-03-16
Starting Monday 16. Mar to Friday 20
System maintenance. Expect major service interruptions.
Depending on completion of electric power installation, the current plans include physical movement of storage systems and bladecenters to new racks.
Interruption will include acc5 (asl72x) acc6 (asl73x), webservers (wsl00x), fileservers (fsl00c, fsl00t), network boot, oracle databases (acc and gsi), zks, ...
Status update 2015-03-18:
Blade enclosures and storage systems have been moved. Most user facing services/machines are still powered down. We expect to restore main services (acc5 and acc6) on Thursday.
Status update 2015-03-19:
- acc6 is up and running
- acc5 has a hardware defect, expected to be running Friday
- other services (jenkins, logstash) are powered off
Status update 2015-03-20:
- acc6 is up and running
- acc5 is up and running
- other services (jenkins, logstash) are powered off
2015-01-12
Starting Monday 12. Jan to Friday 16. System maintenance. Expect major service failures.
Current plans include physical movement of servers and blade centers. Expect hours/days of downtime for linux cluster acc5, acc6, NFS servers fsl00t, fsl00c, tftp services, etc.
Status update 2015-01-13: Software updates are mostly complete. Waiting for electric installation to finish before moving servers.
Status update 2015-01-19: Electric installation not completed. New maintenance window in march
2014-11-11
all el6 systems received a subversion upgrade to 1.7. For details see
Subversion
2014-10-23
For security reasons caused by the
ssl poodle bug, SSLv3 has been deactivated on our webservers. This affects the subversion connection of eclipse-indigo on the acc5 cluster. Subversion access on acc5 is only possible using the command line client.
2014-08-06
new java version 1.7.0_67 on acc6.
Check your eclipse settings.
This update fixes security issues and a java webstart bug.
Starting with this release eclipse settings should be stable during upgrades.
2014-07-24
acc6-file had a failure from 16:40 to 16:50. This affected for example all home filesystems on acc6-pro and acc6-dev.
One of the cluster protocols was lacking redundancy. Configuration changes to solve this issue froze a cluster service and a reboot was required. Sorry for the interruption. Reboot of all acc6 clusters (file, pro, dev) is complete.
2014-05-26
System maintenance complete
2014-05-21
2014-05-21, 2014-05-22, 2014-05-23, 2014-05-26 system maintenance. Expect service failures.
Date changed: maintenance now includes Friday.
2014-05-13
new java version 1.7.0_55 on acc6. Check your eclipse settings
2014-05-08
webserver certificates for www-acc.gsi.de and www.acc.gsi.de have been refreshed. To update subversion's certificate cache use
svn info https://www-acc.gsi.de/svn/bel
2014-03-26
new java version 1.7.0_51. Check your eclipse settings.
2013-11-25 System updates.
2013-11-25 from 08:00 to estimated 2013-11-28 18:00
Affected: all machines and services
2013-08-26
2013-08-26 from 08:00 to estimated 2013-08-26 14:00
Affected: Blade chassis blc292 with the hosts asl73x, psl00x
Modification of enclosure network uplink.
2013-07-08
System update.
Period: 2013-07-08 from 08:00 to an estimated 2013-07-10 16:00
Affected: servers maintained by IN. Complete outage of acc6, web servers and databases. Service interruptions for all other IN services: clusters (acc5, acc6), web servers (subversion, wiki), build system (maven repository, jenkins).
2013-05-22
Migration to JDK7 on all systems.
2013-04-08
System update.
Period: 2013-04-08 from 08:00 to an estimated 2013-04-10 16:00
Affected: servers maintained by IN. Complete outage of acc6, web servers and databases. Service interruptions for all other IN services: clusters (acc5, acc6), web servers (subversion, wiki), build system (maven repository, jenkins).
Update 2013-04-08 14:00:
A firmware upgrade in the SAN failed. For today we are bringing all services back up and will return to the upgrade tomorrow. Expect further outages.
2013-01-14
System update.
Period: 2013-01-14 from 08:00 to an estimated 2013-01-16 16:00
Affected: servers maintained by IN: the clusters (acc5, acc6), web servers (subversion, wiki), build system (maven repository, jenkins), as well as network operation.
2013-01-07
All IN servers will be shut down because of maintenance work on the air conditioning.
Period: 2013-01-07 from 16:00 to an estimated 2013-01-08 16:00
Affected: all servers maintained by IN. All clusters (axp, acc5, acc6), web servers (subversion, wiki), build system (maven repository, jenkins)
We will try to keep the network running.
2012-12-11
All IN servers will be shut down because of maintenance work on the air conditioning.
Period: 2012-12-11 from 16:00 to an estimated 2012-12-12 16:00
Affected: all servers maintained by IN. All clusters (axp, acc5, acc6), web servers (subversion, wiki), build system (maven repository, jenkins)
We will try to keep the network running.