Windows 2003 EE DC and Domino Server hanging weekly

Discussion in 'Windows Home Server' started by Tiago Lock Martins, May 6, 2009.

  1. Hello,

    Our server hangs every Monday between 9 and 10 AM.
    We have already taken several actions to identify the root cause, without success.

    Let me explain the issue first:

    For the last month and a half, our server has been hanging. Terminal Services,
    file sharing, IBM Domino, the remote console (iLO 2 on the HP server) and even
    the physical console become unavailable, but the server still responds to ICMP.
    Sometimes the server has to be restarted from iLO 2 or with the power button,
    but in most cases it simply hangs for 5 to 15 minutes and then resumes working
    normally without any intervention from us.

    __________________________________________________________
    This is the environment :

    HP Proliant DL380 G5
    2x 3GHz dual-core Xeon processors
    16GB physical memory
    8x146GB SAS HDs on a hardware RAID 5
    HP Smart Array P400 RAID controller
    LSI Adapter Ultra 320 SCSI (using StorPort) - used as the interface to an
    Autoloader 8xUltrium2 tape device
    QLogic Fibre Channel Adapter QL2300 - 1x1Gb Fibre Channel interface to a
    Dell EMC² CX300 storage array
    2x1Gb onboard NC373 Ethernet adapters - one to the corporate network, the other
    directly attached with a Cat5e crossover cable to another identical server (not
    in use)

    Windows Server 2003 Enterprise Edition SP2 (not R2), x86
    Not using /3GB or /PAE - we previously had /PAE, but IBM Domino kept warning
    about low resources; after we removed /PAE the warnings stopped
    Services running: DC, DNS, DHCP, DFS domain root and local, IBM Domino
    Server (e-mail server), and some folder shares needed by our Domino
    structure (the user IDs are stored in these shares, which are mapped by the
    logon script)
    Other software: LTAuditor, Password Filtering, VirusScan Enterprise
    8.5 patch 8
    Disk and volume configuration:
    8 local disks on RAID 5 - SAS: E: (local data/software); P: (4GB
    pagefile) and C: (system)
    3 LUNs on the storage array - 1 Fibre Channel - F: (Domino), 2 SATA - S:
    (Shadow Copy) and e:\usuarios (mount point holding the 607 shares of user
    IDs)

    __________________________________________________________
    Actions already taken:

    Moved the pagefile from drive P: to C:, keeping the minimum and maximum
    size at 4096MB - no change
    Reviewed scheduled tasks on this and other related servers - nothing
    runs between 9 and 10 AM on Mondays
    Reviewed the /PAE parameter documentation - found nothing that could cause
    this; we will add the parameter back next week as a test
    Rechecked shares and permissions on the user ID folders - nothing wrong was
    found
    No abnormal antivirus behavior
    Disabled LTAuditor and folder/file auditing
    Reviewed IBM Lotus Domino schedules, procedures, functions and logs -
    nothing unusual
    No abnormal entries in the Event Viewer
    Disabled the SCOM agent
    Reviewed backup schedules - nothing runs between 9 and 10 AM on Mondays
    Reviewed all environment changes in the last 3 months - nothing that could
    be related
    Found KB 941276 about the StorPort driver, but I'm not sure it
    really applies to our issue/configuration
    Found KB 244139 and configured the manual memory dump, but could not generate
    the dump because the keyboard stops responding during the hang

    __________________________________________________________

    Any clue or direction to take will be greatly appreciated.

    Thanks for your time.

    P.S.: Sorry for my average English :)
     
  2. Hi,

    Thank you for posting here.

    According to your description, I understand that:

    Your DC and Domino Server experience regular hangs on Mondays.

    If I have misunderstood the problem, please don't hesitate to let me know.

    Based on the symptom, I suspect this issue may be caused by a bottleneck and
    heavy load when all users come back to the office and access the
    server.

    I suggest monitoring the number of clients accessing the server on
    Monday. If possible, try restricting the number of clients to see whether
    that makes any difference.
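
One way to approximate the client count suggested above is to count the distinct remote hosts holding established TCP connections to the server. The following is only a sketch: it assumes the text comes from `netstat -an` on the server, and the port list (445 and 139 for file sharing, 1352 for Domino/Notes) and the name `count_clients` are illustrative choices, not something taken from this thread.

```python
def count_clients(netstat_text, server_ports=(445, 139, 1352)):
    """Count distinct remote hosts with ESTABLISHED TCP connections
    to one of the given local ports."""
    clients = set()
    for line in netstat_text.splitlines():
        parts = line.split()
        # Expected columns: Proto, Local Address, Foreign Address, State
        if len(parts) != 4 or parts[0] != "TCP" or parts[3] != "ESTABLISHED":
            continue
        local, remote = parts[1], parts[2]
        local_port = int(local.rsplit(":", 1)[1])
        if local_port in server_ports:
            # Keep only the remote address, not the remote port
            clients.add(remote.rsplit(":", 1)[0])
    return len(clients)


# Made-up sample in the layout printed by `netstat -an`
SAMPLE = """
  Proto  Local Address          Foreign Address        State
  TCP    10.0.0.5:445           10.0.0.21:1042         ESTABLISHED
  TCP    10.0.0.5:445           10.0.0.21:1043         ESTABLISHED
  TCP    10.0.0.5:1352          10.0.0.22:2050         ESTABLISHED
  TCP    10.0.0.5:3389          10.0.0.23:4000         ESTABLISHED
  TCP    0.0.0.0:445            0.0.0.0:0              LISTENING
"""

print(count_clients(SAMPLE))
```

Sampling this every minute on a Monday morning and on a quiet day would give two series to compare against the time of the hang.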

    If the issue persists with only a small number of clients accessing the server,
    we will have to analyze system performance counters or memory dump
    files. A newsgroup is not the best place to analyze dump files, so I
    suggest contacting Microsoft Customer Support Services (CSS) so that a
    dedicated Support Professional can assist with this request. Thank you for
    your understanding.

    To obtain the phone numbers for a specific technology request, please take a
    look at the web site listed below.
    http://support.microsoft.com/default.aspx?scid=fh;EN-US;PHONENUMBERS

    If you are outside the US please see http://support.microsoft.com for
    regional support phone numbers.

    Sincerely,
    Mervyn Zhang
    Microsoft Online Community Support

    ==================================================
    This posting is provided "AS IS" with no warranties, and confers no rights.
     
  3. As it happens regularly, you could run Sysinternals Process Monitor at that
    time to see what is going on,
    Anthony
    http://www.airdesk.com
     
  4. I'm not sure I will get anything, because even perfmon stops monitoring
    during the hang.

    Will try it.

    Thanks a lot for your time.
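
Since monitoring tools stop recording during the hang, the gap itself can serve as the measurement. The sketch below is one possible approach and not something proposed in the thread: a small heartbeat writer appends a timestamp to a file at a fixed interval, and afterwards any gap between consecutive timestamps marks the start and end of a hang. The function names, the file path argument and the 5-second tolerance are all assumptions.

```python
import os
import time
from datetime import datetime, timedelta


def run_heartbeat(path, interval=1.0):
    """Append one ISO timestamp per interval to path, forcing each line to disk.
    Runs until interrupted; if the system stalls, no lines are written."""
    with open(path, "a") as f:
        while True:
            f.write(datetime.now().isoformat() + "\n")
            f.flush()
            os.fsync(f.fileno())
            time.sleep(interval)


def find_gaps(timestamps, interval=1.0, tolerance=5.0):
    """Return (last_seen, next_seen) pairs where consecutive heartbeats
    are more than interval + tolerance seconds apart."""
    gaps = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        if (cur - prev).total_seconds() > interval + tolerance:
            gaps.append((prev, cur))
    return gaps


def load_timestamps(path):
    """Read the heartbeat file back into datetime objects."""
    with open(path) as f:
        return [datetime.fromisoformat(line.strip()) for line in f if line.strip()]
```

Starting `run_heartbeat` before 9 AM on Monday and passing the result of `load_timestamps` to `find_gaps` afterwards would give the exact hang window, which can then be matched against Domino, antivirus and storage logs.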
     
  5. Hello,

    We don't have a bottleneck or heavy load from users connecting to
    this server and its services, because we have significantly fewer users than
    before (you know, the worldwide economic situation).

    We cannot generate the memory dump while it hangs, because even the directly
    attached keyboard, mouse and monitor don't seem to work.

    Any other ideas?

    Thanks for your time
     
  6. Can 16 GB of memory and a 4MB swap file be a problem?
    Is the 4MB a typo for 6GB?

    Rich W.

    Tiago Lock Martins wrote:
    > __________________________________________________________
    > This is the environment :
    >
    > HP Proliant DL380 G5
    > 2 3GHz Xeon dual core processors
    > 16GB physical Memory
    > 8x146GB SAS HDs on a hardware RAID 5

    (some snipped)
    > __________________________________________________________
    > Already done these actions:
    >
    > Changed pagefile from drive P: to C: maintaining a minimum and
    > maximum size of 4096MB - No results
     
  7. It has 16GB of physical memory, but the OS isn't using it because of the
    lack of the /PAE parameter.
    The pagefile is set to 4GB.
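
The arithmetic behind this can be sketched as follows, assuming only the general x86 rule that physical addresses are 32 bits wide without PAE and 36 bits wide with it. The actual limit also depends on the Windows edition and on address ranges reserved by hardware, which this sketch ignores; the function names are illustrative.

```python
GIB = 2**30


def addressable_bytes(address_bits):
    """Size of the physical address space for a given address width."""
    return 2**address_bits


def unused_ram_gib(installed_gib, address_bits):
    """RAM (in GiB) that lies beyond what the address width can reach."""
    installed = installed_gib * GIB
    usable = min(installed, addressable_bytes(address_bits))
    return (installed - usable) / GIB


print(unused_ram_gib(16, 32))  # without PAE: 32-bit physical addresses
print(unused_ram_gib(16, 36))  # with PAE: 36-bit physical addresses
```

With 16GB installed, the 32-bit case leaves 12 GiB out of reach, while the 36-bit case leaves none.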
     
  8. Hi,

    Were you able to make any progress with Process Monitor or any other tool?

    If not, I suggest contacting Microsoft CSS, since they are the best
    resource for this kind of issue. Given the current situation, the general
    suggestions offered here are not enough to find the root cause.

    If you would like, please collect the following information and I
    will try to analyze it and give some suggestions.

    A. Download MPS Reporting Tool (MPSRPT_PFE.EXE) from the following link:
    (http://www.microsoft.com/downloads/details.aspx?FamilyID=00ad0eac-720f-4441-9ef6-ea9f657b5c2f&DisplayLang=en)

    Please note: The link may be truncated when you read the E-mail. Be sure to
    include all text between '(' and ')' when navigating to the download
    location.

    B. Right-click MPSRPT_PFE.EXE and select Run as Administrator to run the
    tool. A command window will open.

    C. Type Y at the prompt <Include the MSINFO32 report?
    (defaults to Y in 15 seconds)[Y,N]?>

    D. When the tool is done, an Explorer window will open on the
    %systemroot%\MPSReports\Setup\Reports\cab folder, which contains a
    <Computername>MPSReports.cab file. After collecting it, please use Windows
    Live SkyDrive (http://www.skydrive.live.com/) to upload the file and then
    give me the download address.

    Sincerely,
    Mervyn Zhang
    Microsoft Online Community Support

     
  9. Ah, when you said it had 16 GB of RAM I assumed that meant you were using
    Enterprise Edition.
     
  10. Hi,

    Is there any new troubleshooting information available? If there is
    anything we can do for you, please let us know.

    Sincerely,
    Mervyn Zhang
    Microsoft Online Community Support

     
