[Dockdev] Re: [Dock-fans] Running Dock on paralel
Terry Lang
terry at cgl.ucsf.edu
Fri Sep 16 10:04:14 PDT 2005
Dear Ricardo,
We are currently aware of one bug in the MPI code. In the
current setup, the code uses the master node for I/O only and
distributes the actual calculations to the slave nodes. If you have
only one node and are using MPI, only I/O is performed and no docking
takes place. Besides the inefficiency of using only (number of nodes - 1)
nodes for the calculations, we have documented that other MPI libraries
seem to seg fault as a result of this issue. We are working on this bug
and will post a fix as soon as we have solved the problem. I am not
familiar with the OpenMosix solution, but judging from your error
messages, this bug may be at least one of your problems.
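
To make the master/slave split described above concrete, here is a minimal sketch in plain Python (no MPI required). It is not DOCK's actual code: the function names, the choice of rank 0 as the master, and the round-robin distribution are all assumptions made for illustration. It only shows why a run with a single process ends up docking nothing.

```python
def assign_roles(num_processes):
    """Return (master_rank, worker_ranks) for a master-does-I/O-only layout.

    Assumption: rank 0 is the I/O-only master; all other ranks do the work.
    """
    if num_processes < 1:
        raise ValueError("need at least one process")
    master_rank = 0
    worker_ranks = list(range(1, num_processes))
    return master_rank, worker_ranks


def distribute(ligands, num_processes):
    """Hand ligands out round-robin to the worker ranks; the master gets none."""
    _, workers = assign_roles(num_processes)
    assignment = {rank: [] for rank in workers}
    if not workers:
        # Single process: it is the master, so only I/O happens, no docking.
        return assignment
    for i, ligand in enumerate(ligands):
        assignment[workers[i % len(workers)]].append(ligand)
    return assignment


if __name__ == "__main__":
    ligands = ["lig%d" % i for i in range(6)]  # hypothetical ligand names
    for n in (1, 2, 8):
        plan = distribute(ligands, n)
        docked = sum(len(v) for v in plan.values())
        print(n, "processes:", len(plan), "workers,", docked, "ligands docked")
```

With one process the plan is empty, and with eight processes only seven of them receive any ligands, which is the inefficiency mentioned above.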
Sincerely,
Terry
Ricardo Nicoluci wrote:
> Hi,
>
> I just installed Dock to run on a cluster. Instead of using the
> usual MPI configuration, I took advantage of the fact that our cluster
> runs OpenMosix. Rather than installing MPI on all cluster nodes, or
> sharing the MPI installation folder and the data among them through
> NFS, I just start all eight processes on node1 and let OpenMosix
> manage them for me. I like it a lot because it has automatic load
> balancing, so it tends to use all the machines equally, even when
> some machines are better than others. However, sometimes when I run
> Dock 5.2 in parallel it gives the following error:
>
> p3_1766: p4_error: net_recv read: probable EOF on socket: 1
> p5_1808: p4_error: net_recv read: probable EOF on socket: 1
> p8_1894: p4_error: net_recv read: probable EOF on socket: 1
> bm_list_1718: (9296.991802) wakeup_slave: unable to interrupt slave 0 pid 1717
> bm_list_1718: (9296.992147) wakeup_slave: unable to interrupt slave 0 pid 1717
> rm_l_1_1735: (9294.355503) net_send: could not write to fd=6, errno = 9
> rm_l_1_1735: p4_error: net_send write: -1
> p4_error: latest msg from perror: Bad file descriptor
> rm_l_2_1753: (9293.837317) net_send: could not write to fd=6, errno = 9
> rm_l_2_1753: p4_error: net_send write: -1
> p4_error: latest msg from perror: Bad file descriptor
> rm_l_4_1794: (9290.788141) net_send: could not write to fd=6, errno = 9
> rm_l_4_1794: p4_error: net_send write: -1
> p4_error: latest msg from perror: Bad file descriptor
> rm_l_6_1838: (9285.729331) net_send: could not write to fd=6, errno = 9
> rm_l_6_1838: p4_error: net_send write: -1
> p4_error: latest msg from perror: Bad file descriptor
> rm_l_7_1861: (9283.199900) net_send: could not write to fd=6, errno = 9
> rm_l_7_1861: p4_error: net_send write: -1
> p4_error: latest msg from perror: Bad file descriptor
> could not get processgroup for 2069
>
> It seems to be a problem with a slave process connecting to the
> master process of the MPI parallelization. I know that OpenMosix
> may be causing this error, but googling it I found that this error can
> occur even on a pure MPI system. Has anyone seen it before? Any
> information from the dev people?
>
> Thanks,
>
>
> --
> Ricardo de Paula Nicoluci
> PhD Student
> Medicinal Chemistry Laboratory
> IFSC - University of Sao Paulo
> Sao Carlos - SP - Brazil
>
--
P. Therese Lang
Kuntz and James Labs
UCSF--Chemistry and Chemical Biology
Phone: (415)476-3986