[Dockdev] Re: [Dock-fans] Running Dock in parallel

Terry Lang terry at cgl.ucsf.edu
Fri Sep 16 10:04:14 PDT 2005


Dear Ricardo,

We are currently aware of one bug in the MPI code.  In the current
setup, the code uses the master node for I/O only and distributes the
actual calculations to the slave nodes.  If you have only one node and
are running under MPI, only I/O is performed and no docking takes
place.  Besides the inefficiency of using only (number of nodes - 1)
nodes for the calculations, we have documented that other MPI libraries
seem to segfault as a result of this issue.  We are working on this bug
and will post a fix as soon as we have solved the problem.  I am not
familiar with the OpenMosix solution, but judging from your error
messages, this bug may be at least one of your problems.
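
As a rough illustration of the layout described above, here is a
minimal sketch in plain Python (no MPI involved). It is not the actual
DOCK code: the function name assign_work, the ligand names, and the
round-robin scheme are all assumptions made for the example. Rank 0 is
treated as the I/O-only master, and ranks 1 to N-1 receive the work.

```python
def assign_work(num_processes, ligands):
    """Distribute ligands over worker ranks, keeping rank 0 for I/O only.

    Hypothetical helper for illustration; round-robin is an assumed scheme.
    """
    # Rank 0 is the master and never docks anything.
    workers = list(range(1, num_processes))
    assignment = {rank: [] for rank in workers}
    if not workers:
        # Only the master exists: nothing can be docked.
        return assignment
    for i, ligand in enumerate(ligands):
        assignment[workers[i % len(workers)]].append(ligand)
    return assignment


if __name__ == "__main__":
    ligands = ["lig1", "lig2", "lig3", "lig4"]  # made-up names
    for n in (1, 2, 8):
        plan = assign_work(n, ligands)
        docked = sum(len(batch) for batch in plan.values())
        print(f"{n} process(es): {len(plan)} worker(s), {docked} ligand(s) docked")
```

With a single process the plan is empty, which corresponds to the
"I/O only, no docking" case; with N processes only N-1 of them do
calculations.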

Sincerely,
Terry

Ricardo Nicoluci wrote:

> Hi,
>
> I just installed Dock to run on a cluster. Instead of using the usual
> MPI configuration, I took advantage of the fact that our cluster runs
> OpenMosix. Rather than installing MPI on all cluster nodes, or sharing
> the MPI installation folder and the data among them through NFS, I
> just start all eight processes on node1 and let OpenMosix manage them
> for me. I like it quite a lot because it has automatic load balancing,
> so it tends to use all the machines equally, even when some machines
> are better than others. However, sometimes when I run Dock 5.2 in
> parallel it gives the following error:
>
> p3_1766:  p4_error: net_recv read:  probable EOF on socket: 1
> p5_1808:  p4_error: net_recv read:  probable EOF on socket: 1
> p8_1894:  p4_error: net_recv read:  probable EOF on socket: 1
> bm_list_1718: (9296.991802) wakeup_slave: unable to interrupt slave 0 
> pid 1717
> bm_list_1718: (9296.992147) wakeup_slave: unable to interrupt slave 0 
> pid 1717
> rm_l_1_1735: (9294.355503) net_send: could not write to fd=6, errno = 9
> rm_l_1_1735:  p4_error: net_send write: -1
>     p4_error: latest msg from perror: Bad file descriptor
> rm_l_2_1753: (9293.837317) net_send: could not write to fd=6, errno = 9
> rm_l_2_1753:  p4_error: net_send write: -1
>     p4_error: latest msg from perror: Bad file descriptor
> rm_l_4_1794: (9290.788141) net_send: could not write to fd=6, errno = 9
> rm_l_4_1794:  p4_error: net_send write: -1
>     p4_error: latest msg from perror: Bad file descriptor
> rm_l_6_1838: (9285.729331) net_send: could not write to fd=6, errno = 9
> rm_l_6_1838:  p4_error: net_send write: -1
>     p4_error: latest msg from perror: Bad file descriptor
> rm_l_7_1861: (9283.199900) net_send: could not write to fd=6, errno = 9
> rm_l_7_1861:  p4_error: net_send write: -1
>     p4_error: latest msg from perror: Bad file descriptor
> could not get processgroup for 2069
>
> It seems to be a problem with a slave process connecting to the
> master process of the MPI parallelization. I know that OpenMosix may
> be causing this error, but googling it I found that this error may
> occur even on a pure MPI system. Has anyone ever seen it? Any
> information from the dev people?
>
> Thanks,
>
>
> -- 
> Ricardo de Paula Nicoluci
> PhD Student
> Medicinal Chemistry Laboratory
> IFSC - University of Sao Paulo
> Sao Carlos - SP - Brazil
>

-- 
P. Therese Lang
Kuntz and James Labs
UCSF--Chemistry and Chemical Biology
Phone: (415)476-3986

 



