H++ (web-based computational prediction of protonation states and pK of ionizable groups in macromolecules)

Q: Who can use H++?
A: Unlimited free access to this site is restricted to non-profit use (including academic and educational use). If you intend to use it for any other purpose, please contact the development team to obtain a proper license. (Regardless of the purpose, you can test-run the site for up to three weeks, but please avoid submitting very large structures.)

Q: Do I need to log in?
A: No, it is not necessary to log in; you can process (small) structures anonymously.

Q: Why would I want to register?
A: Registering allows you to keep your results for more than a single session. This will also allow you to have access to more advanced features, such as being able to process large structures.

Q: How do I register?
A: Click the "Register" button on the left-hand side of the screen, to the right of the "Login" button, or go to the Registration Page.

Q: Do I need to keep the browser open once the calculation has started?
A: If you are a registered user, you can log out and log in again later to check your results (or to see the error/warning messages). If you use H++ anonymously, you cannot access your results or error messages once you log off.

Q: How can I obtain the source code for H++?
A: Please contact the development team. Note, however, that installing the suite is far from trivial: several additional codes have to be installed as well, and we do not recommend attempting it without good system administration skills. Testing is a long process and is not automated: if the code "runs" on a few structures, that does not mean it "works". We have not tested H++ on any platform other than the one it currently runs on. Also note that our team has no resources to provide any support beyond the web version that we maintain.

Q: What is the methodology behind H++?
A: The approach is based on classical continuum electrostatics and basic statistical mechanics. As such, it contains several approximations to reality. The advantage of the approach is its rigorous basis (within the above framework): it does not contain heuristic fits to massive data sets or empirical approximations. One should keep in mind, however, that this by no means translates into perfect agreement with experiment when applied to real biomolecules. See a brief discussion below about the accuracy.

Q: What input parameters should I choose?
A: For typical physiological conditions, using the default value of 80 for the external dielectric (water) and a salinity of 0.15 M is reasonable. The situation is less straightforward with the internal dielectric. If you are mostly interested in deeply buried residues, a lower value such as 4 is recommended. If your focus is on residues closer to the surface, a higher value such as 10 or even 20 is appropriate. See the question on improving accuracy below for a few suggestions on how these ideas can be used to improve your calculations.

Q: How fast is the current version of H++ server?
A: The actual processing time varies with the load on the H++ server. The following examples give a rough estimate of the completion time with the current version: approximately 18 seconds for a molecule with 12 titratable sites such as 1VII, 5 minutes for a molecule with 111 titratable sites (1AD2), and 45 minutes for a large molecule with 360 titratable sites (1KX5).

Q: Is pK_(1/2) reported by H++ the same as pKa?
A: Generally yes, but not always. By definition, pK_(1/2) is the midpoint of a titration curve. In the majority of cases the curve is well approximated by the classical sigmoidal (Henderson-Hasselbalch) shape, in which case pK_(1/2) = pKa. If, however, the titration curve deviates strongly from the Henderson-Hasselbalch shape, pKa is no longer a good approximation for pK_(1/2). More on this problem can be found in this publication.
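A small numerical sketch of the distinction: the code below finds the midpoint (pK_(1/2)) of a titration curve by bisection, once for an ideal Henderson-Hasselbalch curve and once for a made-up two-phase curve. The function names and the two-phase curve are hypothetical illustrations, not H++ output; the point is only that a midpoint always exists, while a single pKa describes the curve only in the sigmoidal case.

```python
def hh_protonated_fraction(ph: float, pka: float) -> float:
    """Protonated fraction of one site with an ideal Henderson-Hasselbalch curve."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def pk_half(fraction, lo: float = -5.0, hi: float = 20.0, tol: float = 1e-9) -> float:
    """pH at which a monotonically decreasing titration curve crosses 0.5 (bisection)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fraction(mid) > 0.5:
            lo = mid  # still mostly protonated: the midpoint lies at higher pH
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ideal(ph: float) -> float:
    # Ideal single-site sigmoid with pKa = 4.0: the midpoint equals the pKa.
    return hh_protonated_fraction(ph, 4.0)


def irregular(ph: float) -> float:
    # Hypothetical two-phase curve (equal-weight phases near pH 3 and pH 9):
    # its midpoint is well defined, but no single pKa reproduces its shape.
    return 0.5 * hh_protonated_fraction(ph, 3.0) + 0.5 * hh_protonated_fraction(ph, 9.0)


print(f"{pk_half(ideal):.2f}")      # 4.00
print(f"{pk_half(irregular):.2f}")  # 6.00
```

For the irregular curve, fitting a single sigmoid would give a misleading "pKa", even though the reported pK_(1/2) of 6.00 is exact.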

Q: What is pK int ?
A: This is a hypothetical pK of a group, assuming that it does not interact with any other titratable group in the protein. The concept is sometimes useful for analyzing the calculation, but in most cases you don't have to worry about it. For more details, see D. Bashford and M. Karplus, "pKa of Ionizable Groups in Proteins: Atomic Detail from a Continuum Electrostatic Model," Biochemistry 29, 10219-10225 (1990).

Q: What is the accuracy (compared to experiment) of the pK values computed by this server?
A: The answer may not be as straightforward as one would like. Generally, the single-structure continuum solvent methodology used here is believed to have an average error on the order of 1 pK unit. However, the deviation may be larger in some cases (in particular, for CYS groups). In most cases you can be reasonably sure that the overall trend (whether the pK of a given site is shifted up or down relative to its standard value in solution) is predicted correctly. As with all computational methods, computed differences are more accurate than absolute values: for example, the predicted effect of a point mutation on the pK of a nearby site may be more accurate than the calculated absolute value of that pK. Likewise, quantitative predictions of changes in a pK due to a well-defined local conformational change should be well within reach of the methodology. As with any pK-predicting method, lower-resolution X-ray structures tend to give worse accuracy; older structures with resolution worse than 2.5 Å may be especially dangerous in that respect. If available, it is always a good idea to run a few different structures corresponding to the same biological molecule and compare the results: a consensus pK value (e.g. the geometric mean) is a better approximation to reality. One should also go through the log messages generated by H++, in particular "leap.log". If heavy atoms were missing in the original X-ray structure and were added by H++, be particularly careful in trusting the computed pKs of residues in the immediate vicinity of the added atoms. We strongly recommend consulting the relevant literature before using the server in your work. Some references are available in the Methodology description on the home page. A comparison of H++-generated pKs with the corresponding experimental values for a set of high-quality protein structures is also available.
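A minimal sketch of the consensus step, assuming you have already collected one pK per structure for the same site. One way to read "geometric mean" here: averaging pK values arithmetically is the same as taking -log10 of the geometric mean of the corresponding K values, as the second function checks. The function names and pK values are hypothetical.

```python
import math
from statistics import fmean


def consensus_pk(pks: list[float]) -> float:
    """Consensus pK over several structures: the arithmetic mean of the pKs."""
    return fmean(pks)


def consensus_pk_from_k(pks: list[float]) -> float:
    """Same quantity, computed explicitly as a geometric mean of K = 10**(-pK)."""
    ks = [10.0 ** (-pk) for pk in pks]
    geometric_mean_k = math.prod(ks) ** (1.0 / len(ks))
    return -math.log10(geometric_mean_k)


# Hypothetical pK values of one site computed from three structures.
pks = [4.1, 4.6, 3.9]
print(f"consensus pK = {consensus_pk(pks):.2f}, spread = {max(pks) - min(pks):.2f}")
```

Reporting the spread alongside the consensus value is a cheap way to flag sites where the structures disagree.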

Q: Ok, I am not satisfied with the accuracy of my calculations. Can I do something within the H++ environment to improve it?
A: Yes, but this will require additional work. If your focus is only a few groups, you can set the internal dielectric according to where the groups are in the molecule (see above; the H++ site also illustrates how the internal dielectric affects the results). Do a full calculation first, then look up the "your_molecule.summ" file in the "Listing" on the results page. This file shows the various contributions to the pK of your focus group. In particular, the "delta self" term is the desolvation penalty, which indicates how buried the group is. If its absolute value is relatively large (say, > 2 pK units), a lower internal dielectric (4) may be more appropriate for this group. On the other hand, a small "delta self" tells you that the group is close to the surface, in which case you may benefit from a higher dielectric of 20. You can repeat this procedure separately for each group of interest; limiting the overall number of titratable groups (see below) will speed up the calculations. See also Demchuk, E. and Wade, R.C., "Improving the continuum dielectric approach to calculating pKa's of ionizable groups in proteins," J. Phys. Chem. 100, 17373 (1996). It is also a good idea to use the best (highest-resolution) X-ray structure for pK calculations. If several good structures are available, select the 2-3 most accurate ones and average the results.
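The rule of thumb above can be written down as a tiny helper. This is a sketch only: the function name is hypothetical, the threshold of 2 pK units and the values 4 and 20 come from the answer above, and the "delta self" numbers are assumed to be copied by hand from "your_molecule.summ" (the file layout is not parsed here).

```python
def suggest_internal_dielectric(delta_self: float, threshold: float = 2.0) -> float:
    """Rule of thumb: |delta self| above ~2 pK units means the group is buried,
    so try an internal dielectric of 4; otherwise it is near the surface, so try 20."""
    return 4.0 if abs(delta_self) > threshold else 20.0


# Hypothetical "delta self" values (pK units) for three groups of interest.
delta_self = {"ASP 26": 3.4, "GLU 35": 0.8, "LYS 57": -2.6}
for group, value in delta_self.items():
    print(group, "-> internal dielectric", suggest_internal_dielectric(value))
```

Each suggested value then becomes the internal dielectric for a separate H++ run focused on that group.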

Q: Will H++ protonate or deprotonate the C-terminus carboxyl or N-terminus amine groups?
A: No. H++ does calculate the pK_(1/2) values for these groups. However, the protonation state of these groups is not changed in the output PDB file, even if, based on the input pH value, it should be. This is because, currently, standard Amber force field parameters are only available for these groups in their default protonation states.

Q: How can I cap protein termini?
A: Protein termini can be capped using the following procedure:
1. Process your uncapped structure through H++.
2. Download the output PDB file generated by H++.
3. To cap the C-terminus: (a) change the C-terminal OXT atom to N, (b) change its residue name to NME (N-methylamide) or NHE (amide), and (c) increase its residue number by 1.
4. To cap the N-terminus: (a) delete the N-terminal H1 and H3 atoms, (b) change the N-terminal H2 atom to C, (c) change its residue name to ACE, (d) decrease its residue number by 1, and (e) move this new C atom (now the ACE residue) to the front of the chain.
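The editing steps 3 and 4 can be scripted. The sketch below assumes standard fixed-column PDB records (atom name in columns 13-16, residue name in 18-20, chain ID in 22, residue number in 23-26, optional element in 77-78), lines without trailing newlines (e.g. from `str.splitlines()`), and Amber-style terminal hydrogen names H1/H2/H3 as in the H++ output. The function names are hypothetical, and atom serial numbers are left unchanged.

```python
def _set_columns(record: str, start: int, end: int, value: str) -> str:
    """Overwrite the 0-based slice [start, end) of a fixed-column PDB record."""
    record = record.ljust(end)
    return record[:start] + value + record[end:]


def _in_residue(record: str, chain: str, resseq: int) -> bool:
    """True for ATOM/HETATM records of the given chain ID and residue number."""
    return (record.startswith(("ATOM", "HETATM"))
            and record[21] == chain
            and int(record[22:26]) == resseq)


def _retag(record: str, atom: str, element: str, resname: str, resseq: int) -> str:
    """Set atom name, residue name, residue number (and element, if present)."""
    record = _set_columns(record, 12, 16, f" {atom:<3}")    # columns 13-16
    record = _set_columns(record, 17, 20, f"{resname:>3}")  # columns 18-20
    record = _set_columns(record, 22, 26, f"{resseq:>4}")   # columns 23-26
    if len(record) >= 78:                                   # columns 77-78
        record = _set_columns(record, 76, 78, f"{element:>2}")
    return record


def cap_c_terminus(lines, chain, resseq, cap="NME"):
    """Step 3: OXT -> N, residue name NME (or NHE), residue number + 1."""
    return [
        _retag(rec, "N", "N", cap, resseq + 1)
        if _in_residue(rec, chain, resseq) and rec[12:16].strip() == "OXT"
        else rec
        for rec in lines
    ]


def cap_n_terminus(lines, chain, resseq):
    """Step 4: drop H1/H3, turn H2 into an ACE carbon, renumber, move it first."""
    out, ace, first = [], None, None
    for rec in lines:
        if _in_residue(rec, chain, resseq):
            if first is None:
                first = len(out)  # position where the N-terminal residue starts
            name = rec[12:16].strip()
            if name in ("H1", "H3"):
                continue  # (a) delete H1 and H3
            if name == "H2":
                ace = _retag(rec, "C", "C", "ACE", resseq - 1)  # (b)-(d)
                continue
        out.append(rec)
    if ace is not None:
        out.insert(first, ace)  # (e) the ACE carbon becomes the first atom of the chain
    return out
```

Both helpers return new lists, so you can inspect the edited records before writing the capped file and processing it further.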

Q: Can H++ process phosphorylated side chains?
A: Yes, the following phosphorylated amino acids can be processed by H++:
PTR: phosphotyrosine with a 2- net charge (using PDB nomenclature)
SEP: phosphoserine with a 2- net charge (using PDB nomenclature)
TPO: phosphothreonine with a 2- net charge (using PDB nomenclature)
Y1P: phosphotyrosine with a 1- net charge (nonstandard nomenclature)
S1P: phosphoserine with a 1- net charge (nonstandard nomenclature)
T1P: phosphothreonine with a 1- net charge (nonstandard nomenclature)
See N. Homeyer, A.H.C. Horn, H. Lanig, and H. Sticht, "AMBER force-field parameters for phosphorylated amino acids in different protonation states: phosphoserine, phosphothreonine, phosphotyrosine, and phosphohistidine," J. Mol. Model. 12:281-289, 2006, for the force field parameters.

Q: What is this mysterious "protonation state diagram"?
A: It represents the lowest-energy protonation microstates of your system, along with their relative energies. This is a more fundamental description of a titratable system than one based on pKs, and may be useful in many cases. For the definition and a usage example, see this publication. The cartoon on the H++ results page shows only a few of the lowest states for selected residues (those for which the difference between the lowest and the next protonation microstate is < 3 kcal/mol); a link to the full list of the lowest 128 microstates is available on the results page. The line at the top of the list shows the titratable residues in the order corresponding to the occupancies given below it: "1" means protonated, "0" deprotonated. One caveat is that the titratable group index in this diagram always starts from one, so it may be offset relative to the residue numbering in the input PDB file.
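Relative microstate energies translate into populations through Boltzmann weights, p_i ∝ exp(-ΔE_i/RT). The sketch below assumes you have transcribed the header residues and a few (occupancy, relative energy in kcal/mol) rows from the microstate list; the row layout, residue labels, and the temperature of 298.15 K are assumptions, and the populations are normalized over the listed states only.

```python
import math

R_KCAL = 1.987204e-3  # molar gas constant in kcal/(mol*K)


def microstate_populations(rel_energies, temperature=298.15):
    """Boltzmann populations of the listed microstates from relative energies (kcal/mol)."""
    rt = R_KCAL * temperature
    e_min = min(rel_energies)
    weights = [math.exp(-(e - e_min) / rt) for e in rel_energies]
    total = sum(weights)
    return [w / total for w in weights]


def decode_occupancy(residues, occupancy):
    """Map an occupancy string such as "101" to {residue: protonated?}."""
    if len(residues) != len(occupancy):
        raise ValueError("expected one occupancy digit per titratable residue")
    return {res: digit == "1" for res, digit in zip(residues, occupancy)}


# Hypothetical excerpt: header residues (indices start from one), then states.
residues = ["ASP1", "GLU2", "HIS3"]
states = [("101", 0.0), ("001", 0.9), ("111", 3.5)]
pops = microstate_populations([energy for _, energy in states])
for (occ, energy), p in zip(states, pops):
    print(occ, f"{energy:4.1f} kcal/mol", f"{p:.3f}", decode_occupancy(residues, occ))
```

A state 3 kcal/mol above the lowest one carries well under 1% of the population at room temperature, which is why the cartoon only shows near-degenerate states.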

Q: My structure has multiple chains. Which one will be used for the calculation?
A: Generally, all of them. Be careful, though. While most chains are "legitimate" chains corresponding to subunits of the whole multi-mer (as in e.g. hemoglobin), some PDB files use chain identifier "A", "B", etc. to denote different models of the same monomer (as in e.g. 2TRX). Make sure you supply only the one you want. For NMR structures, you get a window where you can choose which model to use.
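Before submitting, it can help to count residues per chain ID (a duplicated monomer shows up as two chains of identical size) and, if needed, keep only one chain. A minimal sketch assuming fixed-column PDB records (chain ID in column 22, residue number plus insertion code in columns 23-27); the function names are hypothetical.

```python
def chain_summary(lines):
    """Number of distinct residues per chain ID, in order of appearance."""
    residues: dict[str, set[str]] = {}
    for rec in lines:
        if rec.startswith(("ATOM", "HETATM")):
            residues.setdefault(rec[21], set()).add(rec[22:27])  # resSeq + iCode
    return {chain: len(res) for chain, res in residues.items()}


def select_chain(lines, chain):
    """Keep only the ATOM/HETATM records of one chain, closed by TER and END."""
    kept = [rec for rec in lines
            if rec.startswith(("ATOM", "HETATM")) and rec[21] == chain]
    return kept + ["TER", "END"]
```

Identical counts do not prove that two chains are copies of the same monomer, so compare the sequences as well before discarding one.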

Q: My structure has missing residues in the middle of the sequence (discontiguous sequence). Will H++ still compute pKs and protonate the structure?
A: No. H++ will read in your structure and report an error. pK estimates of any kind depend critically on fine details of the structure: if chunks of it are missing, the computed pKs may be completely off, especially in the vicinity of the missing part. So you have to be very careful with what you do next. The safest approach is to find another PDB structure that has no missing residues. If one is not available, there are still a couple of things you can try. If the groups you are interested in are not located in the vicinity of the structural gaps (in real 3D space, not in sequence space!), you may redo the computation treating the discontiguous parts of the structure as separate chains: simply insert a "TER" record at each gap in the PDB file and resubmit. Make sure that the residues in each new "chain" are numbered sequentially, without gaps. Generally, the more residues are missing, and the closer the group of interest is to the missing region, the less reliable the results. A somewhat safer, but much more laborious, approach is to reconstruct the missing parts, e.g. by homology modelling. Still, if the group of interest is in the immediate vicinity (within a few Angstroms) of the gap, the only reliable way to compute its pK is to use an experimental structure without gaps. Note, however, that if only a few heavy atoms in a residue are missing, H++ will add them automatically and proceed. Be careful, though: if these atoms lie close to the region of interest, the accuracy may be affected. Always check the corresponding log files. Hydrogens are always added by H++ if they are not present in the original structure (the most likely scenario), and that is fine. We suggest that you examine your structure carefully before submitting it. While H++ is designed to catch many problems and report them to you, you cannot rely on it to "proofread" your input.
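The TER-insertion workaround can be automated with a heuristic: treat a jump of more than one in the residue number within a chain as a gap. This is a sketch under that assumption; it ignores insertion codes and will miss gaps hidden by continuous numbering, so checking backbone C-N distances is more robust. The function name is hypothetical.

```python
def split_at_gaps(lines):
    """Insert a TER record wherever residue numbering jumps by more than one
    inside a chain, so each contiguous segment becomes its own chain."""
    out, prev = [], None  # prev: (chain ID, residue number) of the last atom
    for rec in lines:
        if rec.startswith(("ATOM", "HETATM")):
            key = (rec[21], int(rec[22:26]))
            if prev is not None and key[0] == prev[0] and key[1] - prev[1] > 1:
                out.append("TER")
            prev = key
        elif rec.startswith("TER"):
            prev = None  # an existing TER already ends the segment
        out.append(rec)
    return out
```

Because the new TER records fall exactly at the numbering jumps, each resulting segment is already numbered sequentially, as H++ requires.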

Q: How can I keep specific buried water molecules?
A: Water molecules in the input PDB file can be identified by the residue name "HOH". Edit the input PDB file and change "HETATM" to "ATOM" for the water molecules you want to keep.
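A minimal sketch of that edit, assuming fixed-column PDB records and that you identify each water to keep by its (chain ID, residue number) pair; the function name is hypothetical.

```python
def keep_waters(lines, keep):
    """Turn HETATM into ATOM for HOH residues whose (chain ID, residue number)
    is in `keep`; all other records are left untouched."""
    out = []
    for rec in lines:
        if (rec.startswith("HETATM") and rec[17:20] == "HOH"
                and (rec[21], int(rec[22:26])) in keep):
            rec = "ATOM  " + rec[6:]  # same width, so all columns stay aligned
        out.append(rec)
    return out
```

For example, `keep_waters(lines, {("A", 301)})` keeps water 301 of chain A and leaves every other water as HETATM.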
NOTE: If the PDB file only contains the O atom, and not the H atoms, H++ will add the missing H atoms followed by a crude optimization step.

Q: How can I keep an explicit ion in my computation?
A: Generally, all ions are automatically stripped off at the beginning of the H++ process. However, you can explicitly tell H++ to keep specific ion(s) by using an "ATOM" record for them, as opposed to "HETATM", in your input PDB file. You will also have to use specific atom and residue names for the ions, which can be found here.

Generally, ions that act as ligands can be kept; however, we do not recommend keeping ions that are part of the solvent, since these are already accounted for by the implicit solvent model used by H++.

Q: How can I calculate the pKs for membrane embedded proteins?
A: H++ uses the lipid17 AMBER force field parameters for lipids (Skjevik et al. 2016). To calculate the pKs of membrane-embedded proteins, first construct a PDB file containing the protein embedded in the membrane; see e.g. http://ambermd.org/tutorials/advanced/tutorial16/. Ensure that the atom and residue names in the PDB file conform to the convention used by the lipid17 force field parameters. Then process the PDB file through H++ with the "Correct orientation of ASN, GLN and HIS groups, add H atoms, and assign HIS H atoms ... based on van der Waals contacts and H-bonding" option (the "flip" option described below) deselected.

Q: My structure contains a ligand, but it is getting stripped off. Is there any way to keep it for the calculation?
A: Yes, there are three different ways in which ligands can be included:

1. If the ligand is a protein, peptide, DNA or RNA, the current version of H++ should in most cases handle it safely and automatically: you can submit the structure in regular PDB format (but make sure that the ligand records are "ATOM" and not "HETATM"). The same procedure works if you decide to keep a water molecule; just make sure its residue name is "HOH". (This may cause trouble in multi-chain proteins; for those, the safest option is to strip all water molecules.) Note, however, that solvation effects are already accounted for (implicitly) by the H++ methodology, so keeping explicit water molecules in the structure is generally not recommended for pK calculations. Of course, there are exceptions to this rule: please consult the relevant literature for details.

2. H++ can still handle many other types of ligands automatically, but you need to be very careful. First, edit the PDB file and change "HETATM" to "LIGAND" (columns 1-6) for each ligand atom. At the moment, only one ligand can be processed per run. Further details and an example are available on the H++ site. For some complex ligands, such as those containing a heme group, this method may not work; in that case, consider the alternative described below.
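The record change for option 2 can be sketched as follows, assuming fixed-column PDB records and that the ligand is identified by its residue name (optionally narrowed by chain ID and residue number). Because H++ processes only one ligand per run, the hypothetical helper refuses to mark more than one matching residue.

```python
def mark_ligand(lines, resname, chain=None, resseq=None):
    """Replace "HETATM" by "LIGAND" (columns 1-6) for the atoms of one ligand.

    Optional chain/resseq filters pick one copy when several are present;
    more than one matching residue is rejected (one ligand per run).
    """
    out, matched = [], set()
    for rec in lines:
        if rec.startswith("HETATM") and rec[17:20].strip() == resname:
            key = (rec[21], int(rec[22:26]))
            if (chain is None or key[0] == chain) and (resseq is None or key[1] == resseq):
                matched.add(key)
                rec = "LIGAND" + rec[6:]  # six characters, so all columns stay put
        out.append(rec)
    if len(matched) > 1:
        raise ValueError(
            f"{len(matched)} {resname} residues matched; pass chain/resseq to pick one")
    return out
```

Records of other copies of the same ligand stay HETATM, so H++ will strip them as usual.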

3. Alternatively, you can "manually" add charge and atomic radius records to your PDB file and then submit it in the PQR (PDB + charge + radius) format. This way, the file bypasses the "clean-up" routines and goes straight into the pK calculation. CAVEAT: the titratable groups in the input PQR file must be in their standard protonation states (doubly protonated for HIS, which must be renamed to "HIP"), with the correct number of protons present, and the atom and residue names must follow the AMBER convention. Also, to be on the safe side, make sure that all records are either "ATOM" or "HETATM". One way to get what you need is to first upload your PDB file without the ligand, run H++ on it, and retrieve the "your_molecule.replaced.pqr" file from the results page. This file has all the names and protonation states set correctly for the task. Then all you have to do is add the missing ligand records to the file (making sure its extension is ".pqr") and process it through H++ again. Note that H++ does not produce a PDB (PQR) file in the predicted protonation state when PQR files are used as input.

Note that if you want your ligand to be treated as a titratable group but it is not one of the standard amino acids, things become more complicated: the simple solution above won't work. To see how these types of problems are handled, see e.g. this publication. Contact us if you are working on a real research project and this problem becomes an insurmountable stumbling block.

Q: Ok, but if I am to prepare the PQR file manually, where do I get the charges and radii for the ligand, required by the PQR format?
A: This depends on the kind of ligand. For ions, or something very simple like water, the best way is to take these from the AMBER database. We are also building a database of PQR files for more complex ligands, see here. For other ligands, you will most likely need to compute charges from scratch, unless you have access to pre-computed ones for your specific ligand. H++ users have reported using the PRODRG2 server for this purpose. Ideally, you would perform a high-quality QM calculation followed by a charge-fitting procedure such as RESP. The radii follow a very simple pattern; see any PQR file generated by H++ (currently, the Bondi set is used). When constructing a PQR file, always remember to separate the records of different chains by a "TER" record, and put a "TER" at the end as well, to let H++ know which residues should be treated as terminal.
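Once charges and radii are in hand, the ligand records can be generated programmatically. The sketch below writes whitespace-separated PQR ATOM records (serial, atom name, residue name, residue number, x, y, z, charge, radius, with no chain ID), closes them with a TER, and refuses partial charges that do not add up to an integer net charge. The exact layout H++ expects is an assumption here, so compare the output with your "your_molecule.replaced.pqr" before appending. The water example uses TIP3P charges and Bondi radii with made-up coordinates, purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class PqrAtom:
    name: str
    x: float
    y: float
    z: float
    charge: float  # partial charge (elementary charges)
    radius: float  # atomic radius (Angstrom)


def ligand_pqr_records(atoms, resname, resseq, first_serial=1, tol=1e-3):
    """Format ligand atoms as whitespace-separated PQR ATOM records plus a final TER.

    Raises ValueError if the partial charges do not sum to an integer net charge.
    """
    net = sum(a.charge for a in atoms)
    if abs(net - round(net)) > tol:
        raise ValueError(f"partial charges sum to {net:.4f}, not an integer")
    records = [
        f"ATOM  {first_serial + i:>5} {a.name:<4} {resname:>3} {resseq:>5} "
        f"{a.x:8.3f} {a.y:8.3f} {a.z:8.3f} {a.charge:8.4f} {a.radius:6.3f}"
        for i, a in enumerate(atoms)
    ]
    records.append("TER")
    return records


# Illustrative water: TIP3P charges, Bondi radii, made-up coordinates.
water = [
    PqrAtom("O", 10.000, 5.000, 2.000, -0.834, 1.52),
    PqrAtom("H1", 10.957, 5.000, 2.000, 0.417, 1.20),
    PqrAtom("H2", 9.760, 5.927, 2.000, 0.417, 1.20),
]
print("\n".join(ligand_pqr_records(water, "HOH", 900, first_serial=2001)))
```

The integer-charge check catches the most common mistake in hand-assembled ligand charges (rounding errors or a missing atom) before the file ever reaches H++.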

Q: Can I limit the set of titratable groups to be included in the calculation?
A: Yes, but doing it blindly may be risky, as some important interactions may be left out. The advantage is, of course, speed. And if you focus on a particular small group of residues, you can take advantage of choosing more appropriate values of the input parameters, such as the internal dielectric. Proceed with caution. In any case, if your structure is small enough, we recommend that you first do a "full" run to identify which sites do not interact strongly with the sites you want to focus on; H++ outputs a specific file that lists the residues that contribute most to each pK shift. As a result of such a run you will have a "your_molecule.replaced.pqr" file (in "View all files generated for this run: Listing"), which will become your next-step input structure in PQR format. It has all ionizable groups in their standard protonation states. Now suppose you DO NOT want residue "GLU 35" to be treated by H++ as titratable. Change its name to "GLX 35", and H++ will consider it non-ionizable when you upload the modified structure in PQR format. By default, in the PQR format the server treats only GLU, ASP, ARG, CYS, LYS, HIP (the AMBER name for doubly protonated HIS), and TYR as titratable, and any other name as non-titratable. Do not forget that the input file must have a .pqr extension to be processed in this manner by H++. Limiting the number of groups treated as titratable may be critical for large structures. You can cautiously assume that groups further than 15 Angstroms away from your group of interest do not matter much, and they can be "made" non-titratable using the above trick. Note that H++ does not produce a PDB (PQR) file in the predicted protonation state when PQR files are used as input.
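The renaming trick combined with the 15 Angstrom cutoff can be scripted against the "replaced.pqr" file. This sketch assumes whitespace-separated PQR records whose last five fields are x, y, z, charge and radius (with an optional chain ID before the residue number), and it calls a residue "distant" when none of its atoms lies within the cutoff of a point you choose, e.g. a key atom of your group of interest. Only GLU to GLX comes from the answer above; the other replacement names are hypothetical placeholders picked solely because they are not on the titratable list.

```python
import math
import re

# Titratable names in PQR input (from the FAQ) -> non-titratable replacements.
# Only GLU -> GLX comes from the FAQ; the other new names are hypothetical.
FROZEN_NAME = {"GLU": "GLX", "ASP": "ASX", "ARG": "ARX", "CYS": "CSX",
               "LYS": "LYX", "HIP": "HPX", "TYR": "TYX"}


def _residue_key(tokens):
    # resName [chain] resSeq: everything between the atom name and x, y, z, q, r.
    return tuple(tokens[3:-5])


def freeze_distant_groups(pqr_lines, center, cutoff=15.0):
    """Rename titratable residues with no atom within `cutoff` Angstrom of
    `center`, so that H++ treats them as non-titratable."""
    nearest = {}
    for rec in pqr_lines:
        if rec.startswith(("ATOM", "HETATM")):
            tokens = rec.split()
            xyz = tuple(map(float, tokens[-5:-2]))
            key = _residue_key(tokens)
            nearest[key] = min(nearest.get(key, math.inf), math.dist(xyz, center))

    out = []
    for rec in pqr_lines:
        if rec.startswith(("ATOM", "HETATM")):
            tokens = rec.split()
            if tokens[3] in FROZEN_NAME and nearest[_residue_key(tokens)] > cutoff:
                start, end = list(re.finditer(r"\S+", rec))[3].span()  # residue name
                rec = rec[:start] + FROZEN_NAME[tokens[3]] + rec[end:]
        out.append(rec)
    return out


# Hypothetical usage, centered on an atom of the group of interest:
# with open("your_molecule.replaced.pqr") as fh:
#     lines = fh.read().splitlines()
# frozen = freeze_distant_groups(lines, center=(12.3, 4.5, -7.8))
```

The replacement names have the same length as the originals, so the column layout of each record is preserved; save the result with a .pqr extension before uploading.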

Q: What parameters (force-field, etc) are used to set up AMBER topology and coordinate files?
A: The AMBER ff19SB force field for proteins (Tian et al. 2020), ff99bsc0+bsc1 for DNA (Ivani et al. 2016), ff99bsc0_caseP_Shaw for RNA (Tan et al. 2018), lipid17 for lipids (Skjevik et al. 2016), and the mbondi2 radii set. Two options are available for generating Amber-format topology and coordinate files with an explicit water box: the classic 3-point TIP3P water model (Jorgensen et al. 1983) or the more accurate 4-point OPC water model (Izadi et al. 2014). Åqvist ion parameters are used for common monovalent ions (Li+, Na+, K+, Rb+, Cs+, Cl-, F-, I-, Br-), as optimized for AMBER by Joung and Cheatham (2008), and the ion parameters of Li et al. (2013) are used for other monovalent and multivalent ions. All disulfide (CYX-CYX) bonds found in the structure are set.

Q: Can I calculate the matrix of electrostatic site-site interactions?
A: Yes. These are calculated and stored in "your_molecule.g". This file, along with many other useful auxiliary files, is available from the results page, under "View all files generated for this run: Listing". Note, however, that the groups in some of these files may be numbered sequentially starting from residue 1, that is, there may be a constant offset of indices relative to your input structure.

Q: Can I obtain the breakdown of the electrostatic contributions to pK into the "Born" (desolvation penalty) and "Background" terms?
A: Yes. These are in the "your_molecule.summ" file (see above). A very detailed decomposition of the energetic contributions to the pK from every residue is available in "your_molecule.pK_decomposition".

Q: What is the "flip" option on the parameter selection screen?
A: The N and O atoms in the amide groups of ASN and GLN, and the N and C atoms in the imidazole ring of HIS, cannot easily be distinguished in electron density maps, so the orientations of these atoms are frequently incorrect in PDB structures. The reduce software from the Richardson lab at Duke University is used to identify the preferred orientation of these atoms based on van der Waals contacts and H-bonding (Word et al. 1999). reduce also adds missing H atoms and standardizes the bond lengths and bond angles of existing H atoms in the input PDB file. If the flip option is selected on the parameter selection screen, the amide N and O atoms of ASN and GLN, and the imidazole N and C atoms of HIS, in the PDB file may be flipped as determined by reduce. If the flip option is selected, H++ also uses the added and standardized H atom positions for subsequent computations.

Q: Which tautomer does H++ use when adding H atoms to HIS?
A: If the flip option (described above) is selected on the parameter selection screen, then the HIS tautomer, delta or epsilon, is determined by reduce based on van der Waals contacts and H-bonding. In the case where reduce determines that the HIS is doubly protonated (HIP), H++ assumes that the singly protonated state is the epsilon tautomer (HIE) for the purpose of pK calculations.

If the flip option is not selected, then H++ assumes the epsilon tautomer (HIE) unless specifically identified as the delta tautomer (HID) in the incoming PDB file.
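The two rules above amount to a small decision table. The function below restates them for reference; it is a hypothetical helper, not H++ code, and it assumes reduce's choice is reported as one of the AMBER names HID, HIE, or HIP.

```python
from typing import Optional


def his_state_for_pk(flip_selected: bool, reduce_choice: Optional[str], input_name: str) -> str:
    """HIS tautomer used for the pK calculation, following the rules above."""
    if flip_selected:
        if reduce_choice not in ("HID", "HIE", "HIP"):
            raise ValueError("with the flip option, reduce's choice (HID/HIE/HIP) is required")
        # A doubly protonated pick is replaced by the epsilon tautomer for the pK step.
        return "HIE" if reduce_choice == "HIP" else reduce_choice
    # Without the flip option: HIE unless the input file explicitly says HID.
    return "HID" if input_name == "HID" else "HIE"
```

For example, `his_state_for_pk(False, None, "HIS")` gives "HIE", while `his_state_for_pk(True, "HID", "HIS")` keeps reduce's delta tautomer.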

Q: Which tautomer does H++ use when adding hydrogens to GLU and ASP?
A: For GLU and ASP, the hydrogen is added to the second carboxyl oxygen atom (OE2 for GLU, OD2 for ASP), selected arbitrarily.

Q: How can I visualize the effect of protonation state changes?
A: GEM, a free, open-source utility, allows you to color the surface of H++-generated structures by the electrostatic potential. Compare structures generated at different pH values.

References:

Åqvist, J. (1990). Ion-water interaction potentials derived from free energy perturbation simulations. J. Phys. Chem., Vol. 94, No. 21, pp. 8021-8024.

Banas, P., et al. (2010). Performance of molecular mechanics force fields for RNA simulations: Stability of UCG and GNRA hairpins. J. Chem. Theory Comput., Vol. 6, pp. 3836-3849.

Hornak, V., Abel, R., Okur, A., Strockbine, B., Roitberg, A., and Simmerling, C. (2006). Comparison of multiple Amber force fields and development of improved protein backbone parameters. Proteins: Structure, Function, and Bioinformatics, Vol. 65, No. 3, pp. 712-725.

Izadi, S., Anandakrishnan, R., and Onufriev, A.V. (2014). Building Water Models: A Different Approach. J. Phys. Chem. Lett., Vol. 5, No. 21, pp. 3863-3871.

Ivani, I., Dans, P.D., Noy, A., Pérez, A., Faustino, I., Hospital, A., Walther, J., Andrió, P., Goni, R., Balaceanu, A., Portella, G., Battistini, F., Gelpí, J.L., González, C., Vendruscolo, M., Laughton, C.A., Harris, S., Case, D.A., and Orozco, M. (2016). Parmbsc1: A refined force field for DNA simulations. Nature Methods, Vol. 13, pp. 55-58.

Jorgensen, W.L., et al. (1983). Comparison of simple potential functions for simulating liquid water. J. Chem. Phys., Vol. 79, pp. 926-935.

Joung, I.S., and Cheatham, T.E. (2008). Determination of alkali and halide monovalent ion parameters for use in explicitly solvated biomolecular simulations. J. Phys. Chem. B, Vol. 112, pp. 9020-9041.

Krepl, M., et al. (2012). Reference simulations of noncanonical nucleic acids with different variants of the AMBER force field: Quadruplex DNA, Quadruplex RNA, and Z-DNA. J. Chem. Theory Comput., Vol. 8, pp. 2506-2520.

Li, P., Roberts, B.P., Chakravorty, D.K., and Merz, K.M. (2013). Rational design of particle mesh Ewald compatible Lennard-Jones parameters for +2 metal cations in explicit solvent. J. Chem. Theory Comput., Vol. 9, pp. 2733-2748.

Perez, A., Marchan, I., Svozil, D., Sponer, J., Cheatham, T.E., Laughton, C.A., and Orozco, M. (2007). Refinement of the AMBER Force Field for Nucleic Acids: Improving the Description of alpha/gamma Conformers. Biophys. J., Vol. 92, No. 11, pp. 3817-3829.

Skjevik, Å., Madej, B.D., Dickson, C.J., Lin, C., Teigen, K., Walker, R.C., and Gould, I.R. (2016). Simulations of lipid bilayer self-assembly using all-atom lipid force fields. Phys. Chem. Chem. Phys., Vol. 18, pp. 10573-10584.

Tan, D., Piana, S., Dirks, R., and Shaw, D. (2018). RNA force field with accuracy comparable to state-of-the-art protein force fields. Proc. Natl. Acad. Sci. USA, Vol. 115, pp. E1346-E1355.

Tian, C., Kasavajhala, K., Belfon, K., Raguette, L., Huang, H., Migues, A., Bickel, J., Wang, Y., Pincay, J., Wu, Q., and Simmerling, C. (2020). ff19SB: Amino-acid-specific protein backbone parameters trained against quantum mechanics energy surfaces in solution. J. Chem. Theory Comput., Vol. 16, pp. 528-552.

Wang, J., and Kollman P.A. (2001). Automatic parameterization of force field by systematic search and genetic algorithms. Journal of Computational Chemistry, Vol. 22, No. 12, pp. 1219-1228.

Word, J.M., Lovell, S.C., Richardson, J.S., and Richardson, D.C. (1999). Asparagine and Glutamine: Using Hydrogen Atom Contacts in the Choice of Side-Chain Amide Orientation. Journal of Molecular Biology, Vol. 285, pp. 1735-1747.

Zgarbova, M., et al. (2013). Toward improved description of DNA backbone: Revisiting epsilon and zeta torsion force field parameters. J. Chem. Theory Comput., Vol. 9, pp. 2339-2354.