[[PageOutline]]

= Package BysPrior =

BysPriorInf stands for Bayesian Prior Information and allows to define prior information handlers to be used in estimation systems (max-likelihood and Bayesian ones).

A prior is a distribution function over a subset of the variables of a model that expresses the knowledge about the phenomena behind the model. The effect of a prior is to add the logarithm of its likelihood to the logarithm of the likelihood of the global model, so there can be two or more priors over the same variables. For example, in order to establish a truncated normal we can combine a uniform prior over the feasible region with an unconstrained normal prior.

In order to be estimated with [wiki:OfficialTolArchiveNetworkNonLinGloOpt NonLinGloOpt] (max-likelihood) and [wiki:OfficialTolArchiveNetworkBysSampler BysSampler] (Bayesian sampler), each prior must define methods to calculate the logarithm of the likelihood (up to an additive constant), its gradient and its hessian, and an optional set of constraining inequations that define the feasible region. Each inequation can be linear or nonlinear, and its gradient must also be calculated. Note that this implies that priors must be continuous and twice differentiable and that restrictions must be continuous and differentiable, but this is an admissible restriction in almost all cases.

== Non informative priors ==

Let [[LatexEquation( \beta )]] be a uniform random variable in a region [[LatexEquation( \Omega\subset\mathbb{R}^{n} )]] whose likelihood function is [[BR]]
[[LatexEquation( lk\left(\beta\right)\propto 1 )]]

Since the logarithm of this likelihood is zero up to a constant, when the log-likelihood is not defined for a prior, the default assumed will be the uniform distribution, also called the non informative prior.

=== Domain prior ===

The easiest way, but one of the most important, to define a non informative prior is to establish a domain interval for one or more variables. In this case, you do not need to define the log-likelihood nor the constraining inequation functions; it is enough to fix the lower and upper bounds:[[BR]][[BR]]
[[LatexEquation( \beta\in\Omega\Longleftrightarrow l_{k}\leq\beta_{i_{k}}\leq u_{k}\wedge-\infty\leq l_{k}<u_{k}\leq\infty\;\forall k )]]

=== Polytope prior ===

More generally, the feasible region can be defined by a set of [[LatexEquation( r )]] linear inequations[[BR]][[BR]]
[[LatexEquation( \beta\in\Omega\Longleftrightarrow A\beta\leq a,\;A\in\mathbb{R}^{r\times n},\;a\in\mathbb{R}^{r} )]][[BR]][[BR]]
Writing the residual of the k-th inequation as[[BR]][[BR]]
[[LatexEquation( d_{k}\left(\beta\right)=\underset{i=1}{\overset{n}{\sum}}A_{ki}\beta_{i}-a_{k} )]][[BR]][[BR]]
then
[[LatexEquation( D_{k}\left(\beta\right)=\begin{cases} 0 & \forall d_{k}\left(\beta\right)\leq0\\ d_{k}\left(\beta\right) & \forall d_{k}\left(\beta\right)>0\end{cases} )]]
is a continuous function in [[LatexEquation( \mathbb{R}^{n} )]] and
[[LatexEquation( D_{k}^{3}\left(\beta\right)=\begin{cases} 0 & \forall d_{k}\left(\beta\right)\leq0\\ d_{k}^{3}\left(\beta\right) & \forall d_{k}\left(\beta\right)>0\end{cases} )]]
is continuous and differentiable in [[LatexEquation( \mathbb{R}^{n} )]]
[[LatexEquation( \frac{\partial D_{k}^{3}\left(\beta\right)}{\partial\beta_{i}}=\begin{cases} 0 & \forall d_{k}\left(\beta\right)\leq0\\ 3d_{k}^{2}\left(\beta\right)A_{ki} & \forall d_{k}\left(\beta\right)>0\end{cases} )]]

The feasibility condition can then be defined as a single nonlinear inequality, continuous and differentiable everywhere
[[LatexEquation( g\left(\beta\right)=\underset{k=1}{\overset{r}{\sum}}D_{k}^{3}\left(\beta\right)\leq0 )]]

The gradient of this function is
[[LatexEquation( \frac{\partial g\left(\beta\right)}{\partial\beta_{i}}=3\underset{k=1}{\overset{r}{\sum}}D_{k}^{2}\left(\beta\right)A_{ki} )]]
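The following is a minimal numerical sketch of this penalty, written in Python with NumPy (the name `g_and_gradient` is illustrative, not part of the BysPrior API): it evaluates [[LatexEquation( g\left(\beta\right) )]] and its gradient for a system of linear inequations [[LatexEquation( A\beta\leq a )]].

{{{#!python
import numpy as np

def g_and_gradient(A, a, beta):
    """Evaluate g(beta) = sum_k D_k(beta)^3 and its gradient.

    d_k(beta) = (A beta - a)_k is the residual of the k-th inequation and
    D_k = max(0, d_k), so g is zero exactly on the feasible region and is
    continuous and differentiable everywhere.
    """
    d = A @ beta - a            # residuals d_k(beta)
    D = np.maximum(d, 0.0)      # D_k(beta): zero where the inequation holds
    g = np.sum(D ** 3)          # g(beta) = sum_k D_k(beta)^3
    grad = 3.0 * (D ** 2) @ A   # dg/dbeta_i = 3 sum_k D_k(beta)^2 A_ki
    return g, grad

# Example: the square 0 <= beta_1, beta_2 <= 1 expressed as A beta <= a
A = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 0.0], [0.0, 1.0]])
a = np.array([0.0, 0.0, 1.0, 1.0])
print(g_and_gradient(A, a, np.array([0.5, 0.5])))  # feasible: g = 0, zero gradient
print(g_and_gradient(A, a, np.array([1.5, 0.5])))  # infeasible: g > 0
}}}

Since g is zero with zero gradient on the whole feasible region, the optimizer can be given this single smooth constraint instead of the r separate inequations.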
== Multinormal prior ==

When we know that a single variable should fall symmetrically close to a known value, we can express it by saying that it has a normal distribution centred at that value. This type of prior knowledge can be extended to higher dimensions by means of the multinormal distribution
[[LatexEquation( \beta\sim N\left(\mu,\Sigma\right) )]]
whose likelihood function is
[[LatexEquation( lk\left(\beta\right)=\frac{1}{\left(2\pi\right)^{\frac{n}{2}}\left|\Sigma\right|^{\frac{1}{2}}}e^{-\frac{1}{2}\left(\beta-\mu\right)^{T}\Sigma^{-1}\left(\beta-\mu\right)} )]]

The log-likelihood is
[[LatexEquation( L\left(\beta\right)=\ln\left(lk\left(\beta\right)\right)=-\frac{n}{2}\ln\left(2\pi\right)-\frac{1}{2}\ln\left(\left|\Sigma\right|\right)-\frac{1}{2}\left(\beta-\mu\right)^{T}\Sigma^{-1}\left(\beta-\mu\right) )]]

The gradient is
[[LatexEquation( \left(\frac{\partial L\left(\beta\right)}{\partial\beta_{i}}\right)_{i=1\ldots n}=-\Sigma^{-1}\left(\beta-\mu\right) )]]

and the hessian is
[[LatexEquation( \left(\frac{\partial^{2}L\left(\beta\right)}{\partial\beta_{i}\partial\beta_{j}}\right)_{i,j=1\ldots n}=-\Sigma^{-1} )]]

== Inverse chi-square prior ==

== Transformed prior ==

Sometimes we have prior information that has a simple distribution over a transformation of the original variables. For example, we may know that a set of variables has a normal distribution with average equal to another variable, as in the case of latent variables in hierarchical models
[[LatexEquation( \beta_{i}\sim N\left(\beta_{1},\sigma^{2}\right)\;\forall i=2\ldots n )]]

Then we can define a variable transformation like this
[[LatexEquation( \gamma\left(\beta\right)=\left(\begin{array}{c} \beta_{2}-\beta_{1}\\ \vdots\\ \beta_{n}-\beta_{1}\end{array}\right)\in\mathbb{R}^{n-1} )]]

and define the simple normal prior
[[LatexEquation( \gamma\sim N\left(0,\sigma^{2}I\right) )]]

Then the log-likelihood of the original prior will be calculated from the transformed one as
[[LatexEquation( L\left(\beta\right)=L^{*}\left(\gamma\left(\beta\right)\right) )]]

If we know the first and second derivatives of the transformation
[[LatexEquation( \frac{\partial\gamma_{k}}{\partial\beta_{i}} )]]
[[LatexEquation( \frac{\partial^{2}\gamma_{k}}{\partial\beta_{i}\partial\beta_{j}} )]]
then we can calculate the original gradient and hessian from the gradient and the hessian of the transformed prior as follows
[[LatexEquation( \frac{\partial L\left(\beta\right)}{\partial\beta_{i}}=\underset{k=1}{\overset{K}{\sum}}\frac{\partial L^{*}\left(\gamma\right)}{\partial\gamma_{k}}\frac{\partial\gamma_{k}}{\partial\beta_{i}} )]]
[[LatexEquation( \frac{\partial^{2}L\left(\beta\right)}{\partial\beta_{i}\partial\beta_{j}}=\underset{k=1}{\overset{K}{\sum}}\left(\frac{\partial^{2}L^{*}\left(\gamma\right)}{\partial\gamma_{k}\partial\beta_{j}}\frac{\partial\gamma_{k}}{\partial\beta_{i}}+\frac{\partial L^{*}\left(\gamma\right)}{\partial\gamma_{k}}\frac{\partial^{2}\gamma_{k}}{\partial\beta_{i}\partial\beta_{j}}\right)=\underset{k=1}{\overset{K}{\sum}}\left(\underset{l=1}{\overset{K}{\sum}}\frac{\partial^{2}L^{*}\left(\gamma\right)}{\partial\gamma_{k}\partial\gamma_{l}}\frac{\partial\gamma_{k}}{\partial\beta_{i}}\frac{\partial\gamma_{l}}{\partial\beta_{j}}+\frac{\partial L^{*}\left(\gamma\right)}{\partial\gamma_{k}}\frac{\partial^{2}\gamma_{k}}{\partial\beta_{i}\partial\beta_{j}}\right) )]]
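As an illustration of the two previous sections, here is a minimal sketch in Python with NumPy (names like `transformed_prior` are ours, not part of the BysPrior API) that evaluates the hierarchical example above: it computes the multinormal log-likelihood, gradient and hessian on [[LatexEquation( \gamma )]] and maps them back to [[LatexEquation( \beta )]] through the chain rule. Writing [[LatexEquation( J_{ki}=\frac{\partial\gamma_{k}}{\partial\beta_{i}} )]] for the Jacobian of this linear transformation, its second derivatives vanish and the hessian term reduces to [[LatexEquation( J^{T}H^{*}J )]].

{{{#!python
import numpy as np

def multinormal_L_grad_hess(x, mu, sigma_inv):
    """Log-likelihood (up to its additive constant), gradient and hessian
    of a multinormal N(mu, Sigma) at x, given Sigma^{-1}."""
    r = x - mu
    L = -0.5 * r @ sigma_inv @ r          # constant terms omitted
    return L, -sigma_inv @ r, -sigma_inv  # gradient, constant hessian

def transformed_prior(beta, sigma):
    """Prior beta_i ~ N(beta_1, sigma^2) through gamma ~ N(0, sigma^2 I).

    gamma(beta) = (beta_2 - beta_1, ..., beta_n - beta_1) is linear, so its
    Jacobian J is constant and its second derivatives vanish; the chain rule
    reduces to grad_beta = J^T grad_gamma and hess_beta = J^T hess_gamma J.
    """
    n = len(beta)
    # J[k, i] = dgamma_k / dbeta_i for the differencing transformation
    J = np.hstack([-np.ones((n - 1, 1)), np.eye(n - 1)])
    gamma = J @ beta
    sigma_inv = np.eye(n - 1) / sigma ** 2     # Sigma^{-1} = I / sigma^2
    L, grad, hess = multinormal_L_grad_hess(gamma, np.zeros(n - 1), sigma_inv)
    return L, J.T @ grad, J.T @ hess @ J

L, grad, hess = transformed_prior(np.array([1.0, 1.2, 0.7]), sigma=0.5)
print(L); print(grad); print(hess)
}}}

For a nonlinear transformation, the second term of the chain rule above would have to be added, using the second derivatives of [[LatexEquation( \gamma )]].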