Copyright ©2008 T. H. Merrett
COMP 617 Information Systems Winter 2008 Week 2
Time Series, etc.
1. An example from the AQuery language [LernShash:AQuery].
For a given stock and a given date, find the best profit one could
obtain by buying the stock and then selling it later that day.
let runningMin be par min of price order ts by ID, tradeDate;
let runningProfit be price - runningMin;
let maxProfit be equiv max of runningProfit by ID, tradeDate;
let buyTS be equiv min of ts by runningMin, ID, tradeDate;
Trades(ID tradeDate price ts ) runningMin runningProfit maxProfit buyTS
ACME 030511 1.25 030511.0 1.25 0.00 0.09 030511.0
ACME 030511 1.33 030511.1 1.25 0.08 0.09 030511.0
ACME 030511 1.22 030511.2 1.22 0.00 0.09 030511.2
ACME 030511 1.15 030511.3 1.15 0.00 0.09 030511.3
ACME 030511 1.17 030511.4 1.15 0.02 0.09 030511.3
ACME 030511 1.17 030511.5 1.15 0.02 0.09 030511.3
ACME 030511 1.18 030511.6 1.15 0.03 0.09 030511.3
ACME 030511 1.21 030511.7 1.15 0.06 0.09 030511.3
ACME 030511 1.24 030511.8 1.15 0.09 0.09 030511.3
ACME 030511 1.23 030511.9 1.15 0.08 0.09 030511.3
let priceSell be price;
let tsSell be ts;
Sell <- [ID,tradeDate,priceSell,tsSell,runningMin]
where runningProfit=maxProfit in Trade;
Sell(ID tradeDate priceSell tsSell runningMin)
ACME 030511 1.24 030511.8 1.15
let priceBuy be price;
let tsBuy be ts;
Buy <- [ID,tradeDate,priceBuy,tsBuy] in (Sell ijoin
[ID,tradeDate,priceBuy,tsBuy,runningMin] where ts=buyTS in Trades);
Buy(ID tradeDate priceBuy tsBuy )
ACME 030511 1.15 030511.3
BuySell <- Buy ujoin [ID,tradeDate,priceSell,tsSell] in Sell;
BuySell(ID tradeDate priceBuy tsBuy priceSell tsSell )
ACME 030511 1.15 030511.3 1.24 030511.8
2. A second example from the AQuery language [LernShash:AQuery].
Find the count of packets and their average length within each flow.
(A "flow" from a source s to a destination d ends if there is a
2-minute gap between consecutive packets from s to d.)
let tsPred be par pred of ts order ts by src, dest;
let flowStart be
if ts-tsPred>=0.0002 then 1 else
if ts>
Only this must be defined in a context which allows accum to be a state
variable and intializes it. The context must also supply the parameter
n.
comp windows(gp,qty,ts,n,data,runsum) is
{ comp parop runsum(value,accum) is
{ accum <- [red + of if seq<=n then qty else 0]
where Gp=gp in [Gp,qty,ts] in data
<< initialize accum for runsum >>
} alt
{ accum <- accum + par succ(n-1) of value order ts by gp
- par pred of value order ts by gp };
let seq be par + of 1 order ts by gp;
let Gp be gp;
}
We will assume that the partial functional mapping invocation of runsum
matches the first alt-block (accum is output) at the beginning of each
grouping, and matches the second (accum is input) everywhere else.
(Funops are similar but easier: there is no gp attribute, and par..by gp
becomes fun.. and where Gp=gp (and Gp) is not needed.)
NB maybe we should just call them all funops.
Here is the invocation of all this:
windows(in ID, in price, in ts, in 5, in Trades, out runsum);
let 5sums be par runsum of price order ts by ID;
let 5avgs be 5sums/5;
4. The second example from [WhitShash:stockSeries] gets us into serious time
series concepts, so try Google or the QA280.-- section of the library stacks.
Here's one to try: the convolution [en.wikipedia.org/wiki/Convolution,
mathworld.wolfram.com/Convolution.html, cnx.org/content/m11541/latest/,..]
Convolution of two time series x1,..,xn CONV y1,..,yn =
0 x0y0
1 y0y1 + x1y0
2 x0y2 + x1y1 + x2y0
:
j Sum_{k=0}^{n-1} x(k)*y(j-k)
:
Let's try it for two rectangular series x = 11111, y = 11111
(where the first terms are x0, y0, resp. and we assume 0s before, after)
0 1 = x0y0
1 2 = y0y1 + x1y0
2 3 = x0y2 + x1y1 + x2y0
3 4 = x0y3 + x1y2 + x2y1 + x3y0
4 5 = x0y4 + x1y3 + x2y2 + x3y1 + x4y0
5 4 = x1y4 + x2y3 + x3y2 + x4y1
6 3 = x2y4 + x3y3 + x4y2
7 2 = x3y4 + x4y3
8 1 = x4y4
In an alternative definition, one of the series is periodic (not 0s
before and after).
Let's try rectangular x = 11 and periodic y = ..11001100..,
i.e., x0 = 1, x1 = 1; .., y0 = 1, y1 = 1, y2 = 0, y3 = 0, ..
Showing only nonzero entries:
:
-2 1 = x1y_{-3}
-1 0 =
0 1 = x0y0
1 2 = y0y1 + x1y0
2 1 = x1y1
3 0 =
4 1 = x0y4
:
To motivate finding relational implementations of convolution, here are
some uses (from wikipedia):
# In statistics, a weighted moving average is a convolution.
# In probability theory, the probability distribution of the sum of two
independent random variables is the convolution of their individual
distributions.
# In optics, many kinds of "blur" are described by convolutions. A
shadow (e.g. the shadow on the table when you hold your hand between
the table and a light source) is the convolution of the shape of the
light source that is casting the shadow and the object whose shadow is
being cast. An out-of-focus photograph is the convolution of the
sharp image with the shape of the iris diaphragm. The photographic
term for this is bokeh.
# Similarly, in digital image processing, convolutional filtering plays
an important role in many important algorithms in edge detection and
related processes.
# In linear acoustics, an echo is the convolution of the original sound
with a function representing the various objects that are reflecting
it.
The convolution theorem says a convolution of two functions is the
inverse Fourier transform of the product of the Fourier transforms
of the two functions, so having FFT will also give us convolutions.
5. On the subject of Fourier transforms, [ColeShashZhao:uncoopTS] point out that
Fourier transforms, wavelets and the singular value decomposition do _not_ do
very well at reducing the dimensionality of certain "uncooperative" time
series (time series in which the power of the time series is spread over
all Fourier coefficients). They improve on "sketches" to reduce the
dimensionality of time series while approximately preserving the distance
between any two (for similarity queries).
Sketches are random projections: "taking [the] inner product of each time
series window, considered as a vector, with a set of random vectors". If
there are d of these random vectors, then the d results can be thought of
as a vector in a d-dimensional space. Repeating this 2b more times and
using medians is the approximation.
[ColeShashZhao:uncoopTS] use a variety of other techniques, including
convolution, to speed up the process of making sketches.
[LernShash:AQuery] Alberto Lerner, Dennis Shasha "AQuery, Query Language for
Ordered Data Optimization Techniques, and Experiments" VLDB 29 2003
[WhitShash:stockSeries] Arthur Whitney, Dennis Shasha "Lots o' Ticks: real-time
high performance time series queries on billions of trades and quotes" SIGMOD
2001
[ColeShashZhao:uncoopTS] Richard Cole, Dennis Shasha, Xiaojian Zhao "Fast Window
Correlations Over Uncooperative Time Series" KDD 2005
[See cs.nyu.edu/shasha/papers/papers.html for these papers in pdf.]