# Smart Antenna Receiver Based on a Single Chip Solution for GSM/DCS Baseband Processing

U. Girola, A. Picciriello, D. Vincenzoni

SIEMENS ICN S.p.A.<sup>1</sup>, I-20019 Settimo Milanese-Milano (Italy) e-mail: uberto.girola@siemens-icn.it, agostino.picciriello@siemens-icn.it, david.vincenzoni@siemens-icn.it Contact person: david.vincenzoni@siemens-icn.it

#### Abstract

This paper presents a single chip implementation of a space-time algorithm for co-channel interference (CCI) and intersymbol interference (ISI) reduction in GSM/DCS systems. The temporal channel for the Viterbi receiver and the beamformer weights for the CCI rejection are estimated jointly by optimizing a suitable cost function for separable space-time channels.

Taking into account the integration capabilities of today's FPGAs (Field Programmable Gate Arrays), the paper demonstrates the feasibility of a single chip joint space-time estimation (JSTE) solution based on a three-processor architecture for carrier beamforming, equalization and demodulation.

#### 1. Introduction

The rapidly increasing number of subscribers in mobile communication networks calls for methods to enhance the capacity of existing systems. GSM/DCS networks already provide a number of features (frequency hopping, FH; power control, PC; discontinuous transmission, DTX) that reduce co-channel interference (CCI) and thereby increase capacity by allowing a reduction of the frequency reuse factor. Smart antenna receivers are an additional feature that further enhances spectral efficiency [1] with minimal impact on existing hardware structures.

The evolution of digital IC technology allows the development of powerful and flexible radio architectures that can easily cope with the main changes to the radio interface required by GSM/DCS smart antenna systems.

Systems based on DOA (Direction of Arrival) estimation have already been exploited [2]; the computational power available today makes techniques based on joint space-time estimation [3] feasible.

The algorithm presented herein is based on an adaptive receiver that jointly optimizes the beamformer weights and the channel response used by an MLSE (Maximum Likelihood Sequence Estimation) receiver section. CCI is reduced by the optimum combining performed by the spatial filter, while a conventional MLSE receiver performs the temporal processing. The cost function is based on the maximization of the SINR (Signal to Interference plus Noise Ratio) [4][5].

The parallel structure of the algorithm is particularly attractive for a computational architecture based on systolic arrays. The single chip solution presented here consists of three processors: the Cholesky processor, the FBS (Forward Backward Substitution) processor and the Viterbi processor.

The paper is organized as follows. Section 2 briefly describes the space-time channel model and the JSTE algorithm. Section 3 presents the architecture of the JSTE chip. Finally, Section 4 reports performance results of the smart antenna system based on the single chip solution.

#### 2. JSTE Receiver

The mobile radio channel is usually described in terms of its temporal features: path loss, fast and slow fading, and Doppler spread. With an antenna array, the spatial dimension must also be taken into account. The signal received by an array of M elements can be conveniently arranged into an M-dimensional vector formed by the wanted signal plus the contribution of the interfering signals present in a cellular system with low reuse distance.

<sup>1</sup> Since November 1999, Italtel S.p.A. has become Siemens ICN S.p.A.

The signal received by the antenna array is:

$$\mathbf{y}(t) = \sum_{k=-\infty}^{+\infty} x_k \mathbf{h}(t-kT) + \sum_{q=1}^{Q} \sum_{k=-\infty}^{+\infty} i_{k,q} \mathbf{h}_q(t-kT) + \mathbf{n}(t) \tag{1}$$

where $\{x_k\}$ is the wanted symbol sequence, $\{i_{k,q}\}$ are the sequences of the Q interferers, $\mathbf{h}(t)$ and $\mathbf{h}_q(t)$ are the corresponding M-dimensional channel responses, and $\mathbf{n}(t)$ is spatially and temporally uncorrelated Gaussian noise ($\mathrm{E}[\mathbf{n}(t)\mathbf{n}^{\mathsf{H}}(t+\tau)] = \sigma^2 \mathbf{I}\delta(\tau)$). By sampling the received signal at the symbol rate 1/T and collecting N time samples into a single matrix $\mathbf{Y} = [\mathbf{y}(nT), \mathbf{y}((n+1)T), \dots, \mathbf{y}((n+N-1)T)]$, the received signal can be equivalently rewritten as

$$\mathbf{Y} = \mathbf{H}\mathbf{X} + \sum_{q=1}^{Q} \mathbf{H}_{q}\mathbf{I}_{q} + \mathbf{N} \tag{2}$$

where $\mathbf{H}=[\mathbf{h}(0), \mathbf{h}(T), \dots, \mathbf{h}((L-1)T)]$ is the M×L matrix of channel samples (L is the memory length of the channel) and $\mathbf{X}=[\mathbf{x}(n), \mathbf{x}(n+1), \dots, \mathbf{x}(n+N-1)]$ is the L×N non-symmetric Toeplitz data matrix, whose columns $\mathbf{x}(n)=[x_n, \dots, x_{n-L+1}]^T$ collect L consecutive symbols of the transmitted sequence. Similar relationships hold for $\mathbf{H}_q$ and $\mathbf{I}_q$. It is important to point out that the structured noise formed by the interference plus white noise is temporally and spatially correlated, the correlation being due to the CCI.
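The structure of the data matrix can be illustrated with a short pure-Python sketch. The helper names `data_matrix` and `apply_channel` are hypothetical and introduced only for this illustration; symbols are indexed by absolute time, and the noise-free product H X is the first term of (2).

```python
def data_matrix(symbols, L, n0, N):
    """Return the L x N data matrix X (list of rows) whose j-th column is
    x(n0 + j) = [x_{n0+j}, x_{n0+j-1}, ..., x_{n0+j-L+1}]^T."""
    if n0 < L - 1 or n0 + N > len(symbols):
        raise ValueError("not enough symbols for the requested window")
    # Row l holds the symbols delayed by l periods, so X[l][j] depends only on
    # j - l: X is constant along its diagonals (Toeplitz), but not symmetric.
    return [[symbols[n0 + j - l] for j in range(N)] for l in range(L)]


def apply_channel(H, X):
    """Noise-free term H X (M x N) of eq. (2) for an M x L channel matrix H."""
    L, N = len(X), len(X[0])
    return [[sum(row[l] * X[l][j] for l in range(L)) for j in range(N)] for row in H]


symbols = [1, -1, 1, 1, -1]
X = data_matrix(symbols, L=2, n0=1, N=4)   # [[-1, 1, 1, -1], [1, -1, 1, 1]]
Y0 = apply_channel([[1.0, 0.5]], X)        # one antenna, two taps: [[-0.5, 0.5, 1.5, -0.5]]
```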

The JSTE approach approximates the channel $\mathbf{H}$ with a rank-one (low-rank) channel, which separates the space and time processing while optimizing the space and time parameters jointly. The receiver must cope with strong CCI and equalize severely distorted channels. The joint estimates of the beamformer weights $\mathbf{w} = [w_1, w_2, ..., w_M]^T$ and of the temporal channel $\mathbf{h} = [h_1, h_2, ..., h_L]^T$ are obtained by maximizing the SINR at the output of the spatial filter:

$$(\mathbf{w}_{opt}, \mathbf{h}_{opt}) = \underset{\mathbf{w}, \mathbf{h}}{\arg \max} \frac{\left\|\mathbf{w}^{H} \mathbf{Y}\right\|^{2}}{\left\|\mathbf{w}^{H} \mathbf{Y} - \mathbf{h}^{H} \mathbf{X}\right\|^{2}} \tag{3}$$

It can be shown that this optimization can be carried out with respect to $\mathbf{h}$ alone, the beamformer weights then following in closed form:

$$\mathbf{h}_{opt} = \arg \max_{\mathbf{h}} \frac{\mathbf{h}^{H} \hat{\mathbf{R}}_{xx} \mathbf{h}}{\mathbf{h}^{H} \mathbf{R}_{s}^{\perp} \mathbf{h}} \tag{4a}$$

$$\mathbf{w}_{opt} = \hat{\mathbf{R}}_{yy}^{-1} \hat{\mathbf{R}}_{yx} \mathbf{h}_{opt} \tag{4b}$$

where $\mathbf{R}_{s}^{\perp} = \hat{\mathbf{R}}_{xx} - \hat{\mathbf{R}}_{xy}\hat{\mathbf{R}}_{yy}^{-1}\hat{\mathbf{R}}_{yx}$ is the Schur complement of $\hat{\mathbf{R}}_{yy}$ in the joint covariance matrix $\mathbf{R}$ of (8), and $\hat{\mathbf{R}}_{xx} = \mathbf{X}\mathbf{X}^{H} / N$ denotes the sample covariance matrix over a slot of N samples ($\hat{\mathbf{R}}_{yy}$, $\hat{\mathbf{R}}_{yx}$ and $\hat{\mathbf{R}}_{xy}$ are defined similarly). If the training sequence is white (PN sequence), the sample covariance matrix tends asymptotically to the identity, $\hat{\mathbf{R}}_{xx} \rightarrow \mathbf{I}$, and the objective function (3) can be replaced by

$$(\mathbf{w}_{opt}, \mathbf{h}_{opt}) = \underset{\mathbf{w}, \mathbf{h}, \|\mathbf{h}\|^2 = 1}{\arg\min} \|\mathbf{w}^H \mathbf{Y} - \mathbf{h}^H \mathbf{X}\|^2 \tag{5}$$
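The asymptotic property used to pass from (3) to (5), namely that the sample covariance of a white training sequence approaches the identity, can be checked numerically. The Python sketch below assumes a pseudo-random ±1 sequence as a stand-in for a PN training sequence, and its helper names are hypothetical. The 26-symbol GSM training sequence is much shorter, so in practice the identity is only approximated.

```python
import random


def sample_covariance(A, B):
    """Sample cross-covariance A B^H / N for row-major matrices A (P x N), B (Q x N)."""
    N = len(A[0])
    return [[sum(a[k] * b[k].conjugate() for k in range(N)) / N for b in B] for a in A]


def toeplitz_training_matrix(L, N, seed=0):
    """L x N Toeplitz data matrix built from a pseudo-random +/-1 sequence."""
    rng = random.Random(seed)
    s = [rng.choice((-1, 1)) for _ in range(N + L - 1)]
    # Column j is [s_{j+L-1}, ..., s_j]^T, i.e. x(n) = [x_n, ..., x_{n-L+1}]^T.
    return [[s[j + L - 1 - l] for j in range(N)] for l in range(L)]


X = toeplitz_training_matrix(L=5, N=20000)
Rxx = sample_covariance(X, X)
# Diagonal entries are exactly 1 for +/-1 symbols; off-diagonal entries shrink like 1/sqrt(N).
max_off_diagonal = max(abs(Rxx[i][j]) for i in range(5) for j in range(5) if i != j)
```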

The optimum channel $\mathbf{h}_{opt}$ is then the eigenvector $\mathbf{q}_{\min}$ associated with the minimum eigenvalue $\lambda_{\min}$ of $\mathbf{R}_{s}^{\perp}$:

$$\mathbf{h}_{opt} = \mathbf{q}_{\min} \tag{6a}$$

$$\mathbf{w}_{opt} = \hat{\mathbf{R}}_{yy}^{-1} \hat{\mathbf{R}}_{yx} \mathbf{q}_{\min} \tag{6b}$$
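The step from (5) to (6a) and (6b) can be made explicit by minimizing first over $\mathbf{w}$ for a fixed $\mathbf{h}$ and then over $\mathbf{h}$. The short derivation below uses only the sample covariances defined above, with the cost normalized by N.

```latex
\begin{aligned}
J(\mathbf{w},\mathbf{h}) &= \tfrac{1}{N}\,\|\mathbf{w}^H\mathbf{Y}-\mathbf{h}^H\mathbf{X}\|^2
 = \mathbf{w}^H\hat{\mathbf{R}}_{yy}\mathbf{w}-\mathbf{w}^H\hat{\mathbf{R}}_{yx}\mathbf{h}
   -\mathbf{h}^H\hat{\mathbf{R}}_{xy}\mathbf{w}+\mathbf{h}^H\hat{\mathbf{R}}_{xx}\mathbf{h},\\
\frac{\partial J}{\partial \mathbf{w}^*} &= \hat{\mathbf{R}}_{yy}\mathbf{w}-\hat{\mathbf{R}}_{yx}\mathbf{h}=\mathbf{0}
 \;\Rightarrow\; \mathbf{w}=\hat{\mathbf{R}}_{yy}^{-1}\hat{\mathbf{R}}_{yx}\mathbf{h},\\
J\big(\hat{\mathbf{R}}_{yy}^{-1}\hat{\mathbf{R}}_{yx}\mathbf{h},\mathbf{h}\big)
 &= \mathbf{h}^H\big(\hat{\mathbf{R}}_{xx}-\hat{\mathbf{R}}_{xy}\hat{\mathbf{R}}_{yy}^{-1}\hat{\mathbf{R}}_{yx}\big)\mathbf{h}
 = \mathbf{h}^H\mathbf{R}_s^{\perp}\mathbf{h},\\
\min_{\|\mathbf{h}\|^2=1}\mathbf{h}^H\mathbf{R}_s^{\perp}\mathbf{h} &= \lambda_{\min}
 \quad\text{attained at}\quad \mathbf{h}=\mathbf{q}_{\min}.
\end{aligned}
```

The last line is the Rayleigh-quotient characterization of the smallest eigenvalue of the Hermitian matrix $\mathbf{R}_s^{\perp}$.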

#### 2.1 JSTE Computation steps

The JSTE optimization requires the eigenvector corresponding to the smallest eigenvalue of the Schur complement $\mathbf{R}_{s}^{\perp}(n)$, defined from the joint covariance matrix

$$\mathbf{R}(n) = \begin{bmatrix} \hat{\mathbf{R}}_{yy}(n) & \hat{\mathbf{R}}_{yx}(n) \\ \hat{\mathbf{R}}_{xy}(n) & \hat{\mathbf{R}}_{xx}(n) \end{bmatrix} \tag{8}$$

The solution proposed here is based on a partial Cholesky factorization [6], as it allows control of the numerical stability together with good computational efficiency.

The covariance matrix can be expressed as  $\mathbf{R}(n) = \overline{\mathbf{L}}(n)\overline{\mathbf{L}}^{H}(n)$ , where

$$\overline{\mathbf{L}}(n) = \begin{bmatrix} \mathbf{L}(n) & \mathbf{0} \\ \mathbf{U}(n) & \mathbf{L}_s(n) \end{bmatrix} \tag{9}$$

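is the Cholesky factor of $\mathbf{R}(n)$: $\mathbf{L}(n)$ is the Cholesky factor of $\hat{\mathbf{R}}_{yy}(n)$, $\mathbf{U}(n) = \hat{\mathbf{R}}_{xy}(n)\mathbf{L}^{-H}(n)$, and $\mathbf{L}_{s}(n)$ is the Cholesky factor of $\mathbf{R}_{s}^{\perp}(n)$. The eigenvector corresponding to the smallest eigenvalue of the Schur complement can be obtained by iteratively solving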

$$\mathbf{L}_{s}(n)\mathbf{L}_{s}^{H}(n)\mathbf{h}_{n} = \mathbf{h}_{opt}(n-1) \tag{10}$$

followed by the normalization $\mathbf{h}_{opt}(n) = \mathbf{h}_n / \|\mathbf{h}_n\|$; the beamforming weights are then given by

$$\mathbf{w}_{opt} = \mathbf{L}^{-\mathrm{H}}(n)\mathbf{U}^{\mathrm{H}}(n)\mathbf{h}_{opt} \tag{11}$$
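The relations behind (9) and (11) can be checked with a small pure-Python sketch: a complex Cholesky factorization of a random joint covariance matrix is partitioned into L, U and L_s, and the beamformer weights are then obtained with a single back-substitution. The random Gaussian data standing in for Y and X, the sizes M = 3, L = 2, N = 50, and all helper names are illustrative assumptions; this is not the chip's fixed-point dataflow.

```python
import random


def cholesky(A):
    """Lower-triangular C with A = C C^H for a Hermitian positive definite A."""
    n = len(A)
    C = [[0j] * n for _ in range(n)]
    for j in range(n):
        d = A[j][j].real - sum(abs(C[j][k]) ** 2 for k in range(j))
        if d <= 0:
            raise ValueError("matrix is not positive definite")
        C[j][j] = complex(d ** 0.5)
        for i in range(j + 1, n):
            s = A[i][j] - sum(C[i][k] * C[j][k].conjugate() for k in range(j))
            C[i][j] = s / C[j][j].real
    return C


def herm(A):
    """Conjugate transpose."""
    return [[A[i][j].conjugate() for i in range(len(A))] for j in range(len(A[0]))]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def solve_upper(T, b):
    """Back-substitution for an upper-triangular system T x = b."""
    n = len(b)
    x = [0j] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(T[i][k] * x[k] for k in range(i + 1, n))) / T[i][i]
    return x


M, L, N = 3, 2, 50
rng = random.Random(1)
# Stacked data [Y; X] (random stand-ins), so that R = G G^H / N has the block layout of eq. (8).
G = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N)] for _ in range(M + L)]
R = [[v / N for v in row] for row in matmul(G, herm(G))]

Lbar = cholesky(R)
L_yy = [row[:M] for row in Lbar[:M]]   # L(n): Cholesky factor of R_yy
U = [row[:M] for row in Lbar[M:]]      # U(n) = R_xy L^{-H}
L_s = [row[M:] for row in Lbar[M:]]    # L_s(n): Cholesky factor of the Schur complement

h = [2 ** -0.5, 1j * 2 ** -0.5]        # any unit-norm channel estimate
rhs = [row[0] for row in matmul(herm(U), [[v] for v in h])]   # U^H h
w = solve_upper(herm(L_yy), rhs)       # eq. (11): w = L^{-H} U^H h
```

Since R_xx = U U^H + L_s L_s^H, the lower-right block of the factor directly yields the Schur complement, and the weights satisfy R_yy w = R_yx h as in (4b).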

To summarize, $\mathbf{h}_{opt}(n)$ and $\mathbf{w}_{opt}(n)$ are determined in three steps: 1) Cholesky factorization of $\mathbf{R}(n)$ via fast Schur reduction; 2) the inverse power method (10), in which each iteration is a forward substitution followed by a backward substitution, to obtain $\mathbf{h}_{opt}(n)$; 3) a final back-substitution (11) for $\mathbf{w}_{opt}(n)$.
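Step 2 can be sketched as follows. Each inverse power iteration solves (10) with one forward substitution (L_s z = h_opt(n-1)) and one backward substitution (L_s^H h_n = z), followed by normalization. In the receiver a single iteration per symbol is warm-started from the previous estimate; the Python sketch below instead iterates on a fixed, hypothetical 2×2 matrix until convergence, so that the result can be compared with its known smallest eigenvalue (7 - √5)/2.

```python
def cholesky(A):
    """Lower-triangular C with A = C C^H for a Hermitian positive definite A."""
    n = len(A)
    C = [[0j] * n for _ in range(n)]
    for j in range(n):
        d = A[j][j].real - sum(abs(C[j][k]) ** 2 for k in range(j))
        C[j][j] = complex(d ** 0.5)
        for i in range(j + 1, n):
            C[i][j] = (A[i][j] - sum(C[i][k] * C[j][k].conjugate() for k in range(j))) / C[j][j].real
    return C


def forward_substitution(C, b):
    """Solve C z = b for lower-triangular C."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(C[i][k] * z[k] for k in range(i))) / C[i][i])
    return z


def backward_substitution_herm(C, z):
    """Solve C^H x = z for lower-triangular C (so C^H is upper-triangular)."""
    n = len(z)
    x = [0j] * n
    for i in reversed(range(n)):
        s = z[i] - sum(C[k][i].conjugate() * x[k] for k in range(i + 1, n))
        x[i] = s / C[i][i].conjugate()
    return x


def inverse_power_step(C, h_prev):
    """One iteration of eq. (10): solve C C^H h = h_prev, then normalize to unit norm."""
    h = backward_substitution_herm(C, forward_substitution(C, h_prev))
    norm = sum(abs(v) ** 2 for v in h) ** 0.5
    return [v / norm for v in h]


def rayleigh(A, h):
    """Rayleigh quotient h^H A h for a unit-norm h."""
    return sum(h[i].conjugate() * A[i][j] * h[j] for i in range(len(h)) for j in range(len(h))).real


Rs = [[4 + 0j, 1 + 0j], [1 + 0j, 3 + 0j]]   # hypothetical Schur complement
C = cholesky(Rs)
h = [1 + 0j, 0j]
for _ in range(30):
    h = inverse_power_step(C, h)
lam_min = rayleigh(Rs, h)                    # close to (7 - 5**0.5) / 2 = 2.3819...
```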

The channel $\mathbf{h}_{opt}(n)$ and the beamformer weights $\mathbf{w}_{opt}(n)$ are estimated using the known training sequence. It is worth observing that the environment is non-stationary not only because the movement of the mobile terminals makes the channel time varying, but also because of the strong variability of the CCI. GSM/DCS systems are not inter-cell synchronized, so interference from neighboring cells has to be considered asynchronous. As a result, the estimates of $\mathbf{h}_{opt}(n)$ and $\mathbf{w}_{opt}(n)$ have to be updated in a *Decision Directed* manner within the burst, making both the spatial filter $\mathbf{w}_{opt}(n)$ and the Viterbi Algorithm (VA) time varying.

The JSTE algorithm is made time varying by updating the structured matrix $\mathbf{R}$ at each time instant n:

$$\mathbf{R}(n) = \lambda \mathbf{R}(n-1) + \mathbf{g}_n \mathbf{g}_n^H \tag{12}$$

where $0 < \lambda < 1$ is a forgetting factor and $\mathbf{g}_n = [\mathbf{y}^T(n), \hat{\mathbf{x}}^T(n)]^T$, with $\hat{\mathbf{x}}(n)$ the L-symbol sequence taken from the ML path in the trellis. $\mathbf{h}_{opt}(n)$ and $\mathbf{w}_{opt}(n)$ are updated at each time sample.
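Eq. (12) is an exponentially weighted (forgetting-factor) covariance update. The minimal Python sketch below, whose helper name is hypothetical, shows one update and its steady state: for a constant input g the matrix tends to g g^H / (1 - λ), i.e. the effective memory is about 1/(1 - λ) symbols.

```python
def update_covariance(R, y, x_hat, lam):
    """Eq. (12): R(n) = lam * R(n-1) + g g^H with g = [y^T, x_hat^T]^T."""
    g = list(y) + list(x_hat)
    return [[lam * R[i][j] + g[i] * g[j].conjugate() for j in range(len(g))]
            for i in range(len(g))]


# M = 1 antenna sample and L = 1 decided symbol, starting from R(0) = 0.
R = [[0j, 0j], [0j, 0j]]
R1 = update_covariance(R, [1 + 1j], [-1], lam=0.9)     # [[2, -1-1j], [-1+1j, 1]]

# With a constant input the sum of the weights approaches 1 / (1 - lam) = 10.
R_ss = R
for _ in range(200):
    R_ss = update_covariance(R_ss, [1 + 1j], [-1], lam=0.9)
```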

#### 3. JSTE Chip

The JSTE chip (figure 1) is composed of three processors: one implements the Cholesky factorization of $\mathbf{R}(n)$, one performs the forward-backward substitution to compute **w** and **h**, and the Viterbi processor demodulates the transmitted sequence.

The update of the Cholesky factor and the computation of $\mathbf{h}_{opt}(n)$ and $\mathbf{w}_{opt}(n)$ are performed symbol by symbol. Processing starts from the midamble using the training sequence; outside the midamble, the transmitted sequence $\mathbf{x}(n)$ is formed by the L-symbol sequence from the maximum likelihood path in the trellis of the Viterbi processor.



Figure 1. JSTE Chip

The Cholesky and FBS processors are pipelined: while the first processor updates the Cholesky factor, the second computes $\mathbf{h}_{opt}(n)$ and $\mathbf{w}_{opt}(n)$ from the previous update of the matrix $\overline{\mathbf{L}}$.

The Cholesky processor (figure 2) is based on the CORDIC algorithm [7], which performs the Givens rotations [8] by means of three CORDIC cells.
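The core operation of a CORDIC cell in vectoring mode, as a θ-CORDIC would use it to strip the phase of a complex element, can be sketched in floating-point Python. This is the generic Volder iteration [7], not the chip's fixed-point implementation. In hardware the arctangents atan(2^-i) come from a small table and the constant gain is compensated once; the iteration count of 16 used here is an assumption.

```python
import math


def cordic_vectoring(x, y, iterations=16):
    """Rotate (x, y) onto the positive real axis with shift-and-add micro-rotations.

    Returns (magnitude, angle) of the complex number x + j*y.
    """
    offset = 0.0
    if x < 0:                      # pre-rotation by pi: plain CORDIC converges only for |angle| < ~1.74 rad
        x, y, offset = -x, -y, math.pi
    z = 0.0
    gain = 1.0
    for i in range(iterations):
        t = 2.0 ** -i              # a shift by i bits in hardware
        if y > 0:                  # rotate clockwise
            x, y, z = x + y * t, y - x * t, z + math.atan(t)
        else:                      # rotate counter-clockwise
            x, y, z = x - y * t, y + x * t, z - math.atan(t)
        gain *= math.sqrt(1.0 + t * t)
    angle = offset + z
    if angle > math.pi:            # wrap to (-pi, pi]
        angle -= 2 * math.pi
    return x / gain, angle
```

The accumulated angle is what a rotation-mode cell would apply to the remaining elements of the vector.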

The $\theta$-CORDIC performs the first rotation, which cancels the imaginary part of the input vector element; the two $\phi$-CORDICs perform the second rotation, which cancels its real part and updates a column of the Cholesky factor. The matrix $\overline{\mathbf{L}}(n)$ is completely updated after M+L such iterations, one for each element of the input data vector.
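Applied to the factor, the update (12) becomes a rank-one Cholesky update. The Python sketch below mirrors the two rotations described above, in floating point and with hypothetical helper names. For each column k, a unit-modulus "θ" step makes g[k] real, then a real "φ" Givens rotation annihilates it against column k of √λ L̄. Because both steps are unitary, the product L̄ L̄^H picks up exactly λ R + g g^H.

```python
import math


def givens_rank_one_update(Lc, g, lam):
    """Return L' with L' L'^H = lam * Lc Lc^H + g g^H.

    Lc is lower-triangular with a real positive diagonal (complex entries allowed).
    """
    n = len(Lc)
    root = math.sqrt(lam)
    Lc = [[root * v for v in row] for row in Lc]      # sqrt(lam) * L(n-1)
    g = [complex(v) for v in g]
    for k in range(n):
        r = abs(g[k])
        if r == 0.0:
            continue                                   # nothing to annihilate in this column
        # "theta" step: a unit-modulus factor makes g[k] real; g g^H is unchanged.
        phase = (g[k] / r).conjugate()
        g = [v * phase for v in g]
        # "phi" step: real Givens rotation of column k of Lc against g, zeroing g[k].
        a = Lc[k][k].real
        rho = math.hypot(a, r)
        c, s = a / rho, r / rho
        Lc[k][k] = complex(rho)                        # new diagonal stays real and positive
        for i in range(k + 1, n):
            lik, gi = Lc[i][k], g[i]
            Lc[i][k] = c * lik + s * gi
            g[i] = -s * lik + c * gi
    return Lc


def gram(Lc):
    """Lc Lc^H, used only to check the update."""
    n = len(Lc)
    return [[sum(Lc[i][k] * Lc[j][k].conjugate() for k in range(n)) for j in range(n)]
            for i in range(n)]


L_prev = [[2 + 0j, 0j], [1 + 1j, 1 + 0j]]
g_n = [1j, 2 - 1j]
L_new = givens_rank_one_update(L_prev, g_n, lam=0.9)
```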

The Cholesky processor uses a 16-bit fixed-point arithmetic representation; to avoid overflow, the binary word is extended to 20 bits.
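One way to read "16 bits of precision in a 20-bit word" is as four integer guard bits on top of a 16-bit Q1.15 fraction, which gives 2^4 = 16 times headroom for intermediate sums. The Python sketch below illustrates that interpretation, which is an assumption rather than a description of the actual datapath. Accumulating eight samples of value 0.9 wraps around in a plain 16-bit register but stays exact in a 20-bit one.

```python
def to_q15(x):
    """Quantize x in [-1, 1) to a 16-bit Q1.15 integer (saturating)."""
    return max(-32768, min(32767, round(x * 32768)))


def wrap(value, bits):
    """Two's-complement wrap-around to `bits` bits, as a plain adder would do on overflow."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= 1 << (bits - 1) else value


sample = to_q15(0.9)                   # 29491
acc16 = acc20 = 0
for _ in range(8):
    acc16 = wrap(acc16 + sample, 16)   # 16-bit word: overflows
    acc20 = wrap(acc20 + sample, 20)   # 16 bits + 4 guard bits: no overflow
```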



Figure 2. Cholesky processor.

The FBS processor is a multiprocessor system formed by two data path units, two AGUs and a program controller. The BS unit performs the forward-backward substitution by means of a multiplier/divider structure and an adder, while the MAC unit implements the multiply-and-accumulate operation. Both units use complex arithmetic with a floating-point representation (16-bit mantissa, 5-bit exponent), chosen to handle a wide dynamic range. The MAC unit of the FBS processor is also used to implement the matched filter (MF) and the spatial filtering. The MF output samples, $\mathbf{z}(n)$, are then processed according to the Viterbi algorithm by a separate processor.
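The dynamic-range argument for the floating-point format can be quantified with a small Python sketch. It assumes a sign plus a 16-bit normalized mantissa in [0.5, 1) and an unbiased 5-bit exponent range of -16 to 15, with saturation and flush-to-zero. The actual encoding used in the FBS units is not specified, so the numbers are only indicative.

```python
import math

MANT_BITS, EXP_BITS = 16, 5
E_MIN, E_MAX = -(1 << (EXP_BITS - 1)), (1 << (EXP_BITS - 1)) - 1   # -16 .. 15 (assumed range)


def quantize_float(x):
    """Round x to the assumed 16-bit-mantissa / 5-bit-exponent format."""
    if x == 0:
        return 0.0
    m, e = math.frexp(x)                                   # x = m * 2**e, 0.5 <= |m| < 1
    m = round(m * (1 << MANT_BITS)) / (1 << MANT_BITS)     # keep 16 mantissa bits
    if e > E_MAX:                                          # saturate large values
        return math.copysign((1 - 2.0 ** -MANT_BITS) * 2.0 ** E_MAX, x)
    if e < E_MIN:                                          # flush tiny values to zero
        return 0.0
    return math.ldexp(m, e)


def dynamic_range_db(largest, smallest):
    return 20 * math.log10(largest / smallest)


DR_FLOAT = dynamic_range_db((1 - 2.0 ** -MANT_BITS) * 2.0 ** E_MAX, 0.5 * 2.0 ** E_MIN)  # about 192.7 dB
DR_FIXED16 = dynamic_range_db(32767, 1)                                                  # about 90.3 dB
```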

The FBS unit (figure 3) also contains an ISQRT (inverse square root) module, needed for the normalization of **h**.
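A common way to realize an inverse square root with multipliers only is the Newton-Raphson iteration y ← y(3 - a y²)/2. The paper does not say which scheme the ISQRT module uses, so the Python sketch below is an assumption. It normalizes h by 1/√(h^H h) and seeds the iteration from the binary exponent of the energy.

```python
import math


def isqrt(a, iterations=6):
    """Approximate 1/sqrt(a) for a > 0 by Newton-Raphson, seeded from the exponent of a."""
    if a <= 0:
        raise ValueError("a must be positive")
    _, e = math.frexp(a)             # a = m * 2**e with 0.5 <= m < 1
    y = 2.0 ** (-e / 2)              # coarse seed, within a factor sqrt(2) of 1/sqrt(a)
    for _ in range(iterations):
        y = y * (3.0 - a * y * y) / 2.0   # quadratic convergence towards 1/sqrt(a)
    return y


def normalize(h):
    """Scale the channel vector h to unit norm using the inverse square root."""
    energy = sum(abs(v) ** 2 for v in h)
    inv = isqrt(energy)
    return [v * inv for v in h]
```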

The FBS processor (figure 4) is composed of the FBS unit and two AGUs (Address Generation Units), all of which can work in parallel. The control unit is a microcoded ROM-based controller, and the instruction word is 50 bits wide.

The Viterbi processor (figure 5) computes the Viterbi algorithm. It is composed of four execution units operating in parallel: data path, AGU, RAMs and program controller. The heart of the Viterbi processor is the program controller, which performs program address generation and hardware subroutine jumps. The application-specific BUTTERFLY unit is tailored to compute a butterfly in only five clock cycles. The instruction word is 41 bits wide.
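The add-compare-select (ACS) recursion that the BUTTERFLY unit accelerates can be illustrated with a compact MLSE Viterbi detector in Python. This is a simplified sketch under stated assumptions. Binary antipodal (±1) symbols stand in for the GMSK signal, a known real channel h is applied directly to the symbols, and the branch metric is the squared Euclidean distance on the filtered samples, whereas the chip works on matched-filter outputs z(n). The symbols before the burst are assumed to be +1. In the trellis, the two states that differ only in their oldest symbol feed the same pair of successor states, and that pair of ACS operations is the butterfly.

```python
def viterbi_mlse(r, h):
    """Most likely +/-1 sequence for samples r_n = sum_k h[k] x_{n-k} + noise."""
    L = len(h)
    n_states = 2 ** (L - 1)          # state = last L-1 symbols, bit k <-> x_{n-1-k}
    mask = n_states - 1
    INF = float("inf")
    metric = [0.0] + [INF] * (n_states - 1)   # start in the all-(+1) state
    paths = [[] for _ in range(n_states)]
    for rn in r:
        new_metric = [INF] * n_states
        new_paths = [None] * n_states
        for s in range(n_states):
            if metric[s] == INF:
                continue
            for b in (0, 1):         # bit 0 -> symbol +1, bit 1 -> symbol -1
                x = [1 - 2 * b] + [1 - 2 * ((s >> k) & 1) for k in range(L - 1)]
                expected = sum(hk * xk for hk, xk in zip(h, x))
                m = metric[s] + abs(rn - expected) ** 2       # add
                ns = ((s << 1) | b) & mask
                if m < new_metric[ns]:                        # compare
                    new_metric[ns] = m                        # select
                    new_paths[ns] = paths[s] + [1 - 2 * b]
        metric, paths = new_metric, new_paths
    best = min(range(n_states), key=lambda s: metric[s])
    return paths[best]


def simulate(symbols, h):
    """Noise-free channel output, assuming the symbols before the burst are +1."""
    L = len(h)
    hist = [1] * (L - 1) + list(symbols)
    return [sum(h[k] * hist[n + L - 1 - k] for k in range(L)) for n in range(len(symbols))]
```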



Figure 3. FBS unit.



Figure 4. FBS Processor.

The JSTE chip has been implemented in FPGA technology (Xilinx VIRTEX XCV1000), which can integrate up to 1000K equivalent gates. The estimated complexity of the complete JSTE chip is 400K ASIC gates, running at a clock frequency of 70 MHz.

Table 1 reports the complexity of the individual processors forming the JSTE chip, expressed in terms of the clock cycles needed, respectively, to update the Cholesky factor, to compute $\mathbf{w}$, $\mathbf{h}$ and $\mathbf{z}$, and to demodulate a single bit.

Table 2 lists the design methodology and tools adopted to develop the JSTE chip.



Figure 5. Viterbi Processor.

#### Table 1. JSTE Chip complexity.

| Processor             | Equivalent gates | Clock cycles | Frequency |
|-----------------------|------------------|--------------|-----------|
| Cholesky proc. + RAMs | 230K             | 390          | 90 MHz    |
| FBS proc.             | 140K             | 540          | 70 MHz    |
| Viterbi proc.         | 30K              | 60           | 70 MHz    |
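A back-of-the-envelope check of these figures can be sketched in Python. It assumes one update per symbol over 148 symbols per burst (the bits of a GSM normal burst) and a GSM timeslot of 15/26 ms. It also assumes that the pipelined Cholesky and FBS processors are limited by the slower stage. How the chip actually schedules the three processors over a burst is not detailed, so the result is only indicative.

```python
SLOT_S = 15e-3 / 26          # GSM timeslot duration, about 576.9 us
SYMBOLS_PER_BURST = 148      # assumption: one update per bit of a normal burst

# processor: (clock cycles per update, clock frequency in Hz), from Table 1
TABLE1 = {
    "Cholesky": (390, 90e6),
    "FBS": (540, 70e6),
    "Viterbi": (60, 70e6),
}


def burst_time(cycles, freq, symbols=SYMBOLS_PER_BURST):
    """Seconds needed to run `symbols` updates of `cycles` clock cycles each."""
    return cycles * symbols / freq


times = {name: burst_time(c, f) for name, (c, f) in TABLE1.items()}
bottleneck = max(times, key=times.get)            # slowest stage of the pipeline
cycles_bottleneck = TABLE1[bottleneck][0]
f_real_time = cycles_bottleneck * SYMBOLS_PER_BURST / SLOT_S   # clock needed to fit one slot
```

Under these assumptions the FBS stage needs about 1.14 ms per burst, and roughly 138 MHz would be required to keep up with one timeslot. Both values are in the same range as the processing time and clock frequency quoted in the conclusions.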

#### Table 2. Design methodology.

| Design step       | Tool               |
|-------------------|--------------------|
| System simulation | Matlab, C language |
| VHDL simulation   | ModelSim           |
| Synthesis         | Synopsys           |
| Back-end          | Xilinx             |

#### 4. Performance

A complete SA-BTS (Smart Antenna BTS) system based on software radio technology has been developed at Italtel in order to evaluate smart antenna performance in realistic mobile radio scenarios. The SA-BTS is connected to an $8\times8$ element planar array antenna; each radiating element is a $\lambda/2$ low-profile patch in printed circuit board technology. The simulated scenarios have also been reproduced in the laboratory by using the transmit side as an RF scenario simulator, which can generate up to sixteen uplink GMSK signals according to the GSM recommendations. The main simulation results are confirmed by the laboratory measurements. The BER values for these scenarios are shown in figure 6. The receivers considered are a single antenna MLSE system, a DOA-based approach (WSF estimation + LCMV beamforming, SSDE) and the JSTE receiver based on the JSTE chip. As shown, the JSTE solution provides better performance over almost the whole range of carrier-to-interference ratios; in this test an improvement of 34 dB can be noted.



Figure 6. TU50 BER vs C/I for SSDE and JSTE receivers compared to single element antenna

#### 5. Conclusions

Space-time processing is known to be beneficial for the reduction of CCI in GSM/DCS systems. The JSTE algorithm jointly estimates the beamformer weights and the channel response by maximizing the SINR. A single chip implementation of the JSTE receiver based on a three-processor architecture has been presented. The efficient adaptive JSTE structure uses a systolic structure to update the Cholesky factor. The global complexity of the JSTE chip has been estimated at 400K gates, so current FPGA capacity allows it to be implemented on a single chip. The processing time of a single GSM time slot has been evaluated as 1 ms; real-time processing can be achieved with a clock frequency of 130-150 MHz.

A four-carrier GSM board based on the JSTE chip is under development. It will allow a highly integrated GSM base station architecture suitable for microcells.

#### References

- [1] D. Giancola, F. Margherita, S. Parolari, A. Picciriello, U. Spagnolini, "Analysis of the spectral efficiency of a fully adaptive antenna array system in GSM/DCS networks", Proc. VTC '99, Houston, 16-20 May 1999.
- [2] S. Andersson, M. Millnert, M. Viberg, B. Wahlberg, "An adaptive array for mobile communication systems", IEEE Trans. Veh. Technol., vol. 40, no. 1, Feb. 1991.
- [3] A.J. Paulraj, "Space-Time processing for wireless communication", Proc. ICASSP'97, Munich, pp. 1-4, Apr. 1997.
- [4] D. Giancola, U. Girola, S. Parolari, A. Picciriello, U. Spagnolini, D. Vincenzoni, "Space-Time processing for time varying co-channel interference rejection and channel estimation in GSM/DCS systems", Proc. VTC'99, Houston, 16-20 May 1999.
- [5] M.A. Lagunas, A.I. Perez-Neira, J. Vidal, "Optimum array combiner for sequence detectors", Proc. ICASSP'98 Seattle, May 1998.
- [6] J. Chun, T. Kailath, "Generalized displacement structure for block-Toeplitz, Toeplitz-block and Toeplitz-derived matrices", NATO ASI series vol. F 70, Springer-Verlag 1991.
- [7] J.E. Volder, "The CORDIC Trigonometric Computing Technique", IRE Trans. Electron. Comput., EC-8(3), 1959.
- [8] C. M. Rader, "VLSI Systolic Arrays for Adaptive Nulling", IEEE Signal Processing Magazine, July 1996.